A Skew Logistic Distribution for Modelling COVID-19 Waves and Its Evaluation Using the Empirical Survival Jensen–Shannon Divergence

Mark Levene

doi:10.3390/e24050600

Abstract

A novel yet simple extension of the symmetric logistic distribution is proposed by introducing a skewness parameter. It is shown how the three parameters of the ensuing skew logistic distribution may be estimated using maximum likelihood. The skew logistic distribution is then extended to the skew bi-logistic distribution to allow the modelling of multiple waves in epidemic time series data. The proposed skew-logistic model is validated on COVID-19 data from the UK, and is evaluated for goodness-of-fit against the logistic and normal distributions using the recently formulated empirical survival Jensen–Shannon divergence (ESJS) and the Kolmogorov–Smirnov two-sample test statistic (KS2). We employ 95% bootstrap confidence intervals to assess the improvement in goodness-of-fit of the skew logistic distribution over the other distributions. The obtained confidence intervals for the ESJS are narrower than those for the KS2 on this dataset, implying that the ESJS is more powerful than the KS2.

Keywords:

empirical survival Jensen–Shannon divergence; Kolmogorov–Smirnov two-sample test; skew logistic distribution; bi-logistic growth; epidemic waves; COVID-19 data

1. Introduction

In exponential growth, the population grows at a rate proportional to its current size. This is unrealistic, since in reality, growth will not exceed some maximum, called its carrying capacity. The logistic equation [1] (Chapter 6) deals with this problem by ensuring that the growth rate of the population decreases as the population approaches its carrying capacity [2]. Statistical modelling of the logistic equation’s growth and decay is accomplished with the logistic distribution [3] and [4] (Chapter 22), noting that the tails of the logistic distribution are heavier than those of the ubiquitous normal distribution. The normal and logistic distributions are both symmetric; however, real data often exhibit skewness [5], which has given rise to extensions of the normal distribution that accommodate skewness, as in the skew normal [6] and epsilon skew normal [7] distributions. Subsequently, skew logistic distributions were also devised, as in [8,9].

Epidemics, such as COVID-19, are traditionally modelled by compartmental models such as the SIR (Susceptible-Infected-Removed) model and its extension, the SEIR (Susceptible-Exposed-Infected-Removed) model, which estimate the trajectory of an epidemic [10]. These models typically rely on assumptions on how the disease is transmitted and progresses [11], and are routinely used to understand the consequences of policies such as mask wearing and social distancing [12]. Time series models [13], on the other hand, employ historical data to make forecasts about the future, are generally simpler than compartmental models, and are able to make forecasts on, for example, number of cases, hospitalisations and deaths. The SIR model can be interpreted as a logistic growth model [14,15]. However, as the data is inherently skewed, a skewed logistic statistical model would be a natural choice, although, as such, it does not rely on biological assumptions in its forecasts [16].

Herein, we present a novel yet simple (one may argue the simplest), three-parameter skewed extension of the logistic distribution to allow for asymmetry; cf. [16]. If, instead of our extension, we were to deploy one of the other skew logistic distributions (such as the one described in [8]), the results would no doubt be comparable to the results we obtain herein; however, we pursue our simpler extension, detailing its statistical properties.

In the context of analysing epidemics, the logistic distribution is normally preferred, as it is a natural distribution to use in modelling population growth and decay. However, we still briefly mention a comparison of the results we obtain in modelling COVID-19 waves with the skew logistic distribution, to one which, instead, employs a skew normal distribution (more specifically we choose the, flexible, epsilon skew normal distribution [7]). The result of this comparison implies that utilising the epsilon skew normal distribution leads, overall, to results which are comparable to those when utilising the skew logistic distribution. However, in practice, it is still preferable to make use of the skew logistic distribution as it is the natural model to deploy in this context [17], since, on the whole, it is more consistent with the data as its tails are heavier than those of a skew normal distribution.

Epidemics are said to come in “waves”. The precise definition of a wave is somewhat elusive [18], but it is generally accepted that, assuming we have a time series of the number of, say, daily hospitalisations, a wave will span a period from one valley (minimum) in the time series to another valley, with a peak (maximum) in between them. There is no strict requirement that waves do not overlap, although, for simplicity, we will not consider any overlap as such; see [18] for an attempt to give an operational definition of the concept of epidemic wave. In order to combine waves, we make use of the concept of bi-logistic growth [19,20], or more generally, multi-logistic growth, which allows us to sum two or more instances of logistic growth when the time series spans more than a single wave.

To fit the skew logistic distribution to the time series data we employ maximum likelihood, and to evaluate the goodness-of-fit we make use of the recently formulated empirical survival Jensen–Shannon divergence (ESJS) [21,22] and the well-established Kolmogorov–Smirnov two-sample test statistic (KS2) [23] (Section 6.3). The ESJS is an information-theoretic goodness-of-fit measure of a fitted parametric continuous distribution, which overcomes the inadequacy of the coefficient of determination, R^{2}, as a goodness-of-fit measure for nonlinear models [24]. The KS2 statistic also satisfies this criterion regarding R^{2}; however, we observe that the 95% bootstrap confidence intervals [25] we obtain for the ESJS are narrower than those for the KS2, suggesting that the ESJS is more powerful [26] than the KS2. Another well-known limitation of the KS2 statistic is that it is less sensitive to discrepancies at the tails of the distribution than the ESJS statistic is, in the sense that, as opposed to the ESJS, it is “local”, i.e., its value is determined by a single point [27].

The rest of the paper is organised as follows. In Section 2, we introduce a skew logistic distribution, which is a simple extension of the standard, symmetric, logistic distribution obtained by adding to it a single skew parameter, and derive some of its properties. In Section 3, we formulate the solution to the maximum likelihood estimation of the parameters of the skew logistic distribution. In Section 4, we make use of an extension of the skew logistic distribution to the bi-skew logistic distribution to model a time series of COVID-19 data items having more than a single wave. In Section 5, we provide analysis of daily COVID-19 deaths in the UK from 30 January 2020 to 30 July 2021, assuming the skew logistic distribution as an underlying model of the data. The evaluation of goodness-of-fit of the skew logistic distribution to the data makes use of the recently formulated ESJS, and compares the results to those when employing the KS2 instead. We observe that the same technique, which we applied to the analysis of COVID-19 deaths, can be used to model new cases and hospitalisations. Finally, in Section 6, we present our concluding remarks. It is worth noting that in the more general setting of information modelling, being able to detect epidemic waves may help supply chains in planning increased resistance to such adverse events [28]. We note that all computations were carried out using the Matlab software package.

2. A Skew Logistic Distribution

Here, we introduce a novel skew logistic distribution, which extends, in a straightforward manner, the standard two-parameter logistic distribution [3] and [4] (Chapter 22) by adding to it a skew parameter. The rationale for introducing the distribution is that, apart from its simple formulation, we believe that the maximum likelihood solution presented below is also simpler than those derived for other skew logistic distributions, such as the ones investigated in [8,9]. This point provides further justification for our skew logistic distribution when introducing the bi-skew logistic distribution in Section 4.

Now, let μ be a location parameter, s be a scale parameter and λ be a skew parameter, where s > 0 and 0 < λ < 2. Then, the probability density function of the skew logistic distribution at a value x of the random variable X, denoted as f (x; λ, μ, s), is given by:

f (x; λ, μ, s) = κ_{λ} \frac{exp (- λ \frac{x - μ}{s})}{s {(1 + exp (- \frac{x - μ}{s}))}^{2}},

(1)

noting that for clarity we write x - μ above as a shorthand for (x - μ), and κ_{λ} is a normalisation constant, which depends on λ.

When λ = 1, the skew logistic distribution reduces to the standard logistic distribution as in [3] and [4] (Chapter 22), which is symmetric. On the other hand, when 0 < λ < 1, the skew logistic distribution is positively skewed, and when 1 < λ < 2, it is negatively skewed. Moreover, when λ = 1, κ_{λ} = 1, and, for example, when λ = 0.5 or 1.5, κ_{λ} = 2 / π. For simplicity, from now on, unless necessary, we will omit to mention the constant κ_{λ} as it will not affect any of the results.
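As an illustration of the density in (1), the following is a minimal Python sketch (the paper's own computations used Matlab). The closed form used for the normalisation constant, κ_{λ} = 1 / (Γ(λ) Γ(2 - λ)), is not stated in the text; it is an assumption obtained here from the substitution u = 1 / (1 + exp(-x)), which turns the normalising integral into a Beta function. It agrees with the values quoted above (1 at λ = 1 and 2 / π at λ = 0.5 or 1.5). The function names are hypothetical.

```python
import math


def kappa(lam):
    """Normalisation constant of (1).

    Assumed closed form (not given in the text): 1 / (Gamma(lam) * Gamma(2 - lam)).
    """
    if not 0.0 < lam < 2.0:
        raise ValueError("lam must satisfy 0 < lam < 2")
    return 1.0 / (math.gamma(lam) * math.gamma(2.0 - lam))


def skew_logistic_pdf(x, lam, mu=0.0, s=1.0):
    """Density (1) at x; intended for moderate values of (x - mu) / s."""
    if s <= 0.0:
        raise ValueError("s must be positive")
    z = (x - mu) / s
    return kappa(lam) * math.exp(-lam * z) / (s * (1.0 + math.exp(-z)) ** 2)


def integrate(g, a, b, n=20000):
    """Composite midpoint rule on [a, b] with n sub-intervals."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


if __name__ == "__main__":
    for lam in (0.5, 1.0, 1.5):
        # The density should integrate to approximately one for each lam.
        total = integrate(lambda x: skew_logistic_pdf(x, lam), -60.0, 60.0)
        print(lam, kappa(lam), total)
```

The numerical integral serves only as a sanity check of the assumed constant; it is truncated to [-60, 60], where the neglected tail mass is negligible for the values of λ shown.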

The skewness of a random variable X [4,5] is defined as:

E [{(\frac{X - μ}{s})}^{3}],

and thus, assuming for simplicity of exposition (due to the linearity of expectations [5]) that μ = 0 and s = 1, the skewness of the skew logistic distribution, denoted by γ (λ), is given by:

γ (λ) = \int_{- \infty}^{\infty} x^{3} \frac{exp (- λ x)}{{(1 + exp (- x))}^{2}} d x .

(2)

First, we will show that letting λ_{1} = λ, with 0 < λ_{1} < 1, we have γ (λ_{1}) > 0, that is f (x; λ_{1}, 0, 1) is positively skewed. We can split the integral in (2) into two integrals for the negative part from -∞ to 0 and the positive part from 0 to ∞, noting that when x = 0, the expression to the right of the integral is equal to 0. Then, on setting y = - x for the negative part, and y = x for the positive part, the result follows, as by algebraic manipulation it can be shown that, for y > 0:

\frac{exp (- λ_{1} y)}{{(1 + exp (- y))}^{2}} > \frac{exp (λ_{1} y)}{{(1 + exp (y))}^{2}},

(3)

implying that γ (λ_{1}) > 0 as required.

Second, in a similar fashion to above, on letting λ_{2} = λ_{1} + 1 = λ, with 1 < λ_{2} < 2, it follows that γ (λ_{2}) < 0, that is f (x; λ_{2}, 0, 1) is negatively skewed. In particular, by algebraic manipulation we have that, for y > 0:

\frac{exp (- λ_{2} y)}{{(1 + exp (- y))}^{2}} < \frac{exp (λ_{2} y)}{{(1 + exp (y))}^{2}},

(4)

implying that γ (λ_{2}) < 0 as required.
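The sign of γ (λ) can also be checked numerically. The sketch below approximates the integral in (2) with a midpoint rule, with μ = 0 and s = 1 and with the positive constant κ_{λ} omitted, since it does not change the sign. The integration limits and step count are arbitrary choices made for this illustration, not values from the text.

```python
import math


def third_moment(lam, lo=-80.0, hi=80.0, n=40000):
    """Midpoint-rule approximation of the integral in (2), with mu = 0 and s = 1.

    The constant kappa is omitted: it is positive, so the sign is unaffected.
    """
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * h
        total += x ** 3 * math.exp(-lam * x) / (1.0 + math.exp(-x)) ** 2
    return h * total


if __name__ == "__main__":
    for lam in (0.5, 1.0, 1.5):
        # Expect a positive value for lam < 1, about zero at lam = 1,
        # and a negative value for lam > 1.
        print(lam, third_moment(lam))
```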

The cumulative distribution function of the skew logistic distribution at a value x of the random variable X is obtained by integrating f (x; λ, μ, s), to obtain F (x; λ, μ, s), which is given by:

F (x; λ, μ, s) = κ_{λ} exp (- (λ - 2) \frac{x - μ}{s}) (\frac{1}{1 + exp (\frac{x - μ}{s})} - \frac{λ - 1}{λ - 2} {}_{2} F_{1} (1, 2 - λ; 3 - λ; - exp (\frac{x - μ}{s}))),

(5)

where _{2} F_{1} (a, b; c; z) is the Gauss hypergeometric function [29] (Chapter 15); we assume a, b and c are positive real numbers, and that z is a real number extended outside the unit disk by analytic continuation [30].

The hypergeometric function has the following integral representation [29] (Chapter 15),

_{2} F_{1} (a, b; c; z) = \frac{Γ (c)}{Γ (b) Γ (c - b)} \int_{0}^{1} \frac{t^{b - 1} {(1 - t)}^{c - b - 1}}{{(1 - t z)}^{a}} d t,

(6)

where c > b. Now, assuming without loss of generality that μ = 0 and s = 1, we have that:

_{2} F_{1} (1, 2 - λ; 3 - λ; - exp (x)) = (2 - λ) \int_{0}^{1} \frac{t^{1 - λ}}{(1 + t exp (x))} d t,

(7)

where x is a real number.

Therefore, from (7) it can be verified that: (i) _{2} F_{1} (1, 2 - λ; 3 - λ; - exp (x)) is monotonically decreasing with x, (ii) as x tends to plus infinity, _{2} F_{1} (1, 2 - λ; 3 - λ; - exp (x)) tends to 0 and (iii) as x tends to minus infinity, _{2} F_{1} (1, 2 - λ; 3 - λ; - exp (x)) tends to 1, since:

(2 - λ) \int_{0}^{1} t^{1 - λ} d t = 1 .
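To make (5) and (7) concrete, the following Python sketch evaluates the cumulative distribution function without a special-function library. Two choices here are assumptions of this illustration rather than part of the text: the integral in (7) is computed after the substitution t = v^{1 / (2 - λ)}, which removes the factor t^{1 - λ} and leaves the integral of 1 / (1 + v^{1 / (2 - λ)} exp (x)) over [0, 1]; and κ_{λ} is taken to be 1 / (Γ(λ) Γ(2 - λ)), which matches the values of κ_{λ} quoted in the text.

```python
import math


def hyp2f1_special(lam, x, n=20000):
    """2F1(1, 2 - lam; 3 - lam; -exp(x)) from (7), using t = v ** (1 / (2 - lam)).

    After the substitution the integrand is 1 / (1 + v ** p * exp(x)) on [0, 1],
    which is integrated with a composite midpoint rule.
    """
    p = 1.0 / (2.0 - lam)
    ex = math.exp(x)
    h = 1.0 / n
    return h * sum(1.0 / (1.0 + ((k + 0.5) * h) ** p * ex) for k in range(n))


def skew_logistic_cdf(x, lam, mu=0.0, s=1.0):
    """Cumulative distribution function (5); intended for moderate (x - mu) / s."""
    z = (x - mu) / s
    # Assumed closed form for the normalisation constant (not given in the text).
    kap = 1.0 / (math.gamma(lam) * math.gamma(2.0 - lam))
    bracket = (1.0 / (1.0 + math.exp(z))
               - (lam - 1.0) / (lam - 2.0) * hyp2f1_special(lam, z))
    return kap * math.exp(-(lam - 2.0) * z) * bracket


if __name__ == "__main__":
    for lam in (0.5, 1.0, 1.5):
        print(lam, [round(skew_logistic_cdf(x, lam), 4) for x in (-2.0, 0.0, 2.0)])
```

For λ = 1 the hypergeometric term vanishes and the expression reduces to the standard logistic cumulative distribution function. For large positive x the product in (5) involves cancellation between a large and a small factor, so this direct evaluation is only a sketch for moderate arguments.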

3. Maximum Likelihood Estimation for the Skew Logistic Distribution

We now formulate the maximum likelihood estimation [31] of the parameters μ, s and λ of the skew logistic distribution. Let \{x_{1}, x_{2}, \dots, x_{n}\} be a random sample of n values from the density function of the skew logistic distribution in (1). Then, the log likelihood function of its three parameters is given by:

ln L (λ, μ, s) = - n ln (s) - \frac{λ}{s} \sum_{i = 1}^{n} (x_{i} - μ) - 2 \sum_{i = 1}^{n} ln (1 + exp (- \frac{x_{i} - μ}{s})) .

(8)

In order to maximise the log likelihood function, we first partially differentiate ln L (λ, μ, s) as follows:

\begin{matrix} \frac{\partial ln L (λ, μ, s)}{\partial λ} & = \sum_{i = 1}^{n} \frac{μ - x_{i}}{s}, \\ \frac{\partial ln L (λ, μ, s)}{\partial μ} & = \frac{λ n}{s} - \frac{2}{s} \sum_{i = 1}^{n} \frac{1}{1 + exp (\frac{x_{i} - μ}{s})} \text{ and} \\ \frac{\partial ln L (λ, μ, s)}{\partial s} & = - \frac{n}{s} + \frac{1}{s^{2}} \sum_{i = 1}^{n} (x_{i} - μ) (λ - \frac{2}{1 + exp (\frac{x_{i} - μ}{s})}) . \end{matrix}

(9)

Setting each of the partial derivatives in (9) to zero, it is implied that the maximum likelihood estimators are the solutions to the following three equations:

\begin{matrix} μ & = \frac{\sum_{i = 1}^{n} x_{i}}{n}, \\ λ & = \frac{2}{n} \sum_{i = 1}^{n} \frac{1}{1 + exp (\frac{x_{i} - μ}{s})} \text{ and} \\ s & = \frac{1}{n} \sum_{i = 1}^{n} (x_{i} - μ) (λ - \frac{2}{1 + exp (\frac{x_{i} - μ}{s})}), \end{matrix}

(10)

which can be solved numerically.
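The log likelihood (8) and its partial derivatives (9) translate directly into code. The sketch below, in Python rather than the Matlab used for the paper, implements both and cross-checks the analytic derivatives against central finite differences; the sample and parameter values in the usage part are made up purely for illustration.

```python
import math


def log_likelihood(xs, lam, mu, s):
    """Log likelihood (8); the kappa term is omitted, as in the text."""
    n = len(xs)
    return (-n * math.log(s)
            - (lam / s) * sum(x - mu for x in xs)
            - 2.0 * sum(math.log1p(math.exp(-(x - mu) / s)) for x in xs))


def gradient(xs, lam, mu, s):
    """Partial derivatives (9) with respect to lam, mu and s, in that order."""
    n = len(xs)
    w = [1.0 / (1.0 + math.exp((x - mu) / s)) for x in xs]
    d_lam = sum(mu - x for x in xs) / s
    d_mu = lam * n / s - (2.0 / s) * sum(w)
    d_s = -n / s + sum((x - mu) * (lam - 2.0 * wi) for x, wi in zip(xs, w)) / s ** 2
    return d_lam, d_mu, d_s


def numeric_gradient(xs, lam, mu, s, h=1e-6):
    """Central finite differences of log_likelihood, used to cross-check gradient()."""
    theta = [lam, mu, s]
    grads = []
    for j in range(3):
        up, down = theta[:], theta[:]
        up[j] += h
        down[j] -= h
        grads.append((log_likelihood(xs, *up) - log_likelihood(xs, *down)) / (2.0 * h))
    return tuple(grads)


if __name__ == "__main__":
    sample = [1.0, 2.5, 3.0, 4.2, 7.9]  # hypothetical data
    print(gradient(sample, 0.8, 3.0, 1.5))
    print(numeric_gradient(sample, 0.8, 3.0, 1.5))
```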

We observe that the equation for μ in (10) does not contribute to solving the maximum likelihood, since the location parameter μ is equal to the mean only when λ = 1. We thus look at an alternative equation for μ, which involves the mode of the skew logistic distribution.

To derive the mode of the skew logistic distribution we solve the equation,

\frac{\partial}{\partial x} \frac{exp (- λ \frac{x - μ}{s})}{s {(1 + exp (- \frac{x - μ}{s}))}^{2}} = 0,

(11)

to obtain:

μ = x - s log (- \frac{λ - 2}{λ}) .

(12)

Thus, motivated by (12) we replace the equation for μ in (10) with:

μ = m - s log (- \frac{λ - 2}{λ}),

(13)

where m is the mode of the random sample.
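The system formed by (13) together with the equations for λ and s in (10) can be handed to any numerical root-finder. The Python sketch below only evaluates the right-hand sides and the residuals of these three equations; it does not claim to reproduce the solver used for the paper, and how the sample mode m is obtained is left to the caller. Note that log (- (λ - 2) / λ) is written as log ((2 - λ) / λ). All function names are hypothetical.

```python
import math


def mu_from_mode(m, lam, s):
    """Equation (13): location parameter implied by the sample mode m."""
    return m - s * math.log((2.0 - lam) / lam)


def lam_rhs(xs, mu, s):
    """Right-hand side of the equation for lam in (10)."""
    return 2.0 / len(xs) * sum(1.0 / (1.0 + math.exp((x - mu) / s)) for x in xs)


def s_rhs(xs, lam, mu, s):
    """Right-hand side of the equation for s in (10)."""
    return sum((x - mu) * (lam - 2.0 / (1.0 + math.exp((x - mu) / s)))
               for x in xs) / len(xs)


def residuals(xs, m, lam, mu, s):
    """Residuals of (13) and of the lam and s equations in (10); all zero at a solution."""
    return (mu - mu_from_mode(m, lam, s),
            lam - lam_rhs(xs, mu, s),
            s - s_rhs(xs, lam, mu, s))


def shape(z, lam):
    """Unnormalised standard density, used only to illustrate the mode formula (12)."""
    return math.exp(-lam * z) / (1.0 + math.exp(-z)) ** 2


if __name__ == "__main__":
    lam = 0.5
    z_mode = math.log((2.0 - lam) / lam)  # mode for mu = 0 and s = 1, from (12)
    print(z_mode, shape(z_mode - 0.01, lam), shape(z_mode, lam), shape(z_mode + 0.01, lam))
```

When λ = 1 the logarithm in (13) vanishes and μ coincides with the mode, as expected for the symmetric logistic distribution.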

4. The Bi-Skew Logistic Distribution for Modelling Epidemic Waves

We start by defining the bi-skew logistic distribution, which will enable us to model more than one wave of infections at a time. We then discuss how we partition the data into single waves, in a way that we can apply the maximum likelihood from the previous section to the data in a consistent manner.

We present the bi-skew logistic distribution, which is described by the sum,

f (x; λ_{1}, μ_{1}, s_{1}) + f (x; λ_{2}, μ_{2}, s_{2}),

of two skew logistic distributions. It is given in full as:

\frac{exp (- λ_{1} \frac{x - μ_{1}}{s_{1}})}{s_{1} {(1 + exp (- \frac{x - μ_{1}}{s_{1}}))}^{2}} + \frac{exp (- λ_{2} \frac{x - μ_{2}}{s_{2}})}{s_{2} {(1 + exp (- \frac{x - μ_{2}}{s_{2}}))}^{2}},

(14)

which characterises two distinct phases of logistic growth (cf. [19,32]). We note that (14) can be readily extended to the general case of the sum of multiple skew logistic distributions; however, for simplicity, we only present the formula for the bi-skew logistic case. Thus, while the (single) skew logistic distribution can only model one wave of infected cases (or deaths, or hospitalisations), the bi-skew logistic distribution can model two waves of infections, and in the general case, any number of waves.
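The sum in (14), and its extension to any number of waves, can be sketched in a few lines of Python. As in (14), the normalisation constants are omitted, and the parameter triples used in the usage part are invented for illustration only; they are not fitted values from the paper.

```python
import math


def skew_logistic_term(x, lam, mu, s):
    """One summand of (14); kappa is omitted, as in the text."""
    z = (x - mu) / s
    return math.exp(-lam * z) / (s * (1.0 + math.exp(-z)) ** 2)


def multi_skew_logistic(x, params):
    """Sum of skew logistic terms, one (lam, mu, s) triple per wave."""
    return sum(skew_logistic_term(x, lam, mu, s) for lam, mu, s in params)


def bi_skew_logistic(x, first, second):
    """Bi-skew logistic expression (14) for two (lam, mu, s) triples."""
    return multi_skew_logistic(x, [first, second])


if __name__ == "__main__":
    wave1 = (0.8, 50.0, 8.0)    # hypothetical parameters of a first wave
    wave2 = (1.2, 200.0, 10.0)  # hypothetical parameters of a second wave
    for day in (50, 125, 200):
        print(day, bi_skew_logistic(day, wave1, wave2))
```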

In the presence of two waves, the maximum likelihood solution to (14) would give us access to the necessary model parameters, and solving the general case in the presence of multiple waves, when the sum in (14) may have two or more skew logistic distributions, is evidently even more challenging. Thus, we simplify the solution for the multiple wave case, and concentrate on an approximation assuming a sequential time series in which one wave strictly follows the next. More specifically, we assume that each wave is modelled by a single skew logistic distribution describing the growth phase until a peak is reached, followed by a decline phase; see [33], who consider epidemic waves in the context of the standard logistic distribution. Thus, a wave is represented by a temporal pattern of growth and decline, and the time series as a whole describes several waves as they evolve.

To provide further clarification of the model, we mention that the bi-skew logistic distribution is not a mixture model per se, in which case there would be a mixture weight for each distribution in the sum, as in, say, a Gaussian mixture [34] (Chapter 9). In the bi-skew logistic distribution case we do not have mixture weights; rather, we have two phases (in our context, waves), which are sequential in nature, possibly with some overlap, as can be seen in Figure 1 (cf. [19,32]). Strictly speaking, the bi-skew logistic distribution can be viewed as a mixture model where the mixture weights are each 0.5 and a scaling factor of 2 is applied. Thus, as an approximation, we add a preprocessing step where we segment the time series into distinct waves, resulting in a considerable reduction to the complexity of the maximum likelihood estimation. We do, however, remark that the maximum likelihood estimation for the bi-skew logistic distribution is much simpler than that of a corresponding mixture model, due to the absence of mixture weights. In particular, although we could, in principle, make use of the EM (expectation-maximisation) algorithm [34] (Chapter 9) and [35] to approximate the maximum likelihood estimates of the parameters, this would not be strictly necessary in the bi-skew logistic case, cf. [36]. The only caveat, which holds independently of whether the EM algorithm is deployed or not, is the additional number of parameters present in the equations being solved. We leave this investigation as future work, and focus on our approximation, which does not require the solution to the maximum likelihood of (14); the details of the preprocessing heuristic we apply are given in the following section.

Figure 1. Reported daily COVID-19 deaths from 30 January 2020 to 30 July 2021 and their minima labelled ‘*’, resulting in four distinct waves; a moving average with a centred sliding window of 7 days was applied to the raw data.

5. Data Analysis of COVID-19 Deaths in the UK

Here, we provide a full analysis of COVID-19 deaths in the UK from 30 January 2020 to 30 July 2021, employing the ESJS goodness-of-fit statistic and comparing it to the KS2 statistic. The daily UK COVID-19 data we used was obtained from [37].

As a proof of concept of the modelling capability of the skew logistic distribution, we now provide a detailed analysis of the time series of COVID-19 deaths in the UK from 30 January 2020 to 30 July 2021.

To separate the waves, we first smoothed the raw data using a moving average with a centred sliding window of 7 days. We then applied a simple heuristic, where we identified all the minima in the time series and defined a wave as a consecutive portion of the time series, of at least 72 days, with the endpoints of each wave being local minima, apart from the first wave, which starts from day 0. The resulting four waves in the time series are shown in Figure 1; see the last column of Table 1 for the endpoints of the four waves. It would be worthwhile, as future work, to investigate other heuristics, which may, for example, allow overlap between the waves to obtain more accurate start and end points and to distribute the number of cases between the waves when there is overlap between them.
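One possible reading of this preprocessing heuristic is sketched below in Python. Several details are assumptions made for the illustration, since the text does not specify them: the moving-average window is shortened at the two ends of the series, a local minimum is a point lower than its left neighbour and not higher than its right neighbour, minima are accepted greedily as wave endpoints once at least 72 days have elapsed since the previous endpoint, and the last wave simply ends at the final day. The synthetic series is invented and is not the UK data.

```python
import math


def centred_moving_average(values, window=7):
    """Centred moving average; the window shrinks near the ends (an assumption)."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def local_minima(series):
    """Indices lower than the left neighbour and not higher than the right one."""
    return [i for i in range(1, len(series) - 1)
            if series[i] < series[i - 1] and series[i] <= series[i + 1]]


def split_into_waves(series, min_length=72):
    """Greedy segmentation into (start, end) index pairs; the first wave starts at day 0."""
    waves, start = [], 0
    for i in local_minima(series):
        if i - start >= min_length:
            waves.append((start, i))
            start = i
    waves.append((start, len(series) - 1))  # the last wave ends at the final day
    return waves


# Synthetic two-wave series (hypothetical), with peaks near days 50 and 150.
demo = [10.0 + 100.0 * math.exp(-((t - 50) / 15.0) ** 2)
        + 100.0 * math.exp(-((t - 150) / 15.0) ** 2) for t in range(200)]
demo_waves = split_into_waves(centred_moving_average(demo))

if __name__ == "__main__":
    print(demo_waves)
```

On the synthetic series the single interior minimum lies midway between the two peaks, so the sketch returns two waves that share that day as an endpoint.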

Table 1. Parameters from maximum likelihood fits of the skew logistic distribution to the four waves, and the day of the local minimum (End), which is the end point of the wave.

In Table 1, we show the parameters resulting from maximum likelihood fits of the skew logistic distribution to the four waves. Figure 2 shows histograms of the four COVID-19 waves, each overlaid with the curve of the maximum likelihood fit of the skew logistic distribution to the data. Pearson’s moment and median skewness coefficients [38] for the four waves are recorded in Table 2. It can be seen that the correlation between these and 1 - λ is close to 1, as we would expect.
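For reference, the two skewness coefficients mentioned above can be computed as in the following Python sketch. It uses the usual textbook definitions, namely the third central moment divided by the cube of the standard deviation, and three times the difference between the mean and the median divided by the standard deviation. Using the population (rather than sample) standard deviation is an assumption of this sketch, and the data shown are invented.

```python
import statistics


def moment_skewness(xs):
    """Pearson's moment coefficient of skewness: m3 / m2 ** 1.5 (population moments)."""
    n = len(xs)
    mean = statistics.fmean(xs)
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def median_skewness(xs):
    """Pearson's median skewness coefficient: 3 * (mean - median) / standard deviation."""
    return 3.0 * (statistics.fmean(xs) - statistics.median(xs)) / statistics.pstdev(xs)


if __name__ == "__main__":
    right_skewed = [1, 2, 3, 4, 10]  # hypothetical data with a long right tail
    print(moment_skewness(right_skewed), median_skewness(right_skewed))
```

Both coefficients are positive for data with a longer right tail, which is the direction in which 1 - λ is positive for the skew logistic distribution.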

Figure 2. Histograms for the four waves of COVID-19 deaths from 30 January 2020 to 30 July 2021, each overlaid with the curve of the maximum likelihood fit of the skew logistic distribution to the data.

Table 2. Pearson’s moment and median skewness coefficients for the four waves, and the correlation between 1 - λ and these coefficients.

We now turn to the evaluation of goodness-of-fit using the ESJS (empirical survival Jensen–Shannon divergence) [21,22], which generalises the Jensen–Shannon divergence [39] to survival functions, and the well-known KS2 (Kolmogorov–Smirnov two-sample test statistic) [23] (Section 6.3). We will also employ 95% bootstrap confidence intervals [25] to measure the improvement in the ESJS and KS2 goodness-of-fit measures of the skew logistic over the logistic and normal distributions, respectively. For completeness, we formally define the ESJS and KS2.

To set the scene, we assume a time series [40], x = \{x_{1}, x_{2}, \dots, x_{n}\}, where x_{t}, for t = 1, 2, \dots, n, is a value indexed by time, t, in our case modelling the number of daily COVID-19 deaths. We are, in particular, interested in the marginal distribution of x, which we suppose comes from an underlying parametric continuous distribution D.

The empirical survival function of a value z for the time series x, denoted by \hat{S} (x) [z], is given by:

\hat{S} (x) [z] = \frac{1}{n} \sum_{i = 1}^{n} I_{\{x_{i} > z\}},

(15)

where I is the indicator function. In the following, we will let \hat{P} (z) = \hat{S} (x) [z] stand for the empirical survival function \hat{S} (x) [z], where the time series x is assumed to be understood from context. We will generally be interested in the empirical survival function \hat{P}, which we suppose arises from the survival function P of the parametric continuous distribution D, mentioned above.

The empirical survival Jensen–Shannon divergence (ESJS) between two empirical survival functions, {\hat{Q}}_{1} and {\hat{Q}}_{2}, arising from the survival functions Q_{1} and Q_{2}, is given by:

ESJS ({\hat{Q}}_{1}, {\hat{Q}}_{2}) = \frac{1}{2} \int_{0}^{\infty} {\hat{Q}}_{1} (z) log (\frac{{\hat{Q}}_{1} (z)}{\hat{M} (z)}) + {\hat{Q}}_{2} (z) log (\frac{{\hat{Q}}_{2} (z)}{\hat{M} (z)}) d z,

(16)

where:

\hat{M} (z) = \frac{1}{2} ({\hat{Q}}_{1} (z) + {\hat{Q}}_{2} (z)) .

We note that the ESJS is bounded and can thus be normalised, so it is natural to assume its values are between 0 and 1; in particular, when {\hat{Q}}_{1} = {\hat{Q}}_{2} its value is zero. Moreover, its square root is a metric [41], cf. [21].

The Kolmogorov–Smirnov two-sample test statistic between {\hat{Q}}_{1} and {\hat{Q}}_{2} as above, is given by:

KS2 ({\hat{Q}}_{1}, {\hat{Q}}_{2}) = \underset{z}{max} | {\hat{Q}}_{1} (z) - {\hat{Q}}_{2} (z) |,

(17)

where max is the maximum function, and | v | is the absolute value of a number v. We note that KS2 is bounded between 0 and 1, and is also a metric.
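The definitions in (15), (16) and (17) can be sketched for two finite samples as follows. This Python illustration makes a few assumptions that are not fixed by the text: the samples are non-negative, the natural logarithm is used, the convention 0 log 0 = 0 is adopted, and the ESJS value is left unnormalised. Because empirical survival functions are step functions, the integral in (16) is computed exactly as a sum over the intervals between consecutive sample values.

```python
import bisect
import math


def empirical_survival(sample):
    """Empirical survival function (15): fraction of sample values strictly above z."""
    xs = sorted(sample)
    n = len(xs)
    return lambda z: (n - bisect.bisect_right(xs, z)) / n


def _term(q, m):
    """q * log(q / m), with the convention that the term is 0 when q = 0."""
    return q * math.log(q / m) if q > 0.0 else 0.0


def esjs(sample1, sample2):
    """Unnormalised ESJS (16) between the empirical survival functions of two samples."""
    q1, q2 = empirical_survival(sample1), empirical_survival(sample2)
    points = sorted(set(sample1) | set(sample2) | {0.0})
    total = 0.0
    for left, right in zip(points, points[1:]):
        a, b = q1(left), q2(left)  # both functions are constant on [left, right)
        m = 0.5 * (a + b)
        total += 0.5 * (_term(a, m) + _term(b, m)) * (right - left)
    return total


def ks2(sample1, sample2):
    """KS2 statistic (17): largest absolute gap between the two survival functions."""
    q1, q2 = empirical_survival(sample1), empirical_survival(sample2)
    return max(abs(q1(z) - q2(z)) for z in set(sample1) | set(sample2))


if __name__ == "__main__":
    a = [1.0, 2.0, 2.5, 4.0]  # hypothetical samples
    b = [1.5, 3.0, 3.5, 6.0]
    print(esjs(a, b), ks2(a, b))
```

In the evaluation described in the text, one of the two survival functions is that of the fitted parametric distribution rather than of a second sample; the sketch covers only the two-sample form of the definitions.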

For a parametric continuous distribution D, we let ϕ = ϕ (D, \hat{P}) be the parameters that are obtained from fitting D to the empirical survival function, \hat{P}, using maximum likelihood estimation. In addition, we let P_{ϕ} = S_{ϕ} (x) be the survival function of x, for D with parameters ϕ. Thus, the empirical survival Jensen–Shannon divergence and the Kolmogorov–Smirnov two-sample test statistic, between \hat{P} and P_{ϕ}, are given by ESJS (\hat{P}, P_{ϕ}) and KS2 (\hat{P}, P_{ϕ}), respectively, where \hat{P} and P_{ϕ} are omitted below as they will be understood from context. These values provide us with two measures of goodness-of-fit for how well D with parameters ϕ is fitted to x [22].

We are now ready to present the results of the evaluation. In Table 3, we show the ESJS values for the four waves and the said improvements, while in Table 4, we show the corresponding KS2 values and improvements. In all cases, the skew logistic is a preferred model over both the logistic and normal distributions, justifying the addition of a skewness parameter, as can be seen in Figure 2. Moreover, in all but one case the logistic distribution was preferred over the normal distribution; the exception is wave 3, where the KS2 statistic of the normal distribution was smaller than that of the logistic distribution. We observe that, for the second wave, the ESJS and KS2 values for the skew logistic and logistic distribution were the closest, since, as can be seen from Table 1, the second wave was more or less symmetric, in which case the skew logistic distribution reduces to the logistic distribution.

Table 3. ESJS values for the skew logistic (SL), logistic (Logit) and normal (Norm) distributions, and the improvement percentage of the skew logistic over the logistic (SL-Logit) and normal (SL-Norm) distributions, respectively.

Table 4. KS2 values for the skew logistic (SL), logistic (Logit) and normal (Norm) distributions, and the improvement percentage of the skew logistic over the logistic (SL-Logit) and normal (SL-Norm) distributions, respectively.

In Table 5 and Table 6, we present the bootstrap 95% confidence intervals of the ESJS and KS2 improvements, respectively, using the percentile method, while in Table 7 and Table 8, we provide the 95% confidence intervals of the ESJS and KS2 improvements, respectively, using the bias-corrected and accelerated (BCa) method [25], which adjusts the confidence intervals for bias and skewness in the empirical bootstrap distribution. In all cases, the mean of the bootstrap samples is above zero with a very tight standard deviation. As noted above, the second wave is more or less symmetric, so we expect that the standard logistic distribution will provide a fit to the data, which is as good as the skew logistic fit. It is thus not surprising that in this case the improvement percentages are, generally, not significant. In addition, the improvements for the third wave are also, generally, not significant, which may be due to the starting point of the third wave, given our heuristic, being close to its peak; see Figure 1. We observe that, for this dataset, it is not clear whether deploying the BCa method yields a significant advantage over simply deploying the percentile method.
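The percentile method referred to above can be sketched generically as follows. This Python illustration is not the procedure used for the tables: the statistic is passed in as an arbitrary function of a resampled data set (in the paper it would be an improvement in ESJS or KS2, which requires refitting the distributions), and the number of resamples, the seed and the way the percentiles are indexed are assumptions of the sketch. The BCa adjustment is not implemented.

```python
import random
import statistics


def percentile_ci(data, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for statistic(data).

    Resamples the data with replacement n_boot times and returns the empirical
    alpha / 2 and 1 - alpha / 2 percentiles of the resampled statistic.
    """
    rng = random.Random(seed)
    n = len(data)
    stats = sorted(
        statistic([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = stats[int((alpha / 2.0) * n_boot)]
    upper = stats[min(n_boot - 1, int((1.0 - alpha / 2.0) * n_boot))]
    return lower, upper


# Hypothetical usage: a 95% interval for the mean of the integers 1 to 100.
demo_data = list(range(1, 101))
demo_lower, demo_upper = percentile_ci(demo_data, statistics.fmean)

if __name__ == "__main__":
    print(demo_lower, demo_upper, demo_upper - demo_lower)
```

The width of the returned interval, upper minus lower, is the kind of quantity that is summarised when comparing the confidence intervals of two goodness-of-fit measures.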

Table 5. Results from the percentile method for the confidence interval of the difference of the ESJS between the logistic (Logit) and skew logistic (SL), and between the normal (Norm) and skew logistic (SL) distributions, respectively; Diff, LB, UB, CI, Mean and STD stand for difference, lower bound, upper bound, confidence interval, mean of samples and standard deviation of samples, respectively.

Table 6. Results from the percentile method for the confidence interval of the difference of the KS2 between the logistic (Logit) and skew logistic (SL), and between the normal (Norm) and skew logistic (SL) distributions, respectively; Diff, LB, UB, CI, Mean and STD stand for difference, lower bound, upper bound, confidence interval, mean of samples and standard deviation of samples, respectively.

Table 7. Results from the BCa method for the confidence interval of the difference of the ESJS between the logistic (Logit) and skew logistic (SL), and between the normal (Norm) and skew logistic (SL) distributions, respectively; Diff, LB, UB, CI, Mean and STD stand for difference, lower bound, upper bound, confidence interval, mean of samples and standard deviation of samples, respectively.

Table 8. Results from the BCa method for the confidence interval of the difference of the KS2 between the logistic (Logit) and skew logistic (SL), and between the normal (Norm) and skew logistic (SL) distributions, respectively; Diff, LB, UB, CI, Mean and STD stand for difference, lower bound, upper bound, confidence interval, mean of samples and standard deviation of samples, respectively.

In Table 9, we show the mean and standard deviation statistics of the confidence interval widths of the metrics we used to compare the distributions, implying that, in general, the ESJS goodness-of-fit measure is more powerful than the KS2 goodness-of-fit measure. This is based on the known result that statistical tests using measures resulting in smaller confidence intervals are normally considered to be more powerful, implying that a smaller sample size may be deployed [42].

Table 9. Mean and standard deviation (STD) statistics for the confidence interval (CI) widths using the percentile (P) and BCa methods.

As mentioned in the introduction, we obtained comparable results to the above when modelling epidemic waves with the epsilon skew normal distribution [7] as opposed to using the skew logistic distribution; see also [43] for a comparison of a skew logistic and skew normal distribution in the context of insurance loss data, showing that the skew logistic performed better than the skew normal distribution for fitting the datasets tested. Further to the note in the introduction that the skew logistic distribution is a more natural one to deploy in this case due to its heavier tails, we observe that in an epidemic scenario, the number of cases counted can only be non-negative, while the epsilon skew normal also supports negative values.

6. Concluding Remarks

We have proposed the skew-logistic and bi-logistic distributions as models for single and multiple epidemic waves, respectively. The model is a simple extension of the symmetric logistic distribution, which can readily be deployed in the presence of skewed data that exhibits growth and decay. We provided validation for the proposed model using the ESJS as a goodness-of-fit statistic, showing that the model is a good fit to COVID-19 data from the UK and that the ESJS is more powerful than the alternative KS2 statistic. As future work, we could use the model to compare the progression of multiple waves across different countries, extending the work of [16].

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analysed in this study. This data can be found here: https://coronavirus.data.gov.uk/details/download, accessed on 20 March 2022.

Conflicts of Interest

The author declares no conflict of interest.

References

1. Bacaër, N. A Short History of Mathematical Population Dynamics; Springer: London, UK, 2011.
2. Panik, M. Growth Curve Modeling: Theory and Applications; John Wiley & Sons: Hoboken, NJ, USA, 2014.
3. Johnson, N.; Kotz, S.; Balakrishnan, N. Continuous Univariate Distributions, Volume 2; Wiley Series in Probability and Mathematical Statistics; John Wiley & Sons: New York, NY, USA, 1995; Chapter 23 Logistic Distribution; pp. 113–163.
4. Krishnamoorthy, K. Handbook of Statistical Distributions with Applications, 2nd ed.; CRC Press: Boca Raton, FL, USA, 2015.
5. DasGupta, A. Fundamentals of Probability: A First Course; Springer Texts in Statistics; Springer Science+Business Media: New York, NY, USA, 2010.
6. Azzalini, A.; Capitanio, A. The Skew-Normal and Related Families; Institute of Mathematical Statistics Monographs; Cambridge University Press: Cambridge, UK, 2014.
7. Mudholkar, G.; Hutson, A. The epsilon–skew–normal distribution for analyzing near-normal data. J. Stat. Plan. Inference 2000, 83, 291–309.
8. Nadarajah, S. The skew logistic distribution. AStA Adv. Stat. Anal. 2009, 93, 187–203.
9. Sastry, D.; Bhati, D. A new skew logistic distribution: Properties and applications. Braz. J. Probab. Stat. 2016, 30, 248–271.
10. Li, M. An Introduction to Mathematical Modeling of Infectious Diseases; Mathematics of Planet Earth; Springer Nature: Cham, Switzerland, 2018.
11. Ioannidis, J.; Cripps, S.; Tanner, M. Forecasting for COVID-19 has failed. Int. J. Forecast. 2022, 38, 423–438.
12. Davies, N.; Kucharski, A.; Eggo, R.; Gimma, A.; Edmunds, W. Effects of non-pharmaceutical interventions on COVID-19 cases, deaths, and demand for hospital services in the UK: A modelling study. Lancet Public Health 2020, 5, e375–e385.
13. Harvey, A.; Kattuman, P.; Thamotheram, C. Tracking the mutant: Forecasting and nowcasting COVID-19 in the UK in 2021. Natl. Inst. Econ. Rev. 2021, 256, 110–126.
14. De la Sen, M.; Ibeas, A. On an SIR epidemic model for the COVID-19 pandemic and the logistic equation. Discret. Dyn. Nat. Soc. 2020, 2020, 1382870.
15. Postnikov, E. Estimation of COVID-19 dynamics “on a back-of-envelope”: Does the simplest SIR model provide quantitative parameters and predictions? Chaos Solitons Fractals 2020, 135, 109841-1–109841-6.
16. Dye, C.; Cheng, R.; Dagpunar, J.; Williams, B. The scale and dynamics of COVID-19 epidemics across Europe. R. Soc. Open Sci. 2020, 7, 201726-1–201726-8.
17. Pelinovsky, E.; Kurkin, A.; Kurkina, O.; Kokoulina, M.; Epifanova, A. Logistic equation and COVID-19. Chaos Solitons Fractals 2020, 140, 110241-1–110241-13.
18. Zhang, S.; Marioli, F.; Gao, R. A second wave? What do people mean by COVID waves?—A working definition of epidemic waves. Risk Manag. Healthc. Policy 2021, 14, 3775–3782.
19. Meyer, P. Bi-logistic growth. Technol. Forecast. Soc. Chang. 1994, 47, 89–102.
20. Fenner, T.; Levene, M.; Loizou, G. A bi-logistic growth model for conference registration with an early bird deadline. Cent. Eur. J. Phys. 2013, 11, 904–909.
21. Levene, M.; Kononovicius, A. Empirical survival Jensen-Shannon divergence as a goodness-of-fit measure for maximum likelihood estimation and curve fitting. Commun. Stat.-Simul. Comput. 2021, 50, 3751–3767.
22. Levene, M. A hypothesis test for the goodness-of-fit of the marginal distribution of a time series with application to stablecoin data. Eng. Proc. 2021, 5, 10.
23. Gibbons, J.; Chakraborti, S. Nonparametric Statistical Inference, 6th ed.; Marcel Dekker: New York, NY, USA, 2021.
24. Spiess, A.N.; Neumeyer, N. An evaluation of R² as an inadequate measure for nonlinear models in pharmacological and biochemical research: A Monte Carlo approach. BMC Pharmacol. 2010, 10, 6.
25. Efron, B.; Tibshirani, R. An Introduction to the Bootstrap; Monographs on Statistics and Applied Probability 57; Springer Science+Business Media: New York, NY, USA, 1993.
26. Colegrave, N.; Ruxton, G. Power Analysis: An Introduction for the Life Sciences; Oxford Biology Primers; Oxford University Press: Oxford, UK, 2019.
27. Ben-David, A.; Liu, H.; Jackson, A. The Kullback-Leibler divergence as an estimator of the statistical properties of CMB maps. J. Cosmol. Astropart. Phys. 2015, 2015, JCAP06051.
28. Semenov, I.; Jacyna, M. The synthesis model as a planning tool for effective supply chains resistant to adverse events. Maint. Reliab. 2022, 24, 140–152.
29. Abramowitz, M.; Stegun, I. (Eds.) Handbook of Mathematical Functions with Formulas, Graphs and Mathematical Tables; Dover: New York, NY, USA, 1972.
30. Pearson, J.; Olver, S.; Porter, M. Numerical methods for the computation of the confluent and Gauss hypergeometric functions. Numer. Algorithms 2017, 74, 821–866.
31. Ward, M.; Ahlquist, J. Maximum Likelihood for Social Science: Strategies for Analysis; Analytical Methods for Social Research; Cambridge University Press: Cambridge, UK, 2018.
32. Sheehy, J.; Mitchell, P.; Ferrer, A. Bi-phasic growth patterns in rice. Ann. Bot. 2004, 94, 811–817.
33. Cliff, A.; Haggett, P. Methods for the measurement of epidemic velocity from time-series data. Int. J. Epidemiol. 1982, 11, 82–89.
34. Bishop, C. Pattern Recognition and Machine Learning; Information Science and Statistics; Springer Science+Business Media: New York, NY, USA, 2006.
35. Redner, R.; Walker, H. Mixture densities, maximum likelihood and the EM algorithm. SIAM Rev. 1984, 26, 195–239.
36. MacDonald, I. Is EM really necessary here? Examples where it seems simpler not to use EM. AStA Adv. Stat. Anal. 2021, 105, 629–647.
37. GOV.UK. Coronavirus (COVID-19) in the UK, Download Data. 2021. Available online: https://coronavirus.data.gov.uk/details/download (accessed on 18 August 2021).
38. Doane, D.; Seward, L. Measuring skewness: A forgotten statistic? J. Stat. Educ. 2011, 19, 1–19.
39. Lin, J. Divergence measures based on the Shannon entropy. IEEE Trans. Inf. Theory 1991, 37, 145–151.
40. Chatfield, C.; Xing, H. The Analysis of Time Series: An Introduction with R, 7th ed.; Text in Statistical Science; Chapman & Hall: London, UK, 2019.
41. Nguyen, H.; Vreeken, J. Non-parametric Jensen-Shannon divergence. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), Porto, Portugal, 7–11 September 2015; pp. 173–189.
42. Liu, X. Comparing sample size requirements for significance tests and confidence intervals. Couns. Outcome Res. Eval. 2013, 4, 3–12.
43. Kazemi, R.; Noorizadeh, M. A comparison between skew-logistic and skew-normal distributions. Matematika 2015, 31, 15–24.

Figure 1. Reported daily COVID-19 deaths from 30 January 2020 to 30 July 2021 and their minima labelled ‘*’, resulting in four distinct waves; a moving average with a centred sliding window of 7 days was applied to the raw data.

Figure 2. Histograms for the four waves of COVID-19 deaths from 30 January 2020 to 30 July 2021, each overlaid with the curve of the maximum likelihood fit of the skew logistic distribution to the data.

Table 1. Parameters from maximum likelihood fits of the skew logistic distribution to the four waves, and the day of the local minimum (End), which is the end point of the wave.

Fitted Parameters for the Skew Logistic Distribution
Wave	$λ$	$μ$	$s$	End
1	0.2150	3.5137	3.8443	71
2	1.0741	196.5157	14.4323	239
3	0.2297	243.0709	4.5882	334
4	1.7306	502.2758	7.0195	532

Table 2. Pearson’s moment and median skewness coefficients for the four waves, and the correlation between $1 - λ$ and these coefficients.

Skewness
Wave	$1 - λ$	Moment	Median
1	0.7850	0.9314	0.2939
2	−0.0741	−0.7758	−0.0797
3	0.7703	0.9265	0.1939
4	−0.7306	−1.5555	−0.2413
Correlation		0.9931	0.9826
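
The correlation row of Table 2 can be checked directly from the four tabulated rows. The following is a minimal sketch, assuming that the reported correlation is Pearson's product-moment coefficient (the table does not name the coefficient used); the helper `pearson` and the variable names are illustrative and not taken from the paper.

```python
import math

# Values transcribed from Table 2 (waves 1 to 4).
one_minus_lambda = [0.7850, -0.0741, 0.7703, -0.7306]
moment_skewness = [0.9314, -0.7758, 0.9265, -1.5555]
median_skewness = [0.2939, -0.0797, 0.1939, -0.2413]


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Assumption: the "Correlation" row is Pearson's r computed over the four waves.
r_moment = pearson(one_minus_lambda, moment_skewness)
r_median = pearson(one_minus_lambda, median_skewness)
print(f"moment: {r_moment:.4f}, median: {r_median:.4f}")
```

Under that assumption, both values should land close to the correlations reported in the table; with only four waves, the coefficients are indicative rather than conclusive.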

Table 3. $ESJS$ values for the skew logistic (SL), logistic (Logit) and normal (Norm) distributions, and the improvement percentage of the skew logistic over the logistic (SL-Logit) and normal (SL-Norm) distributions, respectively.

$ESJS$ Values for SL, Logit and Norm Distributions
Wave	SL	Logit	SL-Logit	Norm	SL-Norm
1	0.0419	0.0583	28.25%	0.0649	35.54%
2	0.0392	0.0448	12.52%	0.0613	36.17%
3	0.0316	0.0387	18.38%	0.0423	25.38%
4	0.0237	0.0927	74.47%	0.0939	74.79%
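
The improvement columns of Table 3 appear to be relative reductions of the statistic. The sketch below assumes the formula $100 \times (X - SL)/X$, where $X$ is the Logit or Norm value; this formula is inferred from the tabulated numbers, not quoted from the paper, and the function name `improvement_pct` is hypothetical. Because the inputs are the rounded table entries, the recomputed percentages agree with the reported ones only approximately.

```python
# ESJS values transcribed from Table 3: wave -> (SL, Logit, Norm).
esjs = {
    1: (0.0419, 0.0583, 0.0649),
    2: (0.0392, 0.0448, 0.0613),
    3: (0.0316, 0.0387, 0.0423),
    4: (0.0237, 0.0927, 0.0939),
}


def improvement_pct(sl, other):
    """Relative reduction (in percent) of the statistic achieved by SL.

    Assumed formula, inferred from the table: 100 * (other - sl) / other.
    """
    return 100.0 * (other - sl) / other


# wave -> (improvement over Logit, improvement over Norm)
improvements = {
    wave: (improvement_pct(sl, logit), improvement_pct(sl, norm))
    for wave, (sl, logit, norm) in esjs.items()
}

for wave, (vs_logit, vs_norm) in improvements.items():
    print(f"wave {wave}: SL-Logit {vs_logit:.2f}%, SL-Norm {vs_norm:.2f}%")
```

The same function applies unchanged to the $KS2$ values, since both statistics are smaller for a better fit.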

Table 4. $KS2$ values for the skew logistic (SL), logistic (Logit) and normal (Norm) distributions, and the improvement percentage of the skew logistic over the logistic (SL-Logit) and normal (SL-Norm) distributions, respectively.

$KS2$ Values for SL, Logit and Norm Distributions
Wave	SL	Logit	SL-Logit	Norm	SL-Norm
1	0.0621	0.1245	50.14%	0.1280	51.50%
2	0.0357	0.0391	8.57%	0.0420	15.01%
3	0.0571	0.0930	38.66%	0.0854	33.18%
4	0.0098	0.0817	87.98%	0.1046	90.61%

Table 5. Results from the percentile method for the confidence interval of the difference of the $ESJS$ between the logistic (Logit) and skew logistic (SL), and between the normal (Norm) and skew logistic (SL) distributions, respectively; Diff, LB, UB, CI, Mean and STD stand for difference, lower bound, upper bound, confidence interval, mean of samples and standard deviation of samples, respectively.

Percentile Confidence Intervals for $ESJS$ Improvement
Wave/Diff	LB of CI	UB of CI	Width of CI	Mean	STD
1/SL-Logit	0.0093	0.0317	0.0224	0.0211	0.0063
1/SL-Norm	0.0170	0.0382	0.0212	0.0278	0.0063
2/SL-Logit	−0.0010	0.0066	0.0076	0.0034	0.0049
2/SL-Norm	0.0154	0.0232	0.0078	0.0201	0.0051
3/SL-Logit	−0.0028	0.0112	0.0140	0.0083	0.0022
3/SL-Norm	0.0021	0.0149	0.0128	0.0120	0.0022
4/SL-Logit	0.0549	0.0810	0.0261	0.0714	0.0068
4/SL-Norm	0.0560	0.0821	0.0261	0.0722	0.0070
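
For readers unfamiliar with the percentile method named in the caption of Table 5, the following is a generic sketch of a percentile bootstrap confidence interval. It is not the author's implementation: the function `percentile_ci`, its parameters, and the synthetic data are all assumptions made for illustration, and the sample mean stands in for the statistic of interest. In the paper's setting, the statistic would be the difference between the goodness-of-fit values of two fitted distributions, which is not reproduced here.

```python
import random
import statistics


def percentile_ci(data, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for statistic(data).

    Returns (lower bound, upper bound, mean of replicates, std of replicates).
    """
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement and evaluate the statistic on each resample.
    replicates = sorted(
        statistic(rng.choices(data, k=n)) for _ in range(n_boot)
    )
    # The interval bounds are the empirical alpha/2 and 1 - alpha/2 quantiles.
    lower = replicates[round(n_boot * alpha / 2)]
    upper = replicates[round(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper, statistics.mean(replicates), statistics.stdev(replicates)


# Synthetic placeholder data; NOT the COVID-19 series used in the paper.
demo_rng = random.Random(42)
demo = [demo_rng.gauss(1.0, 2.0) for _ in range(200)]

lower, upper, boot_mean, boot_std = percentile_ci(demo, statistics.mean)
print(f"95% CI: [{lower:.4f}, {upper:.4f}], width {upper - lower:.4f}")
```

The four returned values correspond to the LB, UB, Mean and STD columns of the table, and the width is simply the upper bound minus the lower bound.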

Table 6. Results from the percentile method for the confidence interval of the difference of the $KS2$ between the logistic (Logit) and skew logistic (SL), and between the normal (Norm) and skew logistic (SL) distributions, respectively; Diff, LB, UB, CI, Mean and STD stand for difference, lower bound, upper bound, confidence interval, mean of samples and standard deviation of samples, respectively.

Percentile Confidence Intervals for $KS2$ Improvement
Wave/Diff	LB of CI	UB of CI	Width of CI	Mean	STD
1/SL-Logit	0.0438	0.0760	0.0322	0.0621	0.0073
1/SL-Norm	0.0411	0.0821	0.0410	0.0684	0.0078
2/SL-Logit	0.0003	0.0047	0.0044	0.0033	0.0009
2/SL-Norm	0.0007	0.0092	0.0085	0.0065	0.0017
3/SL-Logit	−0.0073	0.0441	0.0514	0.0343	0.0082
3/SL-Norm	−0.0142	0.0365	0.0507	0.0267	0.0080
4/SL-Logit	0.0474	0.0728	0.0254	0.0680	0.0046
4/SL-Norm	0.0710	0.0962	0.0252	0.0905	0.0048

Table 7. Results from the BCa method for the confidence interval of the difference of the $ESJS$ between the logistic (Logit) and skew logistic (SL), and between the normal (Norm) and skew logistic (SL) distributions, respectively; Diff, LB, UB, CI, Mean and STD stand for difference, lower bound, upper bound, confidence interval, mean of samples and standard deviation of samples, respectively.

BCa Confidence Intervals for $ESJS$ Improvement
Wave/Diff	LB of CI	UB of CI	Width of CI	Mean	STD
1/SL-Logit	0.0087	0.0260	0.0173	0.0210	0.0062
1/SL-Norm	0.0165	0.0333	0.0168	0.0275	0.0063
2/SL-Logit	−0.0009	0.0258	0.0267	0.0036	0.0053
2/SL-Norm	0.0153	0.0425	0.0272	0.0201	0.0050
3/SL-Logit	−0.0024	0.0095	0.0119	0.0084	0.0023
3/SL-Norm	−0.0027	0.0135	0.0162	0.0119	0.0024
4/SL-Logit	0.0308	0.0703	0.0395	0.0708	0.0074
4/SL-Norm	0.0554	0.0713	0.0159	0.0726	0.0069

Table 8. Results from the BCa method for the confidence interval of the difference of the $KS2$ between the logistic (Logit) and skew logistic (SL), and between the normal (Norm) and skew logistic (SL) distributions, respectively; Diff, LB, UB, CI, Mean and STD stand for difference, lower bound, upper bound, confidence interval, mean of samples and standard deviation of samples, respectively.

BCa Confidence Intervals for $KS2$ Improvement
Wave/Diff	LB of CI	UB of CI	Width of CI	Mean	STD
1/SL-Logit	0.0428	0.0801	0.0373	0.0624	0.0074
1/SL-Norm	0.0444	0.0777	0.0333	0.0683	0.0078
2/SL-Logit	0.0005	0.0047	0.0042	0.0033	0.0008
2/SL-Norm	0.0001	0.0089	0.0088	0.0064	0.0017
3/SL-Logit	0.0013	0.0445	0.0432	0.0346	0.0077
3/SL-Norm	−0.0111	0.0368	0.0479	0.0263	0.0082
4/SL-Logit	0.0491	0.0739	0.0248	0.0676	0.0047
4/SL-Norm	0.0685	0.0985	0.0300	0.0908	0.0046

Table 9. Mean and standard deviation (STD) statistics for the confidence interval (CI) widths using the percentile (P) and BCa methods.

Summary Statistics for the CI Widths
Statistic	$ESJS$-P	$KS2$-P	$ESJS$-BCa	$KS2$-BCa
Mean	0.0172	0.0298	0.0214	0.0287
STD	0.0077	0.0176	0.0091	0.0155
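
The summary in Table 9 can be recomputed from the "Width of CI" columns of Tables 5 to 8. The sketch below assumes that STD denotes the sample standard deviation (denominator $n - 1$); this is an inference from the tabulated numbers rather than something stated in the captions, and the dictionary keys are illustrative labels.

```python
import statistics

# "Width of CI" columns transcribed from Tables 5 to 8 (eight rows each).
widths = {
    "ESJS-P": [0.0224, 0.0212, 0.0076, 0.0078, 0.0140, 0.0128, 0.0261, 0.0261],
    "KS2-P": [0.0322, 0.0410, 0.0044, 0.0085, 0.0514, 0.0507, 0.0254, 0.0252],
    "ESJS-BCa": [0.0173, 0.0168, 0.0267, 0.0272, 0.0119, 0.0162, 0.0395, 0.0159],
    "KS2-BCa": [0.0373, 0.0333, 0.0042, 0.0088, 0.0432, 0.0479, 0.0248, 0.0300],
}

# Assumption: STD is the sample standard deviation (n - 1 denominator),
# which is what statistics.stdev computes.
summary = {
    name: (statistics.mean(w), statistics.stdev(w)) for name, w in widths.items()
}

for name, (mean_width, std_width) in summary.items():
    print(f"{name}: mean {mean_width:.4f}, std {std_width:.4f}")
```

On these inputs, the mean width for the $ESJS$ is smaller than that for the $KS2$ under both bootstrap methods, which is the comparison underlying the claim that the $ESJS$ intervals are narrower on this dataset.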

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
