Article

Copula Modeling of COVID-19 Excess Mortality

Mathematics Department, University of St. Thomas, 2115 Summit Ave, St. Paul, MN 55105, USA
*
Author to whom correspondence should be addressed.
Risks 2025, 13(7), 119; https://doi.org/10.3390/risks13070119
Submission received: 14 April 2025 / Revised: 17 June 2025 / Accepted: 19 June 2025 / Published: 24 June 2025

Abstract

COVID-19’s effects on mortality are hard to quantify. Issues with attribution can cause problems with resulting conclusions. Analyzing excess mortality addresses this concern and allows for the analysis of broader effects of the pandemic. We propose separate ARIMA models to analyze excess mortality for several countries. For the model of joint excess mortality, we suggest vine copulas with Bayesian pair copula selection. This is a new methodology and after its discussion we offer an illustration. The present study examines weekly mortality data from 2019 to 2022 in the USA, Canada, France, Germany, Norway, and Sweden. Previously proposed ARIMA models have low lags and no residual autocorrelation. Only Norway’s residuals exhibited normality, while the remaining residuals suggest skewed Student t-distributions as a plausible fit. A vine copula model was then developed to model the association between the ARIMA residuals for different countries, with the countries farther apart geographically exhibiting weak or no association. The validity of fitted distributions and resulting vine copula was checked using 2023 data. Goodness of fit tests suggest that the fitted distributions were suitable, except for the USA, and that the vine copula used was also valid. We conclude that the time series models of COVID-19 excess mortality are viable. Overall, the suggested methodology seems suitable for creating joint forecasts of pandemic mortality for several countries or geographical regions.

1. Introduction

The COVID-19 pandemic began in late 2019, and quickly manifested itself as a massive increase in global mortality. However, there were problems related to attribution and causation. As such, when it comes to analyzing or modeling COVID-19 mortality data, there are two approaches. The first is to specifically use COVID-19-attributed mortality. This has the benefit of a clear causal structure, where patterns in the data can be more easily connected to the spread of the pandemic (Lei and Shemyakin 2023). However, there are notable problems with respect to proper attribution. Deaths related to COVID-19 are not attributed to the disease absent a positive COVID-19 test, which is not always possible in locations unable to test (Wang et al. 2022). This is to say nothing of other infrastructure issues or potentially missing deaths not directly caused by COVID-19, but instead by complications from an existing condition or a response to the pandemic (Ahamad et al. 2020; Zińczuk et al. 2023). However, using mortality data directly attributed to COVID-19 comes with the massive benefit of no ambiguity.
The second approach is to analyze overall mortality, usually via the concept of excess mortality. This is defined as the normalized difference between aggregate deaths and an expected (historical) death count (Britt et al. 2023). Although there is some ambiguity in calculating the expected number of deaths for a particular location, this approach has the benefit of capturing systemic effects the pandemic might have had (Martinez-Folgar et al. 2021; Zińczuk et al. 2023). For example, it allows the data to include increased mortality among those with existing conditions who contract COVID-19 (Martinez-Folgar et al. 2021), or possible increases in suicides (Yan et al. 2023). However, the data will also reflect a decrease in vehicle-related deaths due to lockdowns (Wang et al. 2022). This approach therefore captures the net effect of the pandemic on mortality. However, because such decreases offset increases, it also carries the risk of masking the magnitude of the pandemic's upward effect on mortality.
Regardless, multiple approaches are used to model the resulting data. The Centers for Disease Control and Prevention (CDC) has suggested ensemble models (Johannson et al. 2020), used also in Imperial College COVID Response Team (2022), and in Wang et al. (2022). Martinez-Folgar et al. (2021) and Basellini et al. (2021) used generalized additive models to relate different demographic or location data with mortality. These models usually address multiple covariates and require massive data collection. The approach of this paper is different. It combines ARIMA analysis of excess mortality for different geographical areas, and then applies vine copulas with Bayesian pair copula model selection to study the dependence patterns between these areas. Therefore, the model is based on open-source mortality data and does not require a detailed analysis of mortality covariates.
ARIMA models have frequently been used in COVID studies. For listed examples, see Britt et al. (2023) and Wang et al. (2022). Copula models have also been used to determine the relationships between mortality and other time series data, such as the correlation of interstate trends (Kim 2022), or by combining mortality data with temperature (Alanazi 2021); however, they were rarely used in conjunction with ARIMA modeling.
Here, we follow the method laid out by Lei and Shemyakin (2023), in which ARIMA models are developed for individual locations, and the model residuals are then related to each other via copula analysis. This allows for seasonality and intra-country effects to be accounted for before addressing cross-correlation between countries. It also accommodates non-normal residuals: although these are technically a violation of ARIMA assumptions, fat-tailed residual distributions can be interpreted as an indication of a more complicated dependence structure.
The objective of the present paper is to analyze the excess mortality time series $Y_t^{(i)}$ for different countries $i = 1, \dots, k$ during the period of $T$ weeks $t = 1, \dots, T$, and to model the dependence patterns in the vector $(Y_t^{(1)}, \dots, Y_t^{(k)})$ through the ARIMA residuals $(\varepsilon_t^{(1)}, \dots, \varepsilon_t^{(k)})$.
Mortality statistics are particularly difficult to compare across countries for several reasons. First and foremost, different countries have different standards for recording deaths. For example, England records only the date a death is “registered,” while the United States records mortality statistics using the date of a death (Basellini et al. 2021; National Center for Health Statistics 2025). This means comparisons involving countries that do not record the date of a death are difficult, as actual mortality experience will not be reflected in the data. While a close reconstruction of weekly data is possible (see Martinez-Folgar et al. 2021), it still leaves open the potential problem of a death being registered several weeks after it occurs, making it impossible to assign the week in which it occurred. Second, countries differ in how they define a “week” and how many there are in the calendar year. For example, the European countries in this study (France, Germany, Norway, Sweden) record their weekly mortality data as the sum of deaths occurring from Monday to Sunday, while the United States and Canada record theirs as the sum of deaths from Sunday to Saturday (National Center for Health Statistics 2025). This makes interpretations of resulting models somewhat weaker, but absent massive spikes for one day only, it should not affect overall trends.

2. ARIMA

Box–Jenkins models, more commonly known as ARIMA (autoregressive integrated moving average) models, describe a single-variable time series $Y_t$, $t = 1, \dots, T$. They combine the autoregressive model with lag (order) $p = 1, 2, \dots$,
$$Z_t = \beta_0 + \beta_1 \cdot Z_{t-1} + \dots + \beta_p \cdot Z_{t-p} + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma^2), \tag{1}$$
integrated with the moving average model with lag $q = 1, 2, \dots$,
$$Z_t = \alpha_0 + \alpha_1 \cdot \varepsilon_{t-1} + \dots + \alpha_q \cdot \varepsilon_{t-q}, \quad \varepsilon_t \sim N(0, \sigma^2), \tag{2}$$
which is applied to the differences $Z_t$ of $Y_t$ with order $d = 0, 1, 2, \dots$,
$$Z_t = D^d Y_t, \quad D Y_t = Y_t - Y_{t-1}, \quad D^d Y_t = D \, D^{d-1} Y_t, \tag{3}$$
where $d > 0$ allows the specification of non-stationary models, denoted ARIMA$(p, d, q)$. Here, $d = 0$ corresponds to the stationary model ARIMA$(p, 0, q)$, also known as ARMA$(p, q)$.
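As a small illustration of the differencing operator $D^d$ in Equation (3), the following sketch applies $D$ repeatedly to a short series. The function name `difference` and the numbers are made up for illustration and are not taken from the study.

```python
def difference(series, d=1):
    """Apply the differencing operator D to `series` d times,
    where D Y_t = Y_t - Y_{t-1}. Each pass shortens the series by one."""
    z = list(series)
    for _ in range(d):
        z = [z[t] - z[t - 1] for t in range(1, len(z))]
    return z

# Made-up series with a quadratic trend (not study data).
y = [1.0, 4.0, 9.0, 16.0, 25.0]
first = difference(y, d=1)    # linear trend remains
second = difference(y, d=2)   # constant: the trend has been removed
print(first, second)
```

Here $d = 0$ returns the series unchanged, which corresponds to the ARMA case.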
ARIMA model selection requires the estimation of both $p, d, q$ and the subsequent $p + q + 1$ parameters in the regression models. This is usually performed via Bayesian or maximum likelihood methods. This yields fitted values $\hat{Y}_t$ for $t = 1, \dots, T$ and residuals $\hat{\varepsilon}_t = \hat{Y}_t - Y_t$.
Unit root tests or other stationarity tests can be used to determine the differencing order $d$, which can be further informed by the behavior of the ACF or PACF of the time series data. Afterwards, the lag order parameters can be determined via an information criterion, such as the Akaike information criterion (AIC) or the Bayesian (Schwarz) information criterion (BIC). Note that both the AIC and the BIC include a penalty term for the number of parameters, leading to more parsimonious models if the information criteria are used for model selection. Note also that changing the value of $d$ does not change the number of parameters to be estimated.
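The penalty terms mentioned above can be made concrete with the standard definitions $\mathrm{AIC} = 2m - 2\ln L$ and $\mathrm{BIC} = m \ln n - 2 \ln L$, where $m$ is the number of estimated parameters and $n$ the number of observations. The sketch below is illustrative only: the log-likelihood values are invented, and $n = 208$ is an assumption (four years of 52 weekly observations).

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2m - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian (Schwarz) information criterion: m ln n - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Two hypothetical candidate models; the log-likelihoods are made up.
n_obs = 208
small = {"loglik": 500.0, "m": 3}   # e.g. a low-lag model
large = {"loglik": 501.0, "m": 5}   # two extra lags, slightly better fit

for name, mod in (("small", small), ("large", large)):
    print(name, aic(mod["loglik"], mod["m"]), bic(mod["loglik"], mod["m"], n_obs))
```

In this invented example the small gain in likelihood does not offset the two extra parameters, so both criteria prefer the smaller model, with the BIC penalizing the larger one more heavily.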

Distribution Analysis of ARIMA Residuals

In general, ARIMA methods are efficient under the assumption of normality of the residuals $\hat{\varepsilon}_t$. However, in many applications, especially survival analysis and finance, one has to deal with asymmetric and fat-tailed residual distributions failing the normality assumption. Therefore, a skewed t-distribution model, such as the one put forward by Fernandez and Steel (1996), may be suitable to describe the distribution of residuals. The PDF defined therein, with location $\mu$, scale $\sigma$, degrees of freedom $\nu$ and skewness parameter $\xi$, is as follows.
$$p_{\varepsilon_t}(y) = \frac{2}{\xi + \frac{1}{\xi}} \, \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right) (\pi \nu)^{1/2}} \, \sigma^{-1} \left[ 1 + \frac{(y - \mu)^2}{\nu \sigma^2} \left( \frac{1}{\xi^2} I_{[0, \infty)}(y - \mu) + \xi^2 I_{(-\infty, 0)}(y - \mu) \right) \right]^{-\frac{\nu + 1}{2}}. \tag{4}$$
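The density in Equation (4) can be evaluated directly. The sketch below is a minimal, unoptimized transcription; the function name `skewed_t_pdf` and all parameter values in the example are hypothetical and are not the fitted values reported later.

```python
import math

def skewed_t_pdf(y, mu, sigma, nu, xi):
    """Skewed Student t density in the Fernandez-Steel form.
    mu: location, sigma: scale, nu: degrees of freedom, xi: skewness (> 0)."""
    centered = y - mu
    # The indicator functions select 1/xi^2 on [0, inf) and xi^2 on (-inf, 0).
    skew_factor = 1.0 / xi ** 2 if centered >= 0 else xi ** 2
    norm_const = (2.0 / (xi + 1.0 / xi)
                  * math.gamma((nu + 1) / 2)
                  / (math.gamma(nu / 2) * math.sqrt(math.pi * nu))
                  / sigma)
    kernel = 1.0 + centered ** 2 / (nu * sigma ** 2) * skew_factor
    return norm_const * kernel ** (-(nu + 1) / 2)

# Crude trapezoid check that the density integrates to about 1
# for one hypothetical parameter set.
step = 0.01
grid = [-50.0 + step * i for i in range(10001)]
vals = [skewed_t_pdf(g, mu=0.0, sigma=1.0, nu=5.0, xi=1.5) for g in grid]
total = step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
print(round(total, 4))
```

With $\xi = 1$ the two branches coincide and the expression reduces to the usual location-scale Student t density; $\xi > 1$ puts more mass on the right tail.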
Regardless, fitted distributions should be compared to residual data to ensure accuracy. A common goodness-of-fit test is the Kolmogorov–Smirnov test, which compares empirical CDFs between two different samples of data and/or distributions. Once a distribution has been chosen, the residuals can be appropriately modeled as random variables.
In the case of several dependent time series $Y_t^{(i)}$, $i = 1, \dots, k$, $k$ separate models can be developed for the marginal distributions of $\varepsilon_t^{(i)}$, which will help further construction of the joint distribution of $(\varepsilon_t^{(1)}, \dots, \varepsilon_t^{(k)})$, which is the ultimate goal.

3. Copula Analysis

Copula analysis is commonly used to model non-linear statistical dependence between two or more random variables. Copulas are special functions that can describe dependence of random variables as an association between their marginal distributions. In the present paper, copula analysis is applied to model the joint distribution of ARIMA residuals $(\varepsilon^{(1)}, \dots, \varepsilon^{(k)})$ using the marginal distributions obtained in the previous section.
Let $X, Y$ be random variables, with CDFs $u = F_X(x)$ and $v = F_Y(y)$. Then their joint distribution $P(X \le x, Y \le y)$ can be represented using a copula function $C(u, v \mid r)$, where $r$ is some set of parameters measuring the strength of dependence between the two variables.
Sklar’s theorem states that any copula function $C(u, v)$ of $u = F_X(x)$ and $v = F_Y(y)$ is a valid joint CDF, and also, any joint distribution function $F_{X,Y}(x, y)$ can be represented as a copula function of its marginals. Therein lies the advantage of copulas; the copula framework allows for modeling of the joint distribution in two steps. First, one models the marginals, and then, one uses an appropriate copula function for modeling their association.
There are many different types of copulas to be used for this purpose. For an in-depth list and definitions, see Brechmann (2010). The most popular copulas used in practice are Archimedean copulas or elliptical copulas. The parameters of the former are easier to estimate, while the latter are easier to extend to higher dimensionality, i.e., to more marginals.
The most popular one-parametric Archimedean pair copulas combining marginal distribution functions and their dual versions combining marginal survival functions are
  • Clayton’s Copula
    $$P(X \le x, Y \le y) = C_1(u, v \mid \tau) = \max\left( \left( u^{\frac{2\tau}{\tau - 1}} + v^{\frac{2\tau}{\tau - 1}} - 1 \right)^{\frac{\tau - 1}{2\tau}}, 0 \right), \quad -1 < \tau < 1. \tag{5}$$
  • Gumbel–Hougaard’s Copula
    $$P(X \le x, Y \le y) = C_2(u, v \mid \tau) = \exp\left( -\left[ (-\ln u)^{\frac{1}{1 - \tau}} + (-\ln v)^{\frac{1}{1 - \tau}} \right]^{1 - \tau} \right), \quad 0 \le \tau < 1. \tag{6}$$
  • Dual (Survival) Clayton’s Copula
    $$P(X > x, Y > y) = C_3(1 - u, 1 - v \mid \tau) = 1 - u - v + C_1(u, v \mid \tau), \quad 0 \le \tau < 1. \tag{7}$$
  • Dual (Survival) Gumbel–Hougaard’s Copula
    $$P(X > x, Y > y) = C_4(1 - u, 1 - v \mid \tau) = 1 - u - v + C_2(u, v \mid \tau), \quad 0 \le \tau < 1. \tag{8}$$
The first and fourth choices are especially good for addressing the lower-tail dependence, while the second and the third work well for the upper-tail dependence.
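The tail behavior described above can be checked numerically from the Clayton and Gumbel–Hougaard CDFs written in terms of Kendall's $\tau$. The sketch below is a rough illustration under the assumption $0 < \tau < 1$ and $u, v$ strictly inside $(0, 1)$; the helper names and the choice $\tau = 0.5$ are made up for the example.

```python
import math

def clayton_cdf(u, v, tau):
    """Clayton copula C1(u, v | tau) for 0 < tau < 1 and u, v in (0, 1)."""
    theta = 2.0 * tau / (1.0 - tau)      # equals -(2*tau)/(tau - 1)
    return (u ** (-theta) + v ** (-theta) - 1.0) ** (-1.0 / theta)

def gumbel_cdf(u, v, tau):
    """Gumbel-Hougaard copula C2(u, v | tau) for 0 <= tau < 1."""
    theta = 1.0 / (1.0 - tau)
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-s ** (1.0 / theta))

def lower_tail_ratio(cdf, tau, q=0.001):
    """P(U <= q, V <= q) / q; stays away from 0 under lower-tail dependence."""
    return cdf(q, q, tau) / q

def upper_tail_ratio(cdf, tau, q=0.999):
    """P(U > q, V > q) / (1 - q) = (1 - 2q + C(q, q)) / (1 - q)."""
    return (1.0 - 2.0 * q + cdf(q, q, tau)) / (1.0 - q)

tau = 0.5  # hypothetical value, not an estimate from the study
for name, cdf in (("Clayton", clayton_cdf), ("Gumbel", gumbel_cdf)):
    print(name, lower_tail_ratio(cdf, tau), upper_tail_ratio(cdf, tau))
```

For Clayton's copula the lower-tail ratio stays large while the upper-tail ratio is close to zero, and the reverse holds for the Gumbel–Hougaard copula. The dual (survival) versions swap the two tails.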
Several techniques exist for estimating copula parameters. One way of doing it is to use non-parametric measures of sample correlation, namely Kendall’s concordance τ or Spearman’s ρ , as many two-parameter copulas and all single-parameter copulas can have their parameters expressed as a measure of non-parametric correlation, allowing for a direct substitution (Brechmann 2010). This relationship also allows for Bayesian analysis based on sample correlation. This is discussed later. Another approach is to use maximum likelihood estimation, though this could carry computational issues relating to a lack of closed-form estimators (Brechmann 2010; Huang and Shemyakin 2020). For other parametric approaches, see Brechmann et al. (2012) and Manner (2007). For a non-parametric method, see Manner (2007).
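The direct substitution mentioned above can be sketched as follows: compute the sample Kendall's $\tau$ and map it to the usual copula parameters, $\theta = 2\tau / (1 - \tau)$ for Clayton and $\theta = 1 / (1 - \tau)$ for Gumbel–Hougaard. The simple $O(n^2)$ estimator below ignores ties, and the data are invented for illustration.

```python
def kendall_tau(x, y):
    """Sample Kendall's tau: (concordant - discordant) / number of pairs.
    Simple O(n^2) version; tied pairs contribute zero."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            score += (s > 0) - (s < 0)
    return score / (n * (n - 1) / 2)

def clayton_theta(tau):
    """Clayton parameter implied by Kendall's tau (tau != 0, tau < 1)."""
    return 2.0 * tau / (1.0 - tau)

def gumbel_theta(tau):
    """Gumbel-Hougaard parameter implied by Kendall's tau (0 <= tau < 1)."""
    return 1.0 / (1.0 - tau)

# Made-up paired observations: 8 concordant and 2 discordant pairs.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
tau_hat = kendall_tau(x, y)
print(tau_hat, clayton_theta(tau_hat), gumbel_theta(tau_hat))
```

This moment-type estimate needs no numerical optimization, which is one reason it is a convenient starting point before maximum likelihood or Bayesian refinement.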
Methods used to determine copula selection will be discussed later.

3.1. Vine Copulas

Elliptical copulas allow for a logical extension from dimension two discussed above to higher dimensions. A one-parameter Archimedean copula rarely provides an adequate model in the multivariate case due to its requirement of symmetric association between all pairs of variables. When the degree of association differs between pairs, a vine copula, also known as a pair-copula structure, may be preferable. A vine is a graphical tool for establishing the dependence structure in high-dimensional probability distributions. A regular vine is a special case for which all constraints are two-dimensional or conditional two-dimensional. Regular vines generalize trees. Vine copulas work by establishing the structure of association between the variables, where individual edges correspond to different pair copulas. Using Sklar’s theorem, the joint distribution of the data can be represented using a copula function of the marginals.
$$F(x, y, z, \dots) = C_{X,Y,Z,\dots}[F_X(x), F_Y(y), F_Z(z), \dots].$$
From here, differentiation yields the following expression, involving a copula density $c_{X,Y,Z,\dots}$ and marginal densities:
$$f(x, y, z, \dots) = c_{X,Y,Z,\dots}[F_X(x), F_Y(y), F_Z(z), \dots] \cdot f_X(x) f_Y(y) f_Z(z) \cdots.$$
For two variables, this simplifies to the expression
$$f(x, y) = c_{X,Y}[F_X(x), F_Y(y)] \cdot f_X(x) f_Y(y),$$
which using basic properties of conditional probability can be rewritten as
$$f_{X|Y}(x \mid y) = c_{X,Y}[F_X(x), F_Y(y)] \cdot f_X(x).$$
Extending this to three variables yields the following result involving the conditional copula:
$$f_{X,Z|Y}(x, z \mid y) = c_{X,Z|Y}[F_{X|Y}(x \mid y), F_{Z|Y}(z \mid y); y] \cdot f_{X|Y}(x \mid y) \cdot f_{Z|Y}(z \mid y),$$
which in turn can be extended to much higher dimensions. For that and more details, see Aas et al. (2009) and Bedford and Cooke (2001). Regardless, this implies that any joint distribution can be represented as the product of marginals, pair copulas of the component vectors, and conditional pair copulas. In the case of independence between two variables,
$$c_{X,Y}[F_X(x), F_Y(y)] = 1,$$
substantially simplifying the resulting structure.
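As a worked step, the two-variable and three-variable identities above can be combined to write out the full three-dimensional density explicitly. The derivation below only chains the conditional-density formulas already stated and introduces no new assumptions.

```latex
\begin{aligned}
f(x, y, z)
  &= f_Y(y) \, f_{X,Z|Y}(x, z \mid y) \\
  &= f_Y(y) \, c_{X,Z|Y}\!\left[F_{X|Y}(x \mid y), F_{Z|Y}(z \mid y); y\right]
     f_{X|Y}(x \mid y) \, f_{Z|Y}(z \mid y) \\
  &= f_X(x) \, f_Y(y) \, f_Z(z) \,
     c_{X,Y}\!\left[F_X(x), F_Y(y)\right] \,
     c_{Z,Y}\!\left[F_Z(z), F_Y(y)\right] \,
     c_{X,Z|Y}\!\left[F_{X|Y}(x \mid y), F_{Z|Y}(z \mid y); y\right].
\end{aligned}
```

The last line is a product of three marginals, two unconditional pair copulas, and one conditional pair copula. If $X$ and $Z$ are conditionally independent given $Y$, the final factor equals 1 and only the first-level pair copulas remain.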
Note that when working with vine copulas, the structure of the model must be specified before pair copulas can be estimated. In other words, it has to be determined first which variables are independent of each other, which variables are conditioned on the others, and in what order. For a detailed explanation why this approach is advantageous, see Aas et al. (2009) and Bedford and Cooke (2001). To estimate the vine structure, one can use the method put forth by Dissmann et al. (2013). First, the unconditional copulas are selected from the list of all possible structures, which can be exhaustive, based on which structure minimizes the reference statistic, such as AIC or BIC. Then pair copulas and their parameters are estimated for each non-independent pair. Then a variable is selected to be conditioned on, and the process repeats until all variables are exhausted.
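One common way to choose the first level of the vine, and to our understanding the approach taken in the sequential algorithm of Dissmann et al. (2013), is to build a maximum spanning tree whose edge weights are the absolute pairwise Kendall's $\tau$ values. The sketch below uses Kruskal's algorithm on four hypothetical variables A to D with invented weights; it covers only the first tree and is not a substitute for a full vine selection routine.

```python
def max_spanning_tree(nodes, weights):
    """Kruskal's algorithm for a maximum spanning tree.
    `weights` maps frozenset({a, b}) -> |Kendall's tau| for each pair."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the component root, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree = []
    # Visit candidate edges from strongest to weakest association.
    for pair, w in sorted(weights.items(), key=lambda kv: -kv[1]):
        a, b = sorted(pair)
        root_a, root_b = find(a), find(b)
        if root_a != root_b:          # keep the edge only if it joins two components
            parent[root_a] = root_b
            tree.append((a, b, w))
    return tree

# Hypothetical |tau| values for four variables (not results from the study).
nodes = ["A", "B", "C", "D"]
weights = {
    frozenset({"A", "B"}): 0.50, frozenset({"A", "C"}): 0.20,
    frozenset({"A", "D"}): 0.10, frozenset({"B", "C"}): 0.40,
    frozenset({"B", "D"}): 0.05, frozenset({"C", "D"}): 0.30,
}
first_tree = max_spanning_tree(nodes, weights)
print(first_tree)
```

Each edge kept in the tree is then assigned a pair copula, and later levels repeat the procedure on conditional pairs.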

3.2. Pair Copula Selection

Once a model structure has been specified, there are several ways to select the optimal pair copula(s). Most involve specifying a potential set of hyperparameters defined for each copula family, and then comparing them. This can be performed using the AIC (Brechmann 2010; Brechmann and Schepsmeier 2013; Manner 2007), the BIC (Brechmann 2010), or another information criterion. Various goodness-of-fit tests can also be used for this purpose, allowing their statistics to be compared to select a copula (Huard et al. 2006). However, as Huard et al. (2006) point out, this approach compares single copula models with given parameter values chosen from each parametric family, instead of selecting a copula based on multiple possible parameter values.
A solution to this is to select copulas using Bayesian inference. Wifvat et al. (2020) describe the following method, which was suggested in Huard et al. (2006) and also used in Shemyakin and Kniazev (2017). First, let $H_m$, $m = 1, 2, \dots, M$, be the hypotheses that the data come from one of $M$ copula families, and for each pair $(i, j)$, $i, j = 1, \dots, k$, test $H_m: F_{ij}(\varepsilon^{(i)}, \varepsilon^{(j)}) = C_m(F_i(\varepsilon^{(i)}), F_j(\varepsilon^{(j)}))$. These hypotheses can be assumed to be mutually exclusive and exhaustive. Then, let $\tau$ be Kendall’s concordance. If all copulas considered can be written as functions of $\tau$, the posterior probabilities of the hypotheses given the data $D = D^{(i,j)}$ may be rewritten as
$$P(H_m \mid D^{(i,j)}) = \int P(H_m, \tau \mid D) \, d\tau = \frac{\int P(D \mid H_m, \tau) \, P(H_m \mid \tau) \, \pi(\tau) \, d\tau}{P(D)},$$
where $\pi(\tau)$ is the prior density of $\tau$. Wifvat et al. (2020) show that this method still yields good results even for vague or non-informative priors on $\tau$. Since $|\tau| \le 1$ and, in the case of positive dependence, $\tau \ge 0$, the uniform Beta$(1, 1)$ distribution will be a suitable choice for $\pi$. Since the posterior probabilities are only to be used for selection purposes, $P(D)$ does not need to be calculated. With the discrete uniform prior on the hypothesis choice, it suffices to calculate the weights, with $c_m$, $m = 1, \dots, M$, denoting the respective copula p.d.f.:
$$W_m^{(i,j)} = \int_0^1 P(D \mid H_m, \tau) \, \pi(\tau) \, d\tau = \int_0^1 \prod_{t=1}^{T} c_m\left( F_i(\varepsilon_t^{(i)}), F_j(\varepsilon_t^{(j)}) \mid \tau \right) \pi(\tau) \, d\tau, \quad m = 1, \dots, M,$$
or, using a Monte Carlo approach and drawing $N$ samples $\tau_1, \dots, \tau_N$ from the uniform prior, evaluate
$$\hat{W}_m^{(i,j)} = \frac{1}{N} \sum_{r=1}^{N} \prod_{t=1}^{T} c_m\left( F_i(e_t^{(i)}), F_j(e_t^{(j)}) \mid \tau_r \right),$$
where $e_t^{(i)}$ denotes the observed residual for country $i$ in week $t$, and then the posteriors
$$\hat{P}(H_m \mid D^{(i,j)}) = \frac{\hat{W}_m^{(i,j)}}{\sum_{m'=1}^{M} \hat{W}_{m'}^{(i,j)}},$$
for each pair $(i, j)$, $i, j = 1, \dots, k$.

4. Case Study

As an illustration of the suggested methodology, let us consider an example of COVID-19 development in several countries of Europe and North America. As stated above, data from certain countries during the pandemic may be unreliable, due to lack of infrastructure, intentional misreporting, missing data, etc. This study chose to focus on mortality data from the United States, Canada, France, Germany, Norway, and Sweden because of the easy availability of their mostly reliable data recorded in a similar time frame. For each country in this study, the following sources were used: The United States’ mortality data were obtained through the website of the Centers for Disease Control and Prevention (CDC). Canada’s data were obtained through Statistics Canada (2025), and they adhere to the same standards as the US (Human Mortality Database 2025). The European countries’ data were obtained entirely through Eurostat, the official statistics body for the European Union (Eurostat 2025).
To compute excess mortality, the difference between each country’s pandemic data and historical data was recorded as a percentage of the historical data for a given week in a time series. Consistent with Britt et al. (2023), mortality rates are used rather than counts, and the weekly excess mortality rate for week $t = 1, \dots, 52$ of the year $s = 2019, 2020, \dots, 2022$ is defined as the ratio
$$EMR(t, s) = \frac{D(t, s) - \frac{1}{5} \sum_{u=2014}^{2018} D(t, u)}{\frac{1}{5} \sum_{u=2014}^{2018} D(t, u)},$$
where $D(t, s)$ is the weekly death count for a given country in the study. The baseline death count $\frac{1}{5} \sum_{u=2014}^{2018} D(t, u)$ is calculated as the average for the non-perturbed (pre-COVID) years. Then the time series $Y_t$ for each country is obtained by the concatenation $Y_t = \{EMR(t, 2019), \dots, EMR(t, 2022)\}$. This approach to excess mortality helps to alleviate the effect of seasonality if it is generally consistent with the pre-COVID patterns.
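The ratio above is simple to compute for one week. The sketch below uses invented death counts purely to illustrate the arithmetic; the function name `excess_mortality_rate` is hypothetical.

```python
def excess_mortality_rate(deaths, baseline_deaths):
    """EMR(t, s): deviation of the weekly death count D(t, s) from the
    baseline average for the same week, as a fraction of that baseline."""
    baseline = sum(baseline_deaths) / len(baseline_deaths)
    return (deaths - baseline) / baseline

# Made-up counts for one calendar week (not data from the study).
baseline_2014_2018 = [1000, 1020, 980, 1010, 990]   # D(t, u), u = 2014..2018
deaths_pandemic = 1150                              # D(t, s) for a pandemic year
emr = excess_mortality_rate(deaths_pandemic, baseline_2014_2018)
print(emr)
```

A positive value indicates more deaths than the pre-COVID average for that week, and a negative value indicates fewer.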
All six countries in this study count weeks in the same way, with each year consisting of 52 (or 53) full 7-day weeks (Human Mortality Database 2025). This avoids weeks being assigned to different years across countries, as the data presented differ by one day only between the North American and European time series. Mortality data from 2014 to 2018 were used to make 5-year weekly averages. With 2014 being the only one of these years with 53 weeks, the week-53 mortality average for each country is simply the last week of 2014.
The weekly excess mortality time series from the first week of 2019 through the 52nd week of 2022 were used for all six countries. The data are summarized in Table 1. To avoid clutter, the time series are plotted separately for three pairs of countries in Figure 1, Figure 2 and Figure 3.
According to the methodology of Section 2 and Section 3, the case data are analyzed with the final goal of building a parametric joint distribution model for weekly mortality in the six countries listed, taking into account both serial and cross correlation to provide a forecasting tool. The flow chart in Figure 4 delineates the steps of model construction.

4.1. ARIMA

At the first stage, ARIMA models were constructed separately for all six countries in the study. To begin, the time series $(Y^{(1)}, \dots, Y^{(k)})$ for $k = 6$ countries were tested for stationarity using the Augmented Dickey–Fuller and KPSS tests through the tseries package in R 4.4. The p-values are summarized in Table 2. It is worth noting that none of the moving average coefficients in Equation (2) proved to be statistically significant.
The mixed results do not give a clear indication as to whether the time series are stationary or not.
From here, partial autocorrelation functions for each time series were analyzed. The United States did not show a significant correlation after a lag of 3. France, Germany, and Sweden showed no significant correlation for a lag of 2, but did have significant correlation for higher order lags. Canada and Norway showed no significant correlation after a lag of 2. After this, ARIMA analysis was performed using the arima function from the stats package in R 4.4 to generate potential models. Final models were selected first by requiring at least one statistically significant coefficient, and then by BIC. The models were built according to the structure of the following equation:
$$Y_t = \beta_1 \cdot Y_{t-1} + \dots + \beta_p \cdot Y_{t-p} + \beta_0 + \varepsilon_t,$$
where $\varepsilon_t$ is the error term, and $\beta_0$ is the intercept. The results are summarized in Table 3.
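To make the structure of the autoregressive equation concrete, the sketch below fits the simplest case $p = 1$ by conditional least squares on a simulated series. This is a swapped-in technique for illustration: the study fitted its models with R, not with this code, and the simulated coefficient 0.7 and noise scale are invented.

```python
import random

def fit_ar1(y):
    """Least-squares fit of Y_t = beta0 + beta1 * Y_{t-1} + eps_t.
    Returns (beta0, beta1, residuals), with residuals = observed - fitted."""
    x, z = y[:-1], y[1:]                    # lagged values and current values
    n = len(x)
    mean_x, mean_z = sum(x) / n, sum(z) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxz = sum((a - mean_x) * (b - mean_z) for a, b in zip(x, z))
    beta1 = sxz / sxx
    beta0 = mean_z - beta1 * mean_x
    resid = [b - (beta0 + beta1 * a) for a, b in zip(x, z)]
    return beta0, beta1, resid

# Simulate a hypothetical AR(1) series (not study data).
rng = random.Random(42)
y = [0.0]
for _ in range(2000):
    y.append(0.7 * y[-1] + rng.gauss(0.0, 0.01))

beta0, beta1, resid = fit_ar1(y)
print(beta0, beta1)
```

The residuals returned here play the role of $\varepsilon_t$ in the equation above and are the input to the distribution fitting and copula stages.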
After the models were selected, the residuals of each model were subjected to a Box–Ljung test using the stats R package. The results are summarized in Table 4.
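For reference, the Ljung–Box statistic for residuals $e_1, \dots, e_n$ and maximum lag $h$ is $Q = n(n + 2) \sum_{k=1}^{h} \hat{\rho}_k^2 / (n - k)$, where $\hat{\rho}_k$ is the lag-$k$ sample autocorrelation. The sketch below computes $Q$ only (no p-value) on a deliberately autocorrelated toy series; it is an illustration, not the R routine used in the study.

```python
def ljung_box_q(resid, max_lag):
    """Ljung-Box Q statistic: n (n + 2) * sum_k rho_k^2 / (n - k)."""
    n = len(resid)
    mean = sum(resid) / n
    dev = [r - mean for r in resid]
    denom = sum(d * d for d in dev)
    q = 0.0
    for k in range(1, max_lag + 1):
        # lag-k sample autocorrelation
        rho_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q

# Toy residuals that alternate in sign, so the lag-1 autocorrelation is -0.9.
toy = [1.0, -1.0] * 5
q_stat = ljung_box_q(toy, max_lag=1)
print(q_stat)
```

Under the null hypothesis of no autocorrelation, $Q$ is compared with a chi-square distribution; the toy value here far exceeds the 5% critical value of 3.841 for one degree of freedom, whereas well-specified ARIMA residuals should give small values.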

4.2. ARIMA Residual Analysis

At the second stage, the parametric models were developed for the distributions of ARIMA residuals. The Shapiro–Wilk test was performed on the residuals of each model to determine their normality. The results are summarized in Table 5.
This suggests that only the residuals for Norway’s model were normal. Fitting the distribution using the fitdistr function from the MASS package in R 4.4 yielded a normal distribution with $\mu = 0$ and $\sigma = 0.0036$. To fit a distribution to the others, the skewed t-distribution as defined in Fernandez and Steel (1996) was used. The results of the fitted distributions are summarized in Table 6.
Results were verified using the Kolmogorov–Smirnov test and are summarized in Table 7, where D is the test statistic. Because the ks.test function was not compatible with the fitted distribution functions, samples of size $N = 10^6$ were drawn at random from each fitted distribution and compared with the residuals.
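The sampling-based comparison described above amounts to a two-sample Kolmogorov–Smirnov test, whose statistic D is the largest gap between the two empirical CDFs. The sketch below computes D only; the standard normal draws stand in for both the residuals and the fitted distribution, and the sample sizes are reduced for speed, so none of the numbers relate to the study.

```python
import random

def ks_two_sample(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both empirical CDFs past every observation <= x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

rng = random.Random(0)
residuals = [rng.gauss(0.0, 1.0) for _ in range(200)]     # stand-in for ARIMA residuals
reference = [rng.gauss(0.0, 1.0) for _ in range(20000)]   # stand-in for draws from the fitted law
d_stat = ks_two_sample(residuals, reference)
print(d_stat)
```

Drawing a very large reference sample makes its empirical CDF a close approximation of the fitted CDF, which is why the two-sample test can replace the one-sample version when the fitted distribution function cannot be passed to the test routine directly.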

4.3. Vine Structure Selection

At the third stage of the analysis, the pairwise dependence patterns between the countries were determined for ARIMA residuals. The structure of the copula model for the vector of ARIMA residuals $(\varepsilon^{(1)}, \dots, \varepsilon^{(6)})$ was determined using the RVineStructureSelect function from the VineCopula R package. Later procedures determined all copulas after the first level of conditioning to be independence copulas, so only the structure of the first level of the vine is shown. Figure 5 illustrates the vine graph, where the edges correspond to the closest connections established between the nodes. It is reasonable to believe that these connections are mostly due to geographical reasons, but they may also reflect the similarity between health policies. This may be why Norway and Sweden, which are geographically close but differed at the health policy level, do not share an edge at the first level of the vine.

4.4. Pair Copula Selection

Finally, the pair copula selection was performed using the Bayesian method outlined in Shemyakin and Kniazev (2017) for the choices defined in Equations (5)–(8). First, the assumption was made that each copula would fall into one of four hypothesized copula families.
Hypothesis 1 (H1).
Clayton’s Copula, C.
Hypothesis 2 (H2).
Gumbel–Hougaard’s Copula, G.
Hypothesis 3 (H3).
Dual (Survival) Clayton’s Copula, SC.
Hypothesis 4 (H4).
Dual Gumbel–Hougaard’s Copula, SG.
From here, the Monte Carlo approach described above was used, with N = 10,000. The results are summarized in Table 8, with the maximum values in each column (for each pair of countries) boldfaced.
Next, optimal τ values were selected using MLE, resulting in the following structure. In Figure 6, along with the primary connections established in Figure 5 by vine structure estimation, the selected copula and corresponding value of τ are also provided, with larger τ corresponding to a closer association between the nodes of the graph.

4.5. Validation

Validation of the model with the later (post-COVID) data was carried out on three levels: ARIMA models, distribution models for ARIMA residuals, and the vine copula structure. To validate the results, mortality data from 2023 were used. While Eurostat maintained weekly death counts throughout 2023, the CDC only maintained weekly counts for 37 weeks, and Statistics Canada only maintained weekly counts for 33 weeks. This does not impact the analysis of ARIMA models, since it is performed country-by-country, meaning that all available data can be used.
To validate the ARIMA models, the mean absolute error (MAE) of each country’s ARIMA model was calculated. This is summarized in Table 9. The relatively low errors suggest that the ARIMA models for Canada, France, and Sweden are accurate. The larger errors of the other series make sense, as the COVID pandemic was winding down in the USA by the time the CDC stopped updating its weekly death count in 2023. As such, models developed from the earlier US mortality experience would probably not be as accurate in 2023. For Germany and Norway, excess mortality temporarily plummeted in the beginning of 2023, which could explain the large average error observed. Figures 7 to 12, which show the actual out-of-sample values versus the point forecasts with 90% and 95% confidence bounds, corroborate this.
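The mean absolute error used above is the average of $|Y_t - \hat{Y}_t|$ over the out-of-sample weeks. A minimal sketch with invented values (not the 2023 data) is:

```python
def mean_absolute_error(actual, forecast):
    """Average absolute difference between observed and forecast values."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

# Made-up weekly excess mortality rates and point forecasts.
actual = [0.10, 0.05, -0.02]
forecast = [0.08, 0.07, 0.00]
mae = mean_absolute_error(actual, forecast)
print(mae)
```

Because excess mortality is expressed as a fraction of the baseline, the MAE is on the same scale and can be compared across countries of different size.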
To validate the marginal analysis of ARIMA residuals, the residuals generated for 2023 were compared to previously fitted marginal distributions using the Kolmogorov–Smirnov test. The results are summarized in Table 10.
This suggests that the marginal distributions of the residuals fitted in Section 4 are accurate for Canada, France, Germany, Sweden, and Norway, and less accurate for the USA where the COVID pandemic mortality abruptly dropped in 2023. This loss of predictive power is also illustrated in Figure 7. The accuracy of the other models is illustrated by the other figures, despite an initial trough in some series.
Finally, to validate the copula model proposed in Section 4, the data length had to be adjusted for each country’s residuals. To accomplish this, each dataset was limited to the first 33 weeks. Then, this sample was tested against the copula structure estimated in Section 4 using the VineCopula R package’s RVineGofTest function, with the Kolmogorov–Smirnov test (evaluated asymptotically) and the Cramér–von Mises test (evaluated with 200 bootstrap steps). The results are summarized in Table 11. For more details on how these tests are implemented in software, see Schepsmeier (2015).
This suggests that the dependence model put forth by the vine copula found in Section 4 describes the connections between the countries’ mortality experience well, even when the mortality experience differs from what is expected.

5. Discussion and Conclusions

To summarize the distinctive features of the modeling approach described in Section 2 and Section 3 and then illustrated in Section 4, we will discuss the advantages provided and the most likely applications related to these advantages. Then we will discuss the limitations of this approach. First, we suggest using separate ARIMA models for the time series of weekly excess mortality for the distinct zones (countries, geographical areas, etc). This approach is well established in mortality analysis and used in COVID-19 studies (Alabdulrazzaq et al. 2021; Ilie et al. 2020). Then, we use parametric distribution models for ARIMA residuals since they appear to be skewed and fat-tailed. This approach is quite common for financial time series but also has been applied to mortality data (Campolieti 2021). Finally, we model the joint distribution of ARIMA residuals for the zones considered above using vine copulas with the marginal distributions obtained at the previous step. This approach has been used for financial time series, see, e.g., Shemyakin and Kniazev (2017) for the general introduction and more extensive literature review, but appears to be new for mortality analysis related to epidemics (D’Urso et al. 2022).
The main advantage of using single time series ARIMA is the ease of data collection. One can use open-source data on mortality, which are often available in real time. Therefore, this approach is convenient for short-term forecasts, defined by the CDC as one to four weeks ahead (National Center for Health Statistics 2025). The most popular and powerful models, see Johannson et al. (2020) and Imperial College COVID Response Team (2022), use the ensemble approach and datasets including multiple predictors, which makes data collection more difficult. Using excess mortality also bypasses the necessity to properly attribute the cause of death, which can be problematic for epidemic data, especially at the epidemic’s onset, as discussed above.
Parametric distribution models for ARIMA residuals help to address extreme events of abnormally high or low weekly mortality, which can be critical in epidemic contexts, and provide more realistic tail probabilities. However, the example in Section 4 demonstrates that the multivariate effect of cross-correlation is also important. For short-term prediction of zone mortality, the vine copula structure allows for an effective use of the recent history of the related zones. It also provides a more realistic prediction of mortality peaking simultaneously in several countries. In this regard, it works like vector autoregression and network autoregression models (Britt et al. 2023; Sioofy et al. 2021). Unlike vector autoregression, however, it is not limited to linear association and is more effective in analyzing the tail behavior of the joint distribution. Note also that the approach of the paper is not COVID-specific and can be applied to a wide range of excess mortality contexts.
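To make the remark about tail behavior concrete, the sketch below converts a Kendall's tau into the parameter and tail dependence coefficients of two standard one-parameter copula families, using textbook closed forms: for Clayton, θ = 2τ/(1 − τ) with lower tail dependence 2^(−1/θ); for Gumbel, θ = 1/(1 − τ) with upper tail dependence 2 − 2^(1/θ). Survival versions swap the two tails. The function names are hypothetical, and the value τ = 0.5 is chosen only for illustration; it is not taken from the fitted model.

```python
import math

def _check_tau(tau):
    # Both families below only cover positive dependence, 0 < tau < 1.
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")

def clayton_from_tau(tau):
    """Clayton copula: lower tail dependence only."""
    _check_tau(tau)
    theta = 2.0 * tau / (1.0 - tau)
    return {"theta": theta, "lower": 2.0 ** (-1.0 / theta), "upper": 0.0}

def gumbel_from_tau(tau):
    """Gumbel copula: upper tail dependence only."""
    _check_tau(tau)
    theta = 1.0 / (1.0 - tau)
    return {"theta": theta, "lower": 0.0, "upper": 2.0 - 2.0 ** (1.0 / theta)}

def survival(coeffs):
    """Survival (180-degree rotated) copula: same theta, tails swapped."""
    return {"theta": coeffs["theta"],
            "lower": coeffs["upper"],
            "upper": coeffs["lower"]}

tau_example = 0.5  # illustrative value, not an estimate from the paper
summary = {
    "Clayton": clayton_from_tau(tau_example),
    "Gumbel": gumbel_from_tau(tau_example),
    "survival Clayton": survival(clayton_from_tau(tau_example)),
    "survival Gumbel": survival(gumbel_from_tau(tau_example)),
}
for name, coeffs in summary.items():
    print(f"{name}: theta={coeffs['theta']:.3f}, "
          f"lower={coeffs['lower']:.3f}, upper={coeffs['upper']:.3f}")
```

Two families with the same Kendall's tau can therefore imply very different probabilities of joint extremes, which a single linear correlation coefficient cannot express.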
The main limitation of this modeling approach is related to its strength: it does not directly allow for the use of covariates or external predictors. Therefore, future mortality is explained only through previous experience. There is also no room for structural changes during a pandemic. This makes the approach less suitable than traditional SIR and regression models (Chaurasia and Pal 2020) for longer-term forecasts or for developing a realistic explanation of a pandemic’s progression. Another possible problem is the use of aggregated data, which does not allow for the study of such factors as age, gender, and socioeconomic status. The use of disaggregated data is one possible direction for future model development.
As the illustration in Section 4 shows, time series models of COVID-19 excess mortality built from open-source countrywide mortality data are viable even without addressing mortality covariates. The ARIMA models have low lags and no residual autocorrelation, but their residuals tend to be non-normal, being skewed with fat tails. In addition, there appears to be some cross-correlation between countries that is not captured by the separate ARIMA models. This cross-correlation can be modeled using vine copula structures, with pair copulas selected via Bayesian analysis of different hypothesized families. The end result also demonstrates a geographic component in the association between the residuals of different countries. Neighboring countries tend to have higher correlations with each other than countries separated by an ocean. However, this is not always the case: Norway’s and Sweden’s model residuals appeared to be independent of each other, and each was more closely related to Germany’s residuals. It appears that copula models applied to ARIMA residuals provide an effective way to address cross-correlation between the countries and may help to predict one country’s mortality based on the others. The copula approach allows for addressing non-linear dependence patterns such as tail dependence, which cannot be captured with individual ARIMA modeling or vector autoregression. Model validation showed that real-world experience toward the end of the pandemic differed somewhat from the model predictions, possibly due to decreased mortality. However, the dependence structure held, suggesting that the conclusions derived from the vine copula model were accurate. Overall, the three-stage modeling approach (ARIMA, distribution analysis of ARIMA residuals, vine copula model for the vector of residuals) seems suitable for creating joint short-term forecasts of pandemic mortality for several countries.
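The Bayesian selection among hypothesized pair-copula families can be illustrated with a small sketch. It follows one common formulation, in the spirit of Huard et al. (2006): equal prior weight for each family, a uniform prior on Kendall's tau over a grid, and posterior probabilities proportional to the likelihood averaged over that grid. This may differ in detail from the procedure used in the paper. The labels C, G, SC, and SG from Table 8 are assumed here to stand for Clayton, Gumbel, survival Clayton, and survival Gumbel; the data are simulated from a Clayton copula, and all function names are hypothetical.

```python
import math
import random

EPS = 1e-9

def clamp(p):
    """Keep values strictly inside (0, 1) so logs and powers stay finite."""
    return min(max(p, EPS), 1.0 - EPS)

def clayton_logpdf(u, v, theta):
    s = u ** -theta + v ** -theta - 1.0
    return (math.log1p(theta) - (1.0 + theta) * (math.log(u) + math.log(v))
            - (2.0 + 1.0 / theta) * math.log(s))

def gumbel_logpdf(u, v, theta):
    x, y = -math.log(u), -math.log(v)
    a = x ** theta + y ** theta
    root = a ** (1.0 / theta)
    return (-root + x + y + (theta - 1.0) * (math.log(x) + math.log(y))
            + (1.0 / theta - 2.0) * math.log(a) + math.log(root + theta - 1.0))

def theta_clayton(tau):
    return 2.0 * tau / (1.0 - tau)

def theta_gumbel(tau):
    return 1.0 / (1.0 - tau)

# (log-density, tau-to-theta map, use survival rotation?)
FAMILIES = {
    "C": (clayton_logpdf, theta_clayton, False),
    "G": (gumbel_logpdf, theta_gumbel, False),
    "SC": (clayton_logpdf, theta_clayton, True),
    "SG": (gumbel_logpdf, theta_gumbel, True),
}

def log_marginal(pairs, logpdf, theta_of_tau, is_survival, taus):
    """Log of the likelihood averaged over a uniform grid prior on tau."""
    logs = []
    for tau in taus:
        theta = theta_of_tau(tau)
        total = 0.0
        for u, v in pairs:
            if is_survival:
                u, v = 1.0 - u, 1.0 - v
            total += logpdf(u, v, theta)
        logs.append(total)
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs) / len(logs))

def posterior_probabilities(pairs, taus):
    """Posterior family probabilities under equal prior family weights."""
    log_m = {name: log_marginal(pairs, *spec, taus)
             for name, spec in FAMILIES.items()}
    top = max(log_m.values())
    weights = {name: math.exp(val - top) for name, val in log_m.items()}
    norm = sum(weights.values())
    return {name: w / norm for name, w in weights.items()}

def simulate_clayton(n, theta, seed):
    """Conditional-inversion sampler for the Clayton copula (toy data)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        u, w = clamp(rng.random()), clamp(rng.random())
        v = (u ** -theta * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        pairs.append((u, clamp(v)))
    return pairs

tau_grid = [0.02 * k for k in range(1, 46)]  # tau from 0.02 to 0.90
sample = simulate_clayton(300, 2.0, seed=7)
posterior = posterior_probabilities(sample, tau_grid)
for name, prob in posterior.items():
    print(f"{name}: {prob:.3f}")
```

Because every family is parameterized through the same Kendall's tau, the prior is comparable across hypotheses, which is what makes posterior probabilities such as those in Table 8 interpretable as relative support for each tail-dependence pattern.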

Author Contributions

Conceptualization, A.S.; methodology, J.A. and A.S.; software, J.A.; validation, J.A.; formal analysis, A.S. and J.A.; investigation, A.S. and J.A.; resources, J.A. and A.S.; data curation, J.A.; writing—original draft preparation, A.S. and J.A.; writing—review and editing, J.A. and A.S.; visualization, J.A. and A.S.; supervision, A.S.; project administration, A.S.; funding acquisition, A.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are openly available in Dropbox at https://www.dropbox.com/scl/fo/cjfa40ytmc3j2bzws1ocq/APwTm4TB7wVK1Xelydq7Qaw?rlkey=n3xwfvijst5bc9muvb3n2s9dk&st=xb6zn4bo&dl=0 (accessed on 2 June 2025).

Acknowledgments

We would like to acknowledge the support of CAM (Center for Applied Mathematics) at the University of St. Thomas and the assistance of Rebecca Twite (UST) in modeling and simulation during Summer 2023.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AIC: Akaike Information Criterion
ARIMA: Auto-Regressive Integrated Moving Average
BIC: Bayesian Information Criterion
CDC: Centers for Disease Control and Prevention
CDF: Cumulative Distribution Function
CvM: Cramér–von Mises (test)
KPSS: Kwiatkowski–Phillips–Schmidt–Shin (test)
KS: Kolmogorov–Smirnov (test)
PDF: Probability Density Function

References

  1. Aas, Kjersti, Claudia Czado, Arnoldo Frigessi, and Henrik Bakken. 2009. Pair-Copula constructions of multiple dependence. Insurance: Mathematics and Economics 44: 182–98.
  2. Ahamad, Mazbahul G., Fahian Tanin, Byomkesh Talukder, and Monir U. Ahmed. 2020. Officially Confirmed COVID-19 and Unreported COVID-19-Like Illness Death Counts: An Assessment of Reporting Discrepancy in Bangladesh. American Journal of Tropical Medicine and Hygiene 104: 546–48.
  3. Alabdulrazzaq, Haneen, Mohamed N. Alenezi, Yasmeen Rawajfih, Bareeq A. Alghannam, Abeer A. Al-Hassan, and Fawaz S. Al-Anzi. 2021. On the accuracy of ARIMA based prediction of COVID-19 spread. Results in Physics 27: 104509.
  4. Alanazi, Fahad. 2021. The spread of COVID-19 at Hot-Temperature Places with Different Curfew Situations Using Copula Models. Paper presented at 1st International Conference on Artificial Intelligence and Data Analytics (CAIDA), Riyadh, Saudi Arabia, April 6–7.
  5. Basellini, Ugofilippo, Diego Alburez-Gutierrez, Emanuele Del Fava, Daniela Perrotta, Marko Bonetti, Carlo G. Camarda, and Emilio Zagheni. 2021. Linking excess mortality to mobility data during the first wave of COVID-19 in England and Wales. SSM—Population Health 14: 100799.
  6. Bedford, Tim, and Roger M. Cooke. 2001. Probability Density Decomposition for Conditionally Dependent Random Variables Modeled by Vines. Annals of Mathematics and Artificial Intelligence 32: 245–68.
  7. Brechmann, Eike. 2010. Truncated and Simplified Regular Vines and Their Applications. Munich: Technische Universität München.
  8. Brechmann, Eike C., and Uli Schepsmeier. 2013. Modeling Dependence with C-and D-Vine Copulas: The R Package CDVine. Journal of Statistical Software 52: 1–27.
  9. Brechmann, Eike C., Claudia Czado, and Kjersti Aas. 2012. Truncated regular vines in high dimensions with application to financial data. Canadian Journal of Statistics 40: 68–85.
  10. Britt, Tom, Jack Nusbaum, Alexandra Savinkina, and Arkady Shemyakin. 2023. Short-term Forecast of U.S. COVID Mortality Using Excess Deaths and Vector Autoregression. Model Assisted Statistics and Applications 18: 13–31.
  11. Campolieti, Michele. 2021. Tail risks and infectious disease: Influenza mortality in the U.S., 1900–2018. Infectious Disease Modelling 6: 1135–43.
  12. Chaurasia, Vikas, and Saurabh Pal. 2020. COVID-19 Pandemic: ARIMA and Regression Model-Based Worldwide Death Cases Predictions. SN Computer Science 1: 288.
  13. Dissmann, Jeffrey F., Eike C. Brechmann, Claudia Czado, and Dorota Kurowicka. 2013. Selecting and estimating regular vine copulae and application to financial returns. Computational Statistics & Data Analysis 59: 52–69.
  14. D’Urso, Pierpaolo, Livia De Giovanni, and Vincenzina Vitale. 2022. A D-vine copula-based quantile regression model with spatial dependence for COVID-19 infection rate in Italy. Spatial Statistics 47: 100586.
  15. Eurostat. 2025. Available online: https://ec.europa.eu/eurostat/data/database (accessed on 15 February 2025).
  16. Fernandez, Carmen, and Mark F. J. Steel. 1996. On Bayesian Modelling of Fat Tails and Skewness. Journal of the American Statistical Association 93: 359–71.
  17. Huang, Liwei, and A. Shemyakin. 2020. Empirical comparison of skewed t-copula models for insurance and financial data. Model Assisted Statistics and Applications 15: 351–61.
  18. Huard, David, Guillaume Evin, and Anne-Catherine Favre. 2006. Bayesian copula selection. Computational Statistics & Data Analysis 51: 809–22.
  19. Human Mortality Database. 2025. Short-Term Mortality Fluctuations Data Series Note. Available online: https://www.mortality.org/File/GetDocument/Public/STMF/DOC/STMFNote.pdf (accessed on 15 February 2025).
  20. Ilie, Ovidiu-Dumitru, Alin Ciobica, and Bogdan Doroftei. 2020. Testing the Accuracy of the ARIMA Models in Forecasting the Spreading of COVID-19 and the Associated Mortality Rate. Medicina 56: 566.
  21. Imperial College COVID Response Team. 2022. Short-Term Forecasts of COVID-19 Deaths in Multiple Countries. Available online: https://mrc-ide.github.io/covid19-short-term-forecasts/ (accessed on 15 February 2025).
  22. Johansson, Michael A., Nicholas G. Reich, Evan L. Ray, Nutcha Wattanachit, Abdul Hannan Kanji, Katie House, Estee Y. Cramer, Johannes Bracher, Andrew Zheng, Teresa K. Yamana, et al. 2020. Ensemble Forecasts of Coronavirus Disease 2019 (COVID-19) in the U.S. medRxiv.
  23. Kim, Jong-Min. 2022. Copula Dynamic Conditional Correlation and Functional Principal Component Analysis of COVID-19 Mortality in the United States. Axioms 11: 619.
  24. Lei, Xianhui, and A. Shemyakin. 2023. Copula Models of COVID-19 Mortality in Minnesota and Wisconsin. Risks 11: 193.
  25. Manner, Hans. 2007. Estimation and Model Selection of Copulas with an Application to Exchange Rates. METEOR Research Memorandum No. 056. Maastricht: Maastricht University.
  26. Martinez-Folgar, Kevin, Diego Alburez-Gutierrez, Alejandra Paniagua-Avila, Manuel Ramirez-Zea, and Usama Bilal. 2021. Excess mortality during the COVID-19 pandemic in Guatemala. American Journal of Public Health 111: 1839–46.
  27. National Center for Health Statistics. 2025. Weekly Counts of Deaths by State and Select Causes, 2014–2019; 2020–2023. Available online: https://data.cdc.gov/browse?category=National+Center+for+Health+Statistics&sortBy=last_modified&pageSize=20&limitTo=datasets (accessed on 2 June 2025).
  28. Schepsmeier, Uli. 2015. Efficient information based goodness-of-fit tests for vine copula models with fixed margins. Journal of Multivariate Analysis 138: 34–52.
  29. Shemyakin, Arkady, and Alexander Kniazev. 2017. Introduction to Bayesian Estimation and Copula Models of Dependence. London: John Wiley and Sons. 345p, ISBN 978-1-118-95901-5.
  30. Sioofy, Khoojine Arash, Mahdi Shadabfar, Vahid Reza Hosseini, and Hadi Kordestani. 2021. Network Autoregressive Model for the Prediction of COVID-19 Considering the Disease Interaction in Neighboring Countries. Entropy 23: 1267.
  31. Statistics Canada. 2024. Table 13-10-0768-01 Provisional Weekly Death Counts, by Age Group and Sex. Available online: https://www150.statcan.gc.ca/n1/pub/71-607-x/71-607-x2021028-eng.htm (accessed on 15 February 2025).
  32. Wang, Haidong, Katherine R. Paulson, Spencer A. Pease, Stefanie Watson, Haley Comfort, Peng Zheng, Aleksandr Y. Aravkin, Catherine Bisignano, Ryan M. Barber, Tahiya Alam, et al. 2020. Estimating excess mortality due to the COVID-19 pandemic: A systematic analysis of COVID-19-related mortality, 2020–2021. Lancet 399: 1513–36.
  33. Wifvat, Kathryn, John Kumerow, and Arkady Shemyakin. 2020. Copula Model Selection for Vehicle Component Failures Based on Warranty Claims. Risks 8: 56.
  34. Yan, Yifei, Jianhua Hou, Qing Li, and Nancy Xiaonan Yu. 2023. Suicide before and during the COVID-19 Pandemic: A Systematic Review with Meta-Analysis. International Journal of Environmental Research and Public Health 20: 3346.
  35. Zińczuk, Alexander, Marta Rorat, and Tomasz Jurek. 2023. COVID-19-related excess mortality—An overview of the current evidence. Archiwum Medycyny Sadowej i Kryminologii 73: 33–44.

Figure 1. USA (red) and Canada (blue) time series excess mortality.
Figure 2. France (yellow) and Germany (black) weekly excess mortality.
Figure 3. Norway (green) and Sweden (purple) weekly excess mortality.
Figure 4. Steps of the modeling process.
Figure 5. Vine structure.
Figure 6. Vine structure with the pair copulas and τ.
Figure 7. USA weekly excess mortality with forecast (blue) and confidence bounds for 2023.
Figure 8. Canada weekly excess mortality with forecast (blue) and confidence bounds for 2023.
Figure 9. France weekly excess mortality with forecast (blue) and confidence bounds for 2023.
Figure 10. Germany weekly excess mortality with forecast (blue) and confidence bounds for 2023.
Figure 11. Norway weekly excess mortality with forecast (blue) and confidence bounds for 2023.
Figure 12. Sweden weekly excess mortality with forecast (blue) and confidence bounds for 2023.

Table 1. Time series summary statistics.

| Country | Min | Median | Mean | Max | σ |
|---|---|---|---|---|---|
| USA | 0.032 | 0.170 | 0.183 | 0.491 | 0.134 |
| CAN | 0.016 | 0.138 | 0.142 | 0.354 | 0.083 |
| FRA | 0.0463 | 0.085 | 0.106 | 0.662 | 0.110 |
| GER | 0.132 | 0.067 | 0.094 | 0.539 | 0.114 |
| NOR | 0.195 | 0.026 | 0.040 | 0.464 | 0.090 |
| SWE | 0.164 | 0.012 | 0.013 | 0.464 | 0.101 |

Table 2. Stationarity test p-values (α = 0.05).

| Country | ADF | KPSS |
|---|---|---|
| USA | 0.03 | < 0.01 |
| CAN | < 0.01 | < 0.01 |
| FRA | < 0.01 | < 0.01 |
| GER | < 0.01 | < 0.01 |
| NOR | 0.25 | < 0.01 |
| SWE | < 0.01 | > 0.1 |

Table 3. ARIMA coefficients and standard errors (in parentheses) by country.

| Coef | USA | CAN | FRA | GER | NOR | SWE |
|---|---|---|---|---|---|---|
| β₀ | 0.18 (0.03) | 0.14 (0.03) | 0.82 (0.04) | 0.11 (0.04) | 0.05 (0.03) | 0.02 (0.02) |
| β₁ | 1.41 (0.07) | 0.75 (0.07) | 0.11 (0.02) | 0.91 (0.03) | 0.56 (0.07) | 0.77 (0.07) |
| β₂ | 0.29 (0.12) | 0.18 (0.12) | - | - | 0.31 (0.07) | 0.30 (0.08) |
| β₃ | 0.18 (0.03) | - | - | - | - | 0.22 (0.07) |

Table 4. Box–Ljung tests (α = 0.05).

| Country | p-Value |
|---|---|
| USA | 0.0922 |
| CAN | 0.4843 |
| FRA | 0.0553 |
| GER | 0.1890 |
| NOR | 0.4689 |
| SWE | 0.3456 |

Table 5. Shapiro–Wilk tests (α = 0.05).

| Country | p-Value |
|---|---|
| USA | < 0.0001 |
| CAN | < 0.0001 |
| FRA | < 0.0001 |
| GER | 0.0038 |
| NOR | 0.4416 |
| SWE | < 0.0001 |

Table 6. Skewed t-distribution fit results.

| Country | μ ¹ | σ | v | ξ |
|---|---|---|---|---|
| USA | 0 | 0.0259 | 4.6361 | 1.0340 |
| CAN | 0 | 0.0371 | 3.6201 | 1.1711 |
| FRA | 0 | 0.0720 | 2.7752 | 1.1821 |
| GER | 0 | 0.0540 | 4.0917 | 1.0257 |
| SWE | 0 | 0.0509 | 6.0996 | 1.2421 |

¹ μ was found to be both < 0.005 and to have a high standard error; as such, each mean was treated as zero.

Table 7. Kolmogorov–Smirnov test statistics and p-values (α = 0.05).

| Country | D | p-Value |
|---|---|---|
| USA | 0.0384 | 0.9177 |
| CAN | 0.0378 | 0.9267 |
| FRA | 0.0246 | 0.9996 |
| GER | 0.0416 | 0.8626 |
| SWE | 0.0377 | 0.9283 |

Table 8. Posterior probabilities for hypothesized pair-copula families.

| Countries | US/CAN | US/FRA | FRA/GER | GER/NOR | GER/SWE |
|---|---|---|---|---|---|
| H1 (C) | 0 | 0.09 | 0.01 | 0.61 | 0.51 |
| H2 (G) | 0.84 | 0.35 | 0.03 | 0.10 | 0.02 |
| H3 (SC) | 0.11 | 0.28 | 0 | 0.04 | 0 |
| H4 (SG) | 0.05 | 0.28 | 0.96 | 0.25 | 0.47 |

Table 9. MAE for ARIMA.

| Country | Forecast MAE |
|---|---|
| USA | 0.103 |
| CAN | 0.033 |
| FRA | 0.064 |
| GER | 0.107 |
| NOR | 0.117 |
| SWE | 0.077 |

Table 10. Kolmogorov–Smirnov test statistics and p-values (α = 0.05).

| Country | D | p-Value |
|---|---|---|
| USA | 0.2402 | 0.0276 |
| CAN | 0.1043 | 0.8652 |
| FRA | 0.1149 | 0.4981 |
| GER | 0.0646 | 0.9817 |
| SWE | 0.1795 | 0.0702 |
| NOR | 0.0822 | 0.8738 |

Table 11. KS and CvM test p-values for the vine copula model (α = 0.05).

| Test | p-Value |
|---|---|
| KS | 0.7897 |
| CvM | 0.0650 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Asplund, J.; Shemyakin, A. Copula Modeling of COVID-19 Excess Mortality. Risks 2025, 13, 119. https://doi.org/10.3390/risks13070119
