Article

COVID-19 Data Analysis Using Bayesian Models and Nonparametric Geostatistical Models

Department of Mathematics and Statistics, University of Ottawa, Ottawa, ON K1N 6N5, Canada
*
Author to whom correspondence should be addressed.
Mathematics 2023, 11(6), 1359; https://doi.org/10.3390/math11061359
Submission received: 6 January 2023 / Revised: 14 February 2023 / Accepted: 3 March 2023 / Published: 10 March 2023
(This article belongs to the Special Issue Nonparametric Statistical Methods and Their Applications)

Abstract

Since December 2019, many statistical spatial–temporal methods have been developed to track and predict the spread of the COVID-19 pandemic. In this paper, we analyzed the COVID-19 dataset which includes the number of biweekly infected cases registered in Ontario from March 2020 to the end of June 2021. We made use of Bayesian Spatial–temporal models and Area-to-point (ATP) and Area-to-area (ATA) Poisson Kriging models. With the Bayesian models, spatial–temporal effects and government intervention effects on infection risk are considered while the ATP Poisson Kriging models are used to display the spread of the pandemic over space.

1. Introduction

The pandemic caused by the 2019 coronavirus disease (COVID-19) has led to an unprecedented number of statistical papers, many of which focus on the estimation of the basic or effective reproduction number. Other papers describe the evolution of the virus in a geographical region over time using SEIR models [1,2,3,4]. Few studies, however, have made use of geostatistical methods to analyze the spread of the virus. Kriging methods are often used to develop isopleth maps. However, simple kriging methods assume that the data are measured at each point in space, whereas public health data are in practice based on areal aggregation. When performing point kriging of areal data, the user makes the practical assumption that all the inhabitants of the administrative unit live at the same location and that the measured rate thus refers to this specific location. This assumption is reasonable whenever the units of aggregation are small with respect to the spacing of the interpolation grid. However, the public health units in Ontario are not small in this sense, so the assumption of point measurement support becomes clearly inappropriate. There is a need for specific methods that incorporate the shape and size of those units in the analysis. Area-to-Point (ATP) Poisson Kriging and Area-to-Area (ATA) Poisson Kriging incorporate the size and shape of administrative units, as well as the population density, into the mapping of the corresponding risk at a fine scale [5,6,7]. These kriging methods can be used to see how the virus spreads over space and to downscale areal infection risks to point risks, which can reveal the spatial clustering of the spread of COVID-19.
Bayesian statistical techniques can be used to estimate areal risk by including relevant information from neighboring sites [8,9,10,11,12]. However, computation is still the main challenge in Bayesian statistics. Markov Chain Monte Carlo (MCMC) methods [13,14,15] are normally used for Bayesian computation, yet the estimation of model parameters is usually time-consuming and computationally intensive. The Integrated Nested Laplace Approximation (INLA) [16] approach was developed in 2009 as a computationally efficient alternative to MCMC. It can be applied to latent Gaussian models, which in turn can be used to analyze spatial and spatial–temporal data. For this reason, INLA has been successfully used in a great variety of applications [17,18,19,20]. Bayesian spatial–temporal models can incorporate spatial and temporal effects on the infection risk. They can also include other variables in order to analyze their association with the disease. In [21], environmental variables were taken into account. Government interventions, which have played an important role since the start of the pandemic, are considered in this paper in order to determine their impact on mitigating the infection risk in Ontario. In addition, the two references [22,23] were brought to our attention after our paper was submitted.
In this paper, we apply Bayesian Spatial–temporal Models to assess the effectiveness of government-instituted policy factors as well as to verify the importance of auxiliary variables. We then consider Area-to-Point (ATP) and Area-to-Area (ATA) Poisson Kriging to track the spread of the virus over Ontario. We discuss the prediction performance of the models, compare them with respect to both short-term and longer-term prediction, and examine the spatial clustering feature based on the ATP Poisson Kriging Model. Conclusions are drawn after comparing the Bayesian methods with the ATA (ATP) Poisson Kriging Models. Based on these methods, an interactive website, https://mujingrui.shinyapps.io/covid19 (accessed on 1 November 2021), was developed with the Shiny package in R to show the tracking maps.

2. Data

The Ontario Government published the dataset of Confirmed Cases in Ontario (see Supplemental Data), which includes the reported date of each case, the age group of the case, and the Public Health Unit (PHU) where the confirmed case occurred. The age group information can be used to provide an age-adjusted infection rate, which is used in the Poisson Kriging Models. Auxiliary data were also taken into account and are listed in Table S1 (see Supplemental Data). In addition, we make use of the COVID-19 relevant indicators released by Statistics Canada, which cover different characteristics based on the 2016 Canadian census (see Supplemental Data). Since March 2020, the Ontario Government has introduced different policies to stop the spread of the virus. From the policies implemented, we selected three categories for more in-depth analysis: Indoor Gathering, Outdoor Gathering and Non-essential Services. As various restrictions and rules were imposed at different levels of enforcement across the different public health units or provinces, it is difficult to compare them directly. We therefore defined levels within these three categories according to the common features of the restrictions; these are given in Table S2 (see Supplemental Data). The timelines for the implementation of these interventions are listed in Tables S3–S5, respectively (see Supplemental Data).

3. Methodologies

3.1. Bayesian Spatial–temporal Models

Spatial–temporal infected cases data can be represented as observations in $N$ public health units in Ontario, $y_t = (y_{1t}, \ldots, y_{Nt})$, where $t = 1, \ldots, T$. Here, $y = (y_{11}, y_{12}, \ldots, y_{1T}, \ldots, y_{NT})$ represents the number of bi-weekly infected cases observed in each unit.
A three-stage hierarchical process has been widely used in Bayesian spatial–temporal statistical models [9,24]. The first stage consists of the model for infected cases, where we assume $y = (y_{11}, y_{12}, \ldots, y_{NT})$ and $y_{it} \sim \text{Poisson}(\mu_{it})$. In the second stage we place a regression equation on $\log(\mu_{it})$, which includes an overall fixed effect (intercept, denoted $\alpha$), covariate effects, and spatial, temporal and spatial–temporal interaction effects. In the third stage we specify the prior distributions of the unknown parameters. These are usually weakly informative Gaussian distributions with zero mean and large variance, since the spatial and temporal effects discussed below are defined as Gaussian Markov Random Fields (GMRFs) whose precision matrices are sparse.
The spatial component included in the spatial–temporal model we built is the Leroux CAR specification [18]. A BYM specification [8] was also considered, but its performance was not as good as that of the models with Leroux CAR. The BYM specification directly decomposes the spatial component into a structured part and an unstructured part, while the Leroux CAR specification introduces a parameter $\lambda_s$ to balance the structured and unstructured spatial effects. The BYM specification has nevertheless been used to analyze the COVID-19 infection risk in Spain with environmental variables [21]. There are four ways to define the spatial–temporal interaction term [25]. Table 1 indicates the four types of interaction, and hence four different models are considered (Note: Table 1 is reproduced from Schrödle and Held [26]). Here, the log-risk is modeled as:
$$\log(\mu_{it}) = \alpha + \beta X_i + \xi_i + \gamma_t + \phi_t + \delta_{it}$$
where $\xi_i$ is the spatial component, $\phi_t$ and $\gamma_t$ represent the unstructured and structured temporal effects, respectively, and $\delta_{it}$ represents the space-time interaction term. $\beta = (\beta_1, \beta_2, \ldots, \beta_p)$ represents the vector of covariate coefficients; $X_i = (x_{i1}, x_{i2}, \ldots, x_{ip})$ is the COVID-19 relevant covariate data vector to be discussed in Section 3.2. Denoting the vector of spatial effects by $\xi = (\xi_1, \ldots, \xi_n)$, the Leroux CAR specification can be defined as:
$$\xi \sim N\left(0, D_s(\sigma_s^2, \lambda_s)\right), \quad D_s = \sigma_s^2 \left(\lambda_s R_s + (1 - \lambda_s) I_s\right)^{-1}$$
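The term $\gamma_t$ represents the temporally structured effect, for which a random walk of first order (RW1) is considered. That is, $\gamma = (\gamma_1, \ldots, \gamma_T) \sim N(0, \sigma_\gamma^2 R_t^{-})$, where $\sigma_\gamma^2$ is the variance component and $R_t^{-}$ denotes a generalized inverse of the structure matrix $R_t$. A Gaussian distribution is chosen for the unstructured temporal effect $\phi_t$: $\phi_t \sim N(0, \sigma_\phi^2)$.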
The identity matrices $I_s$ ($I_t$) correspond to the unstructured spatial (temporal) effect, whereas $R_s$ ($R_t$) is the structure matrix of the structured spatial (temporal, RW1) effect, as given below. We also considered the random walk of second order (RW2), but its performance was not as good as that of RW1.
$$R_t = \begin{pmatrix} 1 & -1 & & & & \\ -1 & 2 & -1 & & & \\ & -1 & 2 & -1 & & \\ & & \ddots & \ddots & \ddots & \\ & & & -1 & 2 & -1 \\ & & & & -1 & 1 \end{pmatrix}$$
$$(R_s)_{ik} = \begin{cases} \sum_{k \sim i} \omega_{ki}, & i = k \\ -1, & i \sim k \\ 0, & \text{otherwise} \end{cases}$$
where $\omega_{ki} = 1$ if areas $k$ and $i$ share a boundary. As discussed in Section 2, three different policies were selected and categorized as three different variables. The variable Indoor Gathering (IG) is defined through 3 indicator variables according to the restriction level: $IG_j = 1$ if the $j$-th level of gathering restriction was in place, and 0 otherwise, for $j = 1, 2, 3$. The variables Outdoor Gathering (OG) and Non-essential Services (E) are likewise defined as indicator variables according to their restriction levels. These variables can also be included in the Bayesian Spatial–temporal Models to determine how they influence the infection risk:
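$$y_{it} \sim \text{Poisson}(\mu_{it})$$
$$\log(\mu_{it}) = \log(e_i) + \theta_{it}$$
$$\theta_{it} = \alpha + \beta X_i + \sum_{j=1}^{3} \beta_{IG_j} IG_{ijt} + \sum_{j=1}^{4} \beta_{OG_j} OG_{ijt} + \sum_{j=1}^{4} \beta_{E_j} E_{ijt} + \xi_i + \gamma_t + \phi_t + \delta_{it}$$
$$\xi \sim N\left(0, D_s(\sigma_s^2, \lambda_s)\right), \quad D_s = \sigma_s^2 \left(\lambda_s R_s + (1 - \lambda_s) I_s\right)^{-1}$$
$$\gamma_t \mid \gamma_{t-1} \sim N(\gamma_{t-1}, \sigma_\gamma^2), \quad \phi_t \sim N(0, \sigma_\phi^2)$$
$$\delta \sim N(0, \sigma_\delta^2 R_\delta^{-}), \quad \delta = (\delta_{11}, \ldots, \delta_{nT})$$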
The distribution of the spatial component can also be expressed conditionally as
$$\xi_i \mid \xi_{k \sim i} \sim N\left(\frac{\lambda_s}{1 - \lambda_s + \lambda_s m_i} \sum_{k \sim i} \xi_k,\ \frac{\sigma_s^2}{1 - \lambda_s + \lambda_s m_i}\right)$$
where $\lambda_s$ is a spatial smoothing parameter taking values between 0 and 1, $I_s$ is the identity matrix of dimension $n \times n$ and $R_s$ is the spatial neighborhood matrix that corresponds to the structured spatial effect. $m_i$ is the number of neighbors of area $i$, that is, $m_i = \sum_{k \sim i} \omega_{ki}$, with $k \sim i$ denoting that regions $k$ and $i$ share a common boundary. When $\lambda_s = 0$, the Leroux CAR reduces to $\xi \sim N(0, \sigma_s^2 I_s)$, and when $\lambda_s = 1$, it becomes $\xi \sim N(0, \sigma_s^2 R_s^{-})$, where the superscript $-$ denotes a generalized inverse. The unstructured temporal effects $\phi_t$ are modelled as independent and identically distributed normal variables, that is, $\phi \sim N(0, \sigma_\phi^2 I_t)$. For the structured temporal effect $\gamma_t$, a random walk of first order is considered, that is, $\gamma \sim N(0, \sigma_\gamma^2 R_t^{-})$. The interaction terms $\delta = (\delta_{11}, \ldots, \delta_{nT})$ are assumed to follow the normal distribution $N(0, \sigma_\delta^2 R_\delta^{-})$, where $\sigma_\delta^2$ is the hyper-parameter and $R_\delta$ is the matrix given by the Kronecker product of the corresponding structure matrices of the interacting effects [27]. This model can be built in R-INLA with the generic1 option [12].
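The structure matrices and the Leroux conditional distribution above can be checked numerically on a small example. The following is a minimal sketch in plain Python, not the authors' R-INLA code; the four-area chain map, the parameter values and all function names are hypothetical and chosen only for illustration.

```python
# Sketch only: a hypothetical chain of 4 areas and 5 time points.

def rw1_structure(T):
    """Structure matrix R_t of a first-order random walk (tridiagonal)."""
    R = [[0.0] * T for _ in range(T)]
    for t in range(T - 1):
        # each increment gamma_{t+1} - gamma_t contributes [[1, -1], [-1, 1]]
        R[t][t] += 1.0
        R[t + 1][t + 1] += 1.0
        R[t][t + 1] -= 1.0
        R[t + 1][t] -= 1.0
    return R

def neighbourhood_structure(neighbours):
    """R_s: number of neighbours on the diagonal, -1 for adjacent areas."""
    n = len(neighbours)
    R = [[0.0] * n for _ in range(n)]
    for i, nbrs in enumerate(neighbours):
        R[i][i] = float(len(nbrs))
        for k in nbrs:
            R[i][k] = -1.0
    return R

def leroux_precision(R_s, lam, sigma2):
    """Inverse of D_s, i.e. (lam * R_s + (1 - lam) * I) / sigma2."""
    n = len(R_s)
    return [[(lam * R_s[i][k] + (1.0 - lam) * (i == k)) / sigma2
             for k in range(n)] for i in range(n)]

def leroux_conditional(xi, i, neighbours, lam, sigma2):
    """Mean and variance of xi_i given the effects of its neighbours."""
    m_i = len(neighbours[i])
    denom = 1.0 - lam + lam * m_i
    mean = lam * sum(xi[k] for k in neighbours[i]) / denom
    return mean, sigma2 / denom

neighbours = [[1], [0, 2], [1, 3], [2]]   # hypothetical adjacency (a chain)
xi = [0.2, 0.0, -0.4, 0.1]                # made-up spatial effects
R_t = rw1_structure(5)
R_s = neighbourhood_structure(neighbours)
Q = leroux_precision(R_s, lam=0.5, sigma2=2.0)
mean_1, var_1 = leroux_conditional(xi, 1, neighbours, lam=0.5, sigma2=2.0)
print(R_t[0], R_t[1])
print(Q[1], mean_1, var_1)
```

In this sketch the conditional variance of area 1 equals the reciprocal of the corresponding diagonal entry of the precision matrix, which is the usual link between the joint and conditional forms of a Gaussian Markov random field.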

3.2. Integrated Nested Laplace Approximation (INLA)

The model in Section 3.1 can be fitted using the following modelling framework:
$$y_{it} \sim \text{Poisson}(\mu_{it})$$
$$\log\left(\frac{\mu_{it}}{e_i}\right) = \alpha + \sum_{m=1}^{M} \beta_m x_{im} + \sum_{q=1}^{Q} \eta_q x_{itq}^* + \sum_{l=1}^{L} f_l(\cdot) + h(\delta_{it})$$
where $\alpha$ is a scalar representing the intercept; the coefficients $\beta = (\beta_1, \ldots, \beta_M)$ and $\eta = (\eta_1, \ldots, \eta_Q)$ quantify the effect of additional relevant covariates $X_i = (x_{i1}, \ldots, x_{iM})$ and policy covariates $X_{it}^* = (x_{it1}^*, \ldots, x_{itQ}^*)$ on the response; $f = \{f_1(\cdot), \ldots, f_L(\cdot)\}$ is a set of functions defined in terms of spatially and temporally correlated effects; and $\delta_{it}$ is the space-time interaction effect. $y = (y_{11}, \ldots, y_{NT})$ represents the vector of biweekly COVID-19 infected cases, $N$ is the number of public health units in Ontario and $T$ is the number of biweekly periods observed. For the Bayesian Spatial–temporal model in Section 3.1, we identify $f_1(\cdot) \sim N(0, D_s(\sigma_s^2, \lambda_s))$, $f_2(\cdot) \sim N(0, \sigma_\gamma^2 R_t^{-})$, $f_3(\cdot) \sim N(0, \sigma_\phi^2 I_t)$ and $h(\cdot) \sim N(0, \sigma_\delta^2 R_\delta^{-})$. Upon varying the form of the functions $f_l(\cdot)$, this formulation can accommodate a wide range of models, from standard and hierarchical regression to spatial and spatial–temporal models [16,28].
The spatial–temporal models fitted in this framework are built as Bayesian hierarchical models with three stages [24]. The first stage is the model for the infected cases given the parameters, $p(y \mid \theta)$, where $y$ denotes the observed cases. The second stage is the model for the parameters, $p(\theta \mid \psi)$. The third stage is the prior on the hyper-parameters, $p(\psi)$. Note: $\theta = (\alpha, \beta, \eta, \xi, \phi, \gamma, \delta)$ and $\psi = (\psi_1, \psi_2, \ldots, \psi_K)$.
The objectives of the Bayesian computation consist of calculating the marginal posterior distributions for each parameter and hyper-parameter:
$$p(\theta_i \mid y) = \int p(\psi \mid y)\, p(\theta_i \mid \psi, y)\, d\psi$$
$$p(\psi_k \mid y) = \int p(\psi \mid y)\, d\psi_{-k}$$
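Here $\psi_{-k}$ denotes the vector $\psi$ without its $k$th component. The first quantity we need to compute is an approximation to the posterior marginal distribution of the hyper-parameters:
$$p(\psi \mid y) \propto \frac{p(\psi)\, p(\theta \mid \psi)\, p(y \mid \theta)}{p(\theta \mid \psi, y)} \approx \left. \frac{p(\psi)\, p(\theta \mid \psi)\, p(y \mid \theta)}{\tilde{p}(\theta \mid \psi, y)} \right|_{\theta = \theta^*(\psi)} =: \tilde{p}(\psi \mid y)$$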
Next, $p(\theta_i \mid \psi, y)$ needs to be approximated. It is possible to re-express the vector of parameters as $\theta = (\theta_i, \theta_{-i})$ and make use of the Laplace approximation again to obtain:
$$p(\theta_i \mid \psi, y) \propto \frac{p(\psi)\, p(\theta \mid \psi)\, p(y \mid \theta)}{p(\theta_{-i} \mid \theta_i, \psi, y)} \approx \left. \frac{p(\psi)\, p(\theta \mid \psi)\, p(y \mid \theta)}{\tilde{p}(\theta_{-i} \mid \theta_i, \psi, y)} \right|_{\theta_{-i} = \theta_{-i}^*(\theta_i, \psi)} =: \tilde{p}(\theta_i \mid \psi, y)$$
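Here, $\tilde{p}(\theta_{-i} \mid \theta_i, \psi, y)$ represents the Gaussian approximation to $p(\theta_{-i} \mid \theta_i, \psi, y)$ and $\theta_{-i}^*(\theta_i, \psi)$ is its mode. The approximation typically works very well, but it can be very expensive in computational terms. For this reason, Rue et al. [16] proposed the Simplified Laplace Approximation. Numerical integration is then used: the conditional posteriors $\tilde{p}(\theta_i \mid \psi^{(k)}, y)$ are evaluated on a grid of selected values for $\theta_i$ at a set of points $\psi^{(k)}$, $k = 1, \ldots, K$, of the hyper-parameters, and the corresponding marginal posteriors $\tilde{p}(\theta_i \mid y)$ are obtained as a weighted sum with integration weights $\Delta_k$:
$$\tilde{p}(\theta_i \mid y) \approx \sum_{k=1}^{K} \tilde{p}(\theta_i \mid \psi^{(k)}, y)\, \tilde{p}(\psi^{(k)} \mid y)\, \Delta_k$$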
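The two steps above (a ratio evaluated at the mode to obtain the hyper-parameter posterior, then a finite weighted sum over a grid) can be illustrated on a toy model where every density is Gaussian, so the Laplace step is exact. This is a hedged sketch and not the R-INLA algorithm: the model (observations $y_j \sim N(\theta, 1)$, prior $\theta \mid \psi \sim N(0, 1/\psi)$, an assumed Exp(1) prior on $\psi$), the data and the grid are all made up for illustration.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

y = [0.8, 1.3, 0.4, 1.1]           # made-up observations
n, y_sum = len(y), sum(y)

def conditional(psi):
    """Mean and variance of theta | psi, y (Gaussian, so exact here)."""
    var = 1.0 / (n + psi)
    return y_sum * var, var

def hyper_posterior_unnorm(psi):
    """p(psi) p(theta|psi) p(y|theta) / p(theta|psi,y) at the mode theta*(psi)."""
    mode, var = conditional(psi)
    prior_psi = math.exp(-psi)                       # assumed Exp(1) prior
    prior_theta = normal_pdf(mode, 0.0, 1.0 / psi)
    lik = math.prod(normal_pdf(obs, mode, 1.0) for obs in y)
    return prior_psi * prior_theta * lik / normal_pdf(mode, mode, var)

# Grid over the hyper-parameter with equal integration weights Delta_k.
delta = 0.05
psi_grid = [delta * (k + 0.5) for k in range(400)]   # midpoints on (0, 20)
w = [hyper_posterior_unnorm(p) for p in psi_grid]
total = sum(w) * delta
w = [v / total for v in w]                           # normalised p~(psi_k | y)

def marginal_theta(theta):
    """Finite-sum approximation of p~(theta | y)."""
    return sum(normal_pdf(theta, *conditional(p)) * wk * delta
               for p, wk in zip(psi_grid, w))

theta_grid = [-2.0 + 0.01 * j for j in range(601)]   # from -2 to 4
area = sum(marginal_theta(t) for t in theta_grid) * 0.01
print(round(area, 3))
```

The printed area is a sanity check that the mixture of conditional densities integrates to approximately one over the plotted range.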

3.3. Area-to-Point (ATP) and Area-to-Area (ATA) Poisson Kriging

We let $v_i$ ($i = 1, \ldots, 34$) denote the public health units in Ontario and $u$ the point location at the centre of each 15 km × 15 km square cell of the partition, and we use $s, s'$ to index different points in units $i, j$. The 15 km × 15 km cell size was chosen because it performed best among the sizes we tried (5 × 5, 8 × 8, 10 × 10 and 15 × 15). The observed age-adjusted bi-weekly COVID-19 infection rate is then denoted as $z(v_i) = S(v_i)/n(v_i)$, where $n(v_i)$ is the population size in public health unit $i$. At each unit $i$, the corresponding number of infected cases $S(v_i)$ can be assumed to follow a conditional Poisson distribution given the local risk $R(v_i)$:
$$S(v_i) \mid R(v_i) \sim \text{Poisson}\left(n(v_i) R(v_i)\right)$$
The case counts are therefore spatially correlated through either the population sizes or the risks. The risk variable $R(v_i)$ itself follows an unknown distribution with mean $m$, variance $\sigma_R^2$ and covariance function $C_R(v_i, v_j)$ [5]. It is not realistic to collapse each unit $v_i$ to its geographic centroid, because the distances between these public health units are large and the units have different shapes and sizes. The spatial correlation of each unit needs to be considered. Area-to-Area (ATA) Kriging is used to predict the areal risks, and we assume that the areal supports are disjoint [29]. The estimated areal risk value $r(v_\alpha)$ in an arbitrary unit $\alpha$ can thus be expressed as a weighted linear combination of the $K$ neighboring available areal infection rates:
$$\hat{r}_{PK}(v_\alpha) = \sum_{i=1}^{K} \lambda_i(v_\alpha)\, z(v_i)$$
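where $z(v_i)$ is the biweekly age-adjusted infection rate in public health unit $i$. The areal weights $\lambda_i(v_\alpha)$ can be calculated through the following system:
$$\sum_{j=1}^{K} \lambda_j(v_\alpha) \left[ \bar{C}_R(v_i, v_j) + \delta_{ij} \frac{m^*}{n(v_i)} \right] + \mu(v_\alpha) = \bar{C}_R(v_i, v_\alpha), \quad i = 1, \ldots, K$$
$$\sum_{j=1}^{K} \lambda_j(v_\alpha) = 1$$
where $\bar{C}_R(v_i, v_j) = \text{Cov}\{z(v_i), z(v_j)\}$, $\delta_{ij}$ is the Kronecker delta ($\delta_{ij} = 1$ if $i = j$ and 0 otherwise), $m^*$ is the population-weighted mean of the observed rates [5] and $\mu(v_\alpha)$ is the Lagrange multiplier associated with the unbiasedness constraint. The areal covariances are approximated by averaging the point-to-point covariances $C(u_s, u_{s'})$ calculated between any two points discretizing the units $v_i$ and $v_j$:
$$\bar{C}_R(v_i, v_j) = \frac{1}{\sum_{s=1}^{P_i} \sum_{s'=1}^{P_j} w_{ss'}} \sum_{s=1}^{P_i} \sum_{s'=1}^{P_j} w_{ss'}\, C(u_s, u_{s'})$$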
where $P_i$ and $P_j$ represent the number of points discretizing the two areas $v_i$ and $v_j$. The weights $w_{ss'}$ are calculated as the product of the population sizes in the 15 km × 15 km square cells centered on the points $u_s$ and $u_{s'}$: $w_{ss'} = n(u_s) \times n(u_{s'})$. The population sizes of the cells within a unit sum to the population size of that unit: $n(v_i) = \sum_{s=1}^{P_i} n(u_s)$ and $n(v_j) = \sum_{s'=1}^{P_j} n(u_{s'})$. The kriging variance for the estimated areal risk in unit $\alpha$ is computed as:
$$\sigma_{PK}^2(v_\alpha) = \bar{C}_R(v_\alpha, v_\alpha) - \sum_{i=1}^{K} \lambda_i(v_\alpha)\, \bar{C}_R(v_i, v_\alpha) - \mu(v_\alpha)$$
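where $\bar{C}_R(v_\alpha, v_\alpha)$ is the covariance within the same area $\alpha$:
$$\bar{C}_R(v_\alpha, v_\alpha) = \frac{1}{n^2(v_\alpha)} \left[ \sum_{s=1}^{P_\alpha} n^2(u_s)\, C(0) + \sum_{s=1}^{P_\alpha} \sum_{s'=1}^{P_\alpha} w_{ss'}\, C(u_s, u_{s'})\, \delta_{ss'}(s \neq s') \right]$$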
and $\delta_{ss'}(s \neq s')$ is the indicator function, equal to 1 when $s \neq s'$ and 0 otherwise. Alternatively, kriging may be used to predict a point value $r(u_s)$, again using the $K$ neighboring areal infection rates $\{z(v_i), i = 1, \ldots, K\}$ [29]. The predicted point risk $\hat{r}_{PK}(u_s)$ can also be expressed as a weighted linear combination, where $u_s$ represents the point location whose risk value is to be estimated:
$$\hat{r}_{PK}(u_s) = \sum_{i=1}^{K} \lambda_i(u_s)\, z(v_i)$$
The system of linear equations used to compute the kriging weights is similar to the one used for calculating the weights in the ATA kriging method. However, the area-to-area covariances $\bar{C}_R(v_i, v_\alpha)$ on the right-hand side of the first equation in (11) are replaced by area-to-point covariances $\bar{C}_R(v_i, u_s)$, approximated as follows:
$$\bar{C}_R(v_i, u_s) = \frac{1}{\sum_{s'=1}^{P_i} w_{ss'}} \sum_{s'=1}^{P_i} w_{ss'}\, C(u_s, u_{s'})$$
where $P_i$ is the number of points in area $v_i$. The area-to-point kriging variance is estimated as:
$$\sigma_{PK}^2(u_s) = C_R(0) - \sum_{i=1}^{K} \lambda_i(u_s)\, \bar{C}_R(v_i, u_s) - \mu(u_s)$$
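The ATA system above can be assembled and solved for a toy configuration as follows. This is a minimal sketch rather than the implementation used in the paper: the three units, their cell populations and case counts are hypothetical, the point covariance is an assumed exponential model with made-up sill and range, and $m^*$ is taken to be the population-weighted mean rate.

```python
import math

def point_cov(p, q, sill=1.0e-4, corr_range=50.0):
    """Assumed exponential point covariance C(u_s, u_s')."""
    return sill * math.exp(-math.dist(p, q) / corr_range)

def areal_cov(unit_a, unit_b):
    """Population-weighted average of point covariances between two units."""
    num = den = 0.0
    for pa, na in unit_a:
        for pb, nb in unit_b:
            w = na * nb                      # w_ss' = n(u_s) * n(u_s')
            num += w * point_cov(pa, pb)
            den += w
    return num / den

def solve(A, b):
    """Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

# Hypothetical units: lists of ((x_km, y_km), cell population).
units = [
    [((0.0, 0.0), 4000.0), ((15.0, 0.0), 6000.0)],
    [((45.0, 0.0), 3000.0), ((60.0, 0.0), 2000.0)],
    [((0.0, 45.0), 1000.0), ((15.0, 45.0), 1500.0)],
]
cases = [120.0, 40.0, 30.0]                  # made-up case counts S(v_i)
pop = [sum(n for _, n in u) for u in units]  # n(v_i)
z = [s / n for s, n in zip(cases, pop)]      # observed rates z(v_i)
m_star = sum(cases) / sum(pop)               # population-weighted mean rate

def ata_poisson_kriging(target):
    """Weights, predicted risk and kriging variance for one target unit."""
    K = len(units)
    A = [[areal_cov(units[i], units[j]) + (m_star / pop[i] if i == j else 0.0)
          for j in range(K)] + [1.0] for i in range(K)]
    A.append([1.0] * K + [0.0])              # unbiasedness constraint row
    rhs = [areal_cov(units[i], target) for i in range(K)] + [1.0]
    *lam, mu = solve(A, rhs)
    risk = sum(w_i * z_i for w_i, z_i in zip(lam, z))
    var = areal_cov(target, target) - sum(w_i * c for w_i, c in zip(lam, rhs[:K])) - mu
    return lam, risk, var

lam, risk, var = ata_poisson_kriging(units[0])
print(lam, risk, var)
```

For ATP kriging, only the right-hand side and the variance formula would change: `areal_cov(units[i], target)` would be evaluated with a single-point target, which this sketch supports by passing a one-element list.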
In order to solve these systems of equations, the covariance $C(u_s, u_{s'})$, or equivalently the point-to-point semivariogram $\gamma(u_s, u_{s'})$, is needed. It is linked to the regularized (areal) semivariogram $\gamma_v(h)$ through the relationship [30,31]:
$$\gamma_v(h) = \bar{\gamma}(v, v_h) - \bar{\gamma}_h(v, v)$$
Here, $h$ is the vector of distances. The second term on the right-hand side is calculated by averaging the point-to-point semivariogram values within the same unit, over all pairs of units separated by the given distance $h$:
$$\bar{\gamma}_h(v, v) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \left[ \frac{1}{P_i^2} \sum_{s=1}^{P_i} \sum_{s'=1}^{P_i} \gamma(u_s, u_{s'}) + \frac{1}{P_{i+h}^2} \sum_{s=1}^{P_{i+h}} \sum_{s'=1}^{P_{i+h}} \gamma(u_s, u_{s'}) \right]$$
where $P_i$ and $P_{i+h}$ are the numbers of points in units $v_i$ and $v_{i+h}$, respectively, and $N(h)$ is the number of pairs of units separated by the distance $h$. The area-to-area semivariogram value, $\bar{\gamma}(v, v_h)$, is also estimated through point-to-point semivariograms:
$$\bar{\gamma}(v, v_h) = \frac{1}{N(h)} \sum_{i=1}^{N(h)} \frac{1}{P_i P_{i+h}} \sum_{s=1}^{P_i} \sum_{s'=1}^{P_{i+h}} \gamma(u_s, u_{s'})$$
The point-support semivariogram is estimated iteratively. The procedure starts from an initial choice $\gamma^{(0)}(h)$, which is then updated until the difference between the theoretically regularized areal semivariogram and the experimental areal semivariogram is small [32,33].
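The regularization step, which each iteration of this procedure has to evaluate, can be sketched as follows. The example is only illustrative: it assumes an exponential point-support semivariogram with made-up sill and range, uses hypothetical square units discretized into four cell centres, and places a single pair of units in each distance class.

```python
import math

def gamma_point(p, q, sill=1.0, a=100.0):
    """Assumed exponential point-support semivariogram."""
    return sill * (1.0 - math.exp(-math.dist(p, q) / a))

def mean_gamma(pts_a, pts_b):
    """Average point semivariogram over all pairs of discretising points."""
    total = sum(gamma_point(p, q) for p in pts_a for q in pts_b)
    return total / (len(pts_a) * len(pts_b))

def regularized_gamma(pairs):
    """gamma_v(h) for the N(h) pairs of units in one distance class."""
    n_pairs = len(pairs)
    between = sum(mean_gamma(a, b) for a, b in pairs) / n_pairs
    within = sum(mean_gamma(a, a) + mean_gamma(b, b) for a, b in pairs) / (2 * n_pairs)
    return between - within

def square_unit(x0, y0, side=30.0, cells=2):
    """Hypothetical square unit discretised into cells x cells cell centres."""
    step = side / cells
    return [(x0 + (i + 0.5) * step, y0 + (j + 0.5) * step)
            for i in range(cells) for j in range(cells)]

near = [(square_unit(0.0, 0.0), square_unit(30.0, 0.0))]    # adjacent units
far = [(square_unit(0.0, 0.0), square_unit(120.0, 0.0))]    # distant units
g_near, g_far = regularized_gamma(near), regularized_gamma(far)
print(g_near, g_far)
```

In an actual deconvolution, the parameters of `gamma_point` would be adjusted at each iteration until `regularized_gamma` matches the experimental areal semivariogram closely enough.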

4. Applications of Methodologies

4.1. Model Selection

Following the discussion in Section 3.1, four models are proposed, as indicated in Table 2.
Here $X_i$ represents the vector of covariates discussed in Section 2. The models are assessed using the Deviance Information Criterion (DIC), as shown in Table 3, with lower values indicating a better fit [34]. The criterion takes into account the goodness-of-fit as well as a penalty term based on the complexity of the model, measured by the estimated effective number of parameters $p_D$. The DIC is defined as $DIC = D(\hat{\theta}, \hat{\psi}) + 2p_D$. The Watanabe-Akaike information criterion (WAIC), also known as the widely applicable Bayesian information criterion, is similar to the DIC, but the effective number of parameters is computed in a different way [35,36]. Table 3 shows the DIC and WAIC values for the different models under RW1. We have grouped together similar models with and without the Leroux specification for appropriate comparisons. The model with the better performance is chosen accordingly.
We also considered models with RW2 (random walk of second order). However, according to the DIC, the models with an RW1 structured temporal effect perform better. The model with Type II interaction has the lowest DIC value when the Leroux CAR specification is used for the spatial component. In the following sections, we discuss the effects of the various coefficients in these models.
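As an illustration of how the DIC trades fit against complexity, the sketch below computes it for a Poisson likelihood from posterior draws. It is not the R-INLA computation: it assumes the common definition $p_D = \bar{D} - D(\hat{\mu})$ (mean deviance minus deviance at the posterior mean), plugs in the posterior mean of $\mu$ rather than of $(\theta, \psi)$, and uses randomly generated stand-in draws instead of output from a fitted model.

```python
import math
import random

def poisson_loglik(y, mu):
    return sum(yi * math.log(m) - m - math.lgamma(yi + 1) for yi, m in zip(y, mu))

def dic(y, mu_samples):
    """DIC = D(mu_hat) + 2 * p_D, with p_D = mean deviance - D(mu_hat)."""
    deviances = [-2.0 * poisson_loglik(y, mu) for mu in mu_samples]
    mean_dev = sum(deviances) / len(deviances)
    mu_hat = [sum(col) / len(col) for col in zip(*mu_samples)]
    dev_at_mean = -2.0 * poisson_loglik(y, mu_hat)
    p_d = mean_dev - dev_at_mean
    return dev_at_mean + 2.0 * p_d, p_d

random.seed(1)
y = [12, 7, 30, 18]                           # made-up counts
# Stand-in "posterior draws" of mu; real draws would come from the fitted model.
good = [[random.gammavariate(yi + 1, 1.0) for yi in y] for _ in range(2000)]
poor = [[random.gammavariate(18.0, 1.0) for _ in y] for _ in range(2000)]
dic_good, pd_good = dic(y, good)
dic_poor, pd_poor = dic(y, poor)
print(dic_good < dic_poor, pd_good > 0.0)
```

The draws centred on the observed counts give the lower DIC, which is the sense in which lower values indicate a better fit in Table 3.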

4.2. Covariates Effect

In Section 4.1, we evaluated all of the models. The effect of each covariate discussed in Section 2 on the spread of COVID-19 is summarized by its coefficient in Figure 1.
Figure 1 shows the range of each coefficient from (mean − sd) to (mean + sd). It can be seen that almost none of these covariates has a significant association with the infection risk.

4.3. Policy Effect

For the policy analysis, the effect of each intervention factor on the spread of COVID-19 is summarized by its coefficient in Figure 2.
As in Figure 1, Figure 2 shows the range from (mean − sd) to (mean + sd). It can be seen from Figure 2 that levels 1–4 of the Non-essential Services intervention factor are significant, and that levels 2–4 of the Outdoor Gathering intervention factor could be significant as well.

4.4. Spatial–Temporal Effect

The spatial–temporal effects help us to understand how the pandemic spreads over time and space in Ontario. The infection risk in each public health unit can be calculated by exponentiating the sum of the space, time, space-time, additional covariate and government intervention terms, that is, as $\exp(\theta_{it})$. From Figure 3, we can see that some regions in Southern Ontario have higher infection rates at certain time points. At the beginning, the risks in the Northern area are relatively small, but they increase after January 2021, when the second wave started. Since the third wave, there have been more current infected cases in Northern Ontario than in Southern Ontario.

5. Discussion and Conclusions

5.1. Prediction Performance on Bayesian Methods

We examined the prediction performance of our proposed models. We take the data up to time $T^*$ and predict the infection risk $q$ bi-weekly periods ahead, i.e., $y_{it}^*$ for $t = T^* + 1, \ldots, T^* + q$. The prediction performance is evaluated using the absolute mean error of prediction (AMEP), given by
$$AMEP = \frac{1}{Iq} \sum_{t=T^*+1}^{T^*+q} \sum_{i=1}^{I} \left| \exp(\hat{\theta}_{it}) - \exp(\theta_{it}) \right|$$
The results for $q = 1$ and $q = 2$, i.e., predicting 1 and 2 time periods ahead, are given in Table 4 and Table 5, respectively. We choose $T^* = 30$ or $T^* = 31$ to leave sufficient observed data for prediction. For each $T^*$ and $q$, the AMEP is reported for Models 1–4. Model 2 (corresponding to Type II interaction) performs better, giving the smaller AMEP value when $T^* = 30$ and $q = 2$. Model 3 (corresponding to Type III interaction) gives the smallest AMEP value when $T^* = 31$ and $q = 1$. Generally, Model 2 (Type II interaction) performs better for longer-term predictions. Prediction performance with the Leroux CAR specification for the spatial component is not as good as with the BYM specification.
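The AMEP is straightforward to compute once the predicted and reference log-risks are arranged by period and unit. The sketch below uses made-up values for $q = 2$ periods and $I = 2$ units; the function name and the data layout (one list per period) are assumptions for illustration.

```python
import math

def amep(theta_hat, theta_ref):
    """Mean absolute difference of exp(theta) over I units and q periods."""
    q, n_units = len(theta_hat), len(theta_hat[0])
    total = sum(abs(math.exp(p) - math.exp(o))
                for row_p, row_o in zip(theta_hat, theta_ref)
                for p, o in zip(row_p, row_o))
    return total / (n_units * q)

# Rows are periods T*+1, ..., T*+q; columns are units (made-up log-risks).
theta_hat = [[0.0, math.log(2.0)], [math.log(1.5), 0.0]]
theta_ref = [[0.0, 0.0], [0.0, 0.0]]
value = amep(theta_hat, theta_ref)
print(value)   # approximately (0 + 1 + 0.5 + 0) / 4 = 0.375
```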

5.2. Estimation Performance on Poisson Kriging Models

We examine the estimation performance of the Area-to-Area (ATA) Poisson Kriging Model using the Mean Absolute Error of Estimation (MAEE), defined below with $I = 34$ public health units. The results are shown in Table 6.
$$MAEE = \frac{1}{I} \sum_{i=1}^{I} \left| \hat{r}_{PK}(v_i) - z(v_i) \right|$$
It can be seen from Table 6 that the estimation performance is generally good. The range of the point-support semivariograms from the ATP Poisson Kriging Model can reliably indicate the spatial clustering of the spread of COVID-19. Table 7 shows the ranges for each time period.
In a semivariogram, the difference in values between sample points that are close together tends to be small; in other words, the semivariance is small. The semivariance becomes larger as the distance between sample points increases, which means that the risk values at these points are no longer closely correlated [37]. The range is the distance at which the model first flattens out, so the range value can represent the spatial clustering feature. The largest range value is 355 km, which means that the infected cases are spatially clustered within at most 355 km. Public health offices can therefore pay closer attention to monitoring the infected cases within 355 km of each hotspot.
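The role of the range can be seen in a simple bounded semivariogram model. The sketch below assumes a spherical model, which is one common choice and not necessarily the one fitted in the paper; the sill is hypothetical, and only the 355 km range is taken from the text.

```python
def spherical_semivariogram(h, sill, range_km, nugget=0.0):
    """Assumed spherical model: rises until h = range_km, then stays flat."""
    if h <= 0.0:
        return 0.0
    if h >= range_km:
        return nugget + sill
    r = h / range_km
    return nugget + sill * (1.5 * r - 0.5 * r ** 3)

RANGE_KM = 355.0   # largest range reported in the text
SILL = 1.0         # hypothetical sill, for illustration only
values = {h: spherical_semivariogram(h, SILL, RANGE_KM)
          for h in (50.0, 150.0, 355.0, 500.0)}
for h, g in values.items():
    print(f"h = {h:5.0f} km  gamma = {g:.3f}")
```

Beyond the range the semivariance no longer grows, so points further apart than 355 km would be treated as spatially uncorrelated under this model.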

5.3. Conclusions

We draw the following conclusions from the preceding analysis:
(1) The Poisson Kriging methods display more clearly how the virus spreads over space. Kriging methods in general produce a smoother surface, which leads to overestimated results in non-hotspot regions. Bayesian Spatial–temporal methods provide more accurate predictions.
(2) The ATP Poisson Kriging model provides a useful way to downscale areal risk maps into point risk maps for a specific timeframe. The range values from the point-support semivariograms can be a good reference for government to monitor how the virus spreads around each hotspot at an early stage of the pandemic.
(3) The following government interventions were found to be significant: (a) levels 1–4 in Non-essential Services (sit-down dining) and (b) levels 2–4 in the Outdoor Gathering intervention.
(4) The infection risk was nearly 0 over Ontario in March 2020. The second wave and the third wave started in November 2020 and April 2021, respectively. At the start, the risks in the Northern area were relatively small, but they increased after January 2021, when the second wave began. Since the third wave, the increase in current infected cases in Northern Ontario has been larger than in Southern Ontario.
(5) The Bayesian Spatial–temporal Model 2 included interactions between unstructured spatial and structured temporal effects, whereas Model 3 included interactions between structured spatial and unstructured temporal effects. The results for short-term prediction were better for both models.
(6) The interactive website we developed displays the estimated risk maps by date and model choice: Bayesian Spatial–temporal or Kriging.
(7) A Shiny app website, https://mujingrui.shinyapps.io/covid19, displays spatial–temporal maps exploiting these models.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/math11061359/s1, Table S1. Variables Representation in the Models, Table S2. Restriction Methods and Corresponding Levels, Table S3. Indoor Gathering Intervention Timeline Under Level in Ontario, Table S4. Outdoor Gathering Intervention Timeline Under Level in Ontario, Table S5. Non-essential Services Intervention Timeline Under Level in Ontario.

Author Contributions

Investigation, M.A. and J.M. All authors have read and agreed to the published version of the manuscript.

Funding

The research was partially supported by Natural Sciences and Engineering Research Council of Canada [grant number OGP0009068].

Data Availability Statement

The dataset is accessible through Statistics Canada.

Conflicts of Interest

We declare no conflict of interest.

References

  1. Park, H.; Kim, S.H. A study on herd immunity of COVID-19 in South Korea: Using a stochastic economic-epidemiological model. Environ. Resour. Econ. 2020, 76, 665–670. [Google Scholar] [CrossRef]
  2. Sarkar, K.; Khajanchi, S.; Nieto, J.J. Modeling and forecasting the COVID-19 pandemic in India. Chaos Solit. Fractals 2020, 139, 110049. [Google Scholar] [CrossRef] [PubMed]
  3. Taboe, H.B.; Salako, K.V.; Tison, J.M.; Ngonghala, C.N.; Kakaï, R.G. Predicting COVID-19 spread in the face of control measures in West Africa. Math. Biosci. 2020, 328, 108431. [Google Scholar] [CrossRef] [PubMed]
  4. Zhao, Z.; Li, X.; Liu, F.; Zhu, G.; Ma, C.; Wang, L. Prediction of the COVID-19 spread in African countries and implications for prevention and control: A case study in South Africa, Egypt, Algeria, Nigeria, Senegal and Kenya. Sci. Total Environ. 2020, 729, 138959. [Google Scholar] [CrossRef]
  5. Goovaerts, P. Geostatistical analysis of disease data: Accounting for spatial support and population density in the isopleth mapping of cancer mortality risk using area-to-point Poisson kriging. Int. J. Health Geogr. 2006, 5, 7. [Google Scholar] [CrossRef] [Green Version]
  6. Goovaerts, P. Geostatistical analysis of disease data: Estimation of cancer mortality risk from empirical frequencies using Poisson kriging. Int. J. Health Geogr. 2005, 4, 1–31. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  7. Goovaerts, P. Geostatistical analysis of disease data: Visualization and propagation of spatial uncertainty in cancer mortality risk using Poisson kriging and p-field simulation. Int. J. Health Geogr. 2006, 5, 1–26. [Google Scholar] [CrossRef] [Green Version]
  8. Besag, J.; York, J.; Mollie, A. Bayesian image restoration with two applications in spatial statistics. Ann. Inst. Stat. Math. 1991, 43, 1–20. [Google Scholar] [CrossRef]
  9. Best, N.G.; Richardson, S.; Thomson, A. A comparison of Bayesian spatial models for disease mapping. Stat. Methods Med. Res. 2005, 14, 35–59. [Google Scholar] [CrossRef]
  10. MacNab, Y.C. On Gaussian Markov random fields and Bayesian disease mapping. Stat. Methods Med. Res. 2011, 20, 49–68. [Google Scholar] [CrossRef]
  11. Martínez-Bello, D.; López-Quílez, A.; Prieto, A.T. Spatiotemporal modeling of relative risk of dengue disease in Colombia. Stoch. Environ. Res. Risk Assess. 2018, 32, 1587–1601. [Google Scholar] [CrossRef]
  12. Ugarte, M.D.; Adin, A.; Goicoa, T.; Militino, A.F. On fitting spatio-temporal disease mapping models using approximate Bayesian inference. Stat. Methods Med. Res. 2014, 23, 507–530. [Google Scholar] [CrossRef] [PubMed]
  13. Brooks, S.; Gelman, A.; Jones, G.; Meng, X.L. (Eds.) Handbook of Markov Chain Monte Carlo; CRC Press: Boca Raton, FL, USA, 2011. [Google Scholar]
  14. Martínez-Bello, D.A.; López-Quílez, A.; Torres Prieto, A. Relative risk estimation of dengue disease at small spatial scale. Int. J. Health Geogr. 2017, 16, 31. [Google Scholar] [CrossRef] [PubMed]
  15. Robert, C.P.; Casella, G. The Metropolis–Hastings Algorithm. In Monte Carlo Statistical Methods; Springer: New York, NY, USA, 2004; pp. 267–320.
  16. Rue, H.; Martino, S.; Chopin, N. Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J. R. Stat. Soc. Ser. B Stat. Methodol. 2009, 71, 319–392.
  17. Lee, D. A comparison of conditional autoregressive models used in Bayesian disease mapping. Spat. Spatio-Temporal Epidemiol. 2011, 2, 79–89.
  18. Leroux, B.G.; Lei, X.; Breslow, N. Estimation of disease rates in small areas: A new mixed model for spatial dependence. In Statistical Models in Epidemiology, the Environment, and Clinical Trials; Springer: New York, NY, USA, 2000; pp. 179–191.
  19. Schrödle, B.; Held, L.; Riebler, A.; Danuser, J. Using integrated nested Laplace approximations for the evaluation of veterinary surveillance data from Switzerland: A case-study. J. R. Stat. Soc. Ser. C Appl. Stat. 2011, 60, 261–279.
  20. Schrödle, B.; Held, L. Spatio-temporal disease mapping using INLA. Environmetrics 2011, 22, 725–734.
  21. Briz-Redón, Á. The impact of modelling choices on modelling outcomes: A spatio-temporal study of the association between COVID-19 spread and environmental conditions in Catalonia (Spain). Stoch. Environ. Res. Risk Assess. 2021, 35, 1701–1713.
  22. Jaya, I.G.N.M.; Folmer, H. Bayesian spatiotemporal forecasting and mapping of COVID-19 risk with application to West Java Province, Indonesia. J. Reg. Sci. 2021, 61, 849–881.
  23. Jaya, I.G.N.M.; Folmer, H. Spatiotemporal high-resolution prediction and mapping: Methodology and application to dengue disease. J. Geogr. Syst. 2022, 24, 527–581.
  24. Martins, T.G.; Simpson, D.; Lindgren, F.; Rue, H. Bayesian computing with INLA: New features. Comput. Stat. Data Anal. 2013, 67, 68–83.
  25. Knorr-Held, L. Bayesian modelling of inseparable space-time variation in disease risk. Stat. Med. 2000, 19, 2555–2567.
  26. Schrödle, B.; Held, L. A primer on disease mapping and ecological regression using INLA. Comput. Stat. 2011, 26, 241–258.
  27. Clayton, D. Generalized linear mixed models. In Markov Chain Monte Carlo in Practice; Gilks, W., Richardson, S., Spiegelhalter, D., Eds.; Chapman and Hall: London, UK, 1996; pp. 275–301.
  28. Blangiardo, M.; Cameletti, M.; Baio, G.; Rue, H. Spatial and spatio-temporal models with R-INLA. Spat. Spatio-Temporal Epidemiol. 2013, 4, 33–49.
  29. Kyriakidis, P.C. A geostatistical framework for area-to-point spatial interpolation. Geogr. Anal. 2004, 36, 259–289.
  30. Goovaerts, P. Kriging and Semivariogram Deconvolution in the Presence of Irregular Geographical Units. Math. Geosci. 2008, 40, 101–128.
  31. Journel, A.G.; Huijbregts, C.J. Mining Geostatistics; Academic Press: London, UK, 1978.
  32. Goovaerts, P.; Gebreab, S. How does Poisson kriging compare to the popular BYM model for mapping disease risks? Int. J. Health Geogr. 2008, 7, 6.
  33. Molinski, S. Pyinterpolate: Spatial Interpolation in Python for point measurements and aggregated datasets. J. Open Source Softw. 2022, 7, 2869.
  34. Spiegelhalter, D.J.; Best, N.G.; Carlin, B.P.; Van Der Linde, A. Bayesian measures of model complexity and fit. J. R. Stat. Soc. Ser. B Stat. Methodol. 2002, 64, 583–639.
  35. Gelman, A.; Hwang, J.; Vehtari, A. Understanding predictive information criteria for Bayesian models. Stat. Comput. 2014, 24, 997–1016.
  36. Watanabe, S.; Opper, M. Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. J. Mach. Learn. Res. 2010, 11.
  37. GISGeography. Semi-Variogram: Nugget, Range and Sill. 2022. Available online: https://gisgeography.com/semi-variogram-nugget-range-sill/ (accessed on 9 November 2022).
Figure 1. Summary of the Estimates Obtained for the Coefficients Associated with Indicators for Model 2.
Figure 2. Summary of the Estimates Obtained for the Intervention Factors Associated with Covariates for Model 2.
Figure 3. Evolution of the Infected Risks at the Public Health Unit Level (Model 2).
Table 1. Specification of the Four Types of Spatial–temporal Interaction.
| Type of Spatial–Temporal Interaction | $R_\delta$ |
|---|---|
| I | $I_s \otimes I_t$ |
| II | $I_s \otimes R_t$ |
| III | $R_s \otimes I_t$ |
| IV | $R_s \otimes R_t$ |
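The four interaction types in Table 1 can be made concrete with a small numerical sketch. The Python below is illustrative only and is not the authors' code. It assumes that $R_s$ is the intrinsic CAR structure matrix built from a binary adjacency matrix, that $R_t$ is the first-order random walk (RW1) structure matrix, and that the interaction vector is ordered with time varying fastest within each area. The function names and the three-area toy adjacency are hypothetical.

```python
import numpy as np


def rw1_structure(n_times):
    """RW1 structure matrix D'D, where D takes first differences in time."""
    d = np.diff(np.eye(n_times), axis=0)  # (T-1) x T first-difference matrix
    return d.T @ d


def icar_structure(adjacency):
    """Intrinsic CAR structure matrix: diag(number of neighbours) - adjacency."""
    w = np.asarray(adjacency, dtype=float)
    return np.diag(w.sum(axis=1)) - w


def interaction_structure(kind, adjacency, n_times):
    """Structure matrix R_delta for interaction type 'I', 'II', 'III' or 'IV'."""
    n_areas = len(adjacency)
    i_s, i_t = np.eye(n_areas), np.eye(n_times)
    r_s, r_t = icar_structure(adjacency), rw1_structure(n_times)
    factors = {
        "I": (i_s, i_t),    # unstructured in space and time
        "II": (i_s, r_t),   # structured in time, independent across areas
        "III": (r_s, i_t),  # structured in space, independent across times
        "IV": (r_s, r_t),   # structured in both space and time
    }
    spatial, temporal = factors[kind]
    return np.kron(spatial, temporal)


# Toy example (assumption): three areas on a line, four time points.
toy_adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
for kind in ("I", "II", "III", "IV"):
    r_delta = interaction_structure(kind, toy_adjacency, n_times=4)
    print(kind, r_delta.shape, np.linalg.matrix_rank(r_delta))
```

Because the rank of a Kronecker product is the product of the ranks, the toy example gives ranks 12, 9, 8 and 6 for Types I to IV. This rank deficiency is why the structured interaction types need additional sum-to-zero constraints when fitted.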
Table 2. Model Specification.
| Models | Components |
|---|---|
| Model 1 | $\alpha + \sum_{j=1}^{3} \beta_{IG_j} G_{ijt} + \sum_{j=1}^{4} \beta_{E_j} E_{ijt} + \sum_{j=1}^{4} \beta_{OG_j} B_{ijt} + \beta X_i + \xi_i + \gamma_t + \phi_t + \delta_{it}$ (Type I) |
| Model 2 | $\alpha + \sum_{j=1}^{3} \beta_{IG_j} G_{ijt} + \sum_{j=1}^{4} \beta_{E_j} E_{ijt} + \sum_{j=1}^{4} \beta_{OG_j} B_{ijt} + \beta X_i + \xi_i + \gamma_t + \phi_t + \delta_{it}$ (Type II) |
| Model 3 | $\alpha + \sum_{j=1}^{3} \beta_{IG_j} G_{ijt} + \sum_{j=1}^{4} \beta_{E_j} E_{ijt} + \sum_{j=1}^{4} \beta_{OG_j} B_{ijt} + \beta X_i + \xi_i + \gamma_t + \phi_t + \delta_{it}$ (Type III) |
| Model 4 | $\alpha + \sum_{j=1}^{3} \beta_{IG_j} G_{ijt} + \sum_{j=1}^{4} \beta_{E_j} E_{ijt} + \sum_{j=1}^{4} \beta_{OG_j} B_{ijt} + \beta X_i + \xi_i + \gamma_t + \phi_t + \delta_{it}$ (Type IV) |
Table 3. DIC and WAIC Values under RW1.
| Models | BYM Specification: DIC | BYM Specification: WAIC | Leroux CAR Specification: DIC | Leroux CAR Specification: WAIC |
|---|---|---|---|---|
| Type I Interaction | 8434.35 | 8211.21 | 8454.02 | 8361.58 |
| Type II Interaction | 8410.87 | 8266.66 | 8392.16 | 8247.07 |
| Type III Interaction | 8448.85 | 8239.20 | 8432.59 | 8220.86 |
| Type IV Interaction | 8442.71 | 8331.01 | 8423.18 | 8200.98 |
Boldface values in the table indicate that the Type II interaction performs relatively better.
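For both DIC and WAIC, smaller values indicate a better trade-off between fit and complexity. As a quick check of the comparison in Table 3, the following sketch transcribes the table values and reports which interaction type minimizes each criterion under each specification. The dictionary layout and the helper name `best_interaction` are illustrative choices, not part of the original analysis.

```python
# Values transcribed from Table 3 (RW1 temporal trend); smaller is better.
table3 = {
    ("BYM", "DIC"): {"I": 8434.35, "II": 8410.87, "III": 8448.85, "IV": 8442.71},
    ("BYM", "WAIC"): {"I": 8211.21, "II": 8266.66, "III": 8239.20, "IV": 8331.01},
    ("Leroux CAR", "DIC"): {"I": 8454.02, "II": 8392.16, "III": 8432.59, "IV": 8423.18},
    ("Leroux CAR", "WAIC"): {"I": 8361.58, "II": 8247.07, "III": 8220.86, "IV": 8200.98},
}


def best_interaction(scores):
    """Return the interaction type with the smallest criterion value."""
    return min(scores, key=scores.get)


best = {key: best_interaction(scores) for key, scores in table3.items()}
for (specification, criterion), kind in best.items():
    print(f"{specification:10s} {criterion:4s} -> Type {kind}")
```

With these values, the Type II interaction has the lowest DIC under both the BYM and the Leroux CAR specifications, while the lowest WAIC falls on Type I under BYM and on Type IV under Leroux CAR.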
Table 4. Results of 1-biweek Ahead Prediction for Models 1–4.
| Models | BYM Specification | Leroux CAR Specification |
|---|---|---|
| Type I | 63.3313 | 63.7554 |
| Type II | 30.8323 | 59.9799 |
| Type III | 27.5105 | 78.0949 |
| Type IV | 76.0943 | 81.1585 |
Table 5. Results of 2-biweeks Ahead Prediction for Models 1–4.
| Models | BYM Specification | Leroux CAR Specification |
|---|---|---|
| Type I | 84.7722 | 81.8366 |
| Type II | 17.7088 | 53.0448 |
| Type III | 27.2464 | 113.6560 |
| Type IV | 75.5361 | 76.2468 |
Table 6. Estimation Performance with ATA Poisson Kriging Model.
| Date | MAEE | Date | MAEE | Date | MAEE | Date | MAEE |
|---|---|---|---|---|---|---|---|
| 31 March 2020 | 9.1583 | 14 April 2020 | 27.5731 | 30 November 2020 | 45.2185 | 14 December 2020 | 51.2377 |
| 30 April 2020 | 18.1985 | 14 May 2020 | 9.9172 | 31 December 2020 | 81.5218 | 14 January 2021 | 75.8301 |
| 31 May 2020 | 14.4495 | 14 June 2020 | 14.0699 | 31 January 2021 | 56.2901 | 14 February 2021 | 27.8829 |
| 30 June 2020 | 10.9806 | 14 July 2020 | 4.9548 | 28 February 2021 | 37.8338 | 14 March 2021 | 54.8017 |
| 31 July 2020 | 9.8015 | 14 August 2020 | 9.1887 | 31 March 2021 | 73.2944 | 14 April 2021 | 76.0162 |
| 31 August 2020 | 5.9091 | 14 September 2020 | 6.1367 | 30 April 2021 | 72.6466 | 14 May 2021 | 78.9172 |
| 30 September 2020 | 12.4405 | 14 October 2020 | 15.2833 | 31 May 2021 | 95.2153 | 14 June 2021 | 73.0008 |
| 31 October 2020 | 23.9789 | 14 November 2020 | 31.6626 | 30 June 2021 | 49.7580 | - | - |
Table 7. Ranges in Point Support Semivariograms.
| Date | Range | Date | Range | Date | Range | Date | Range |
|---|---|---|---|---|---|---|---|
| 31 March 2020 | 75.4 km | 14 April 2020 | 114.9 km | 30 November 2020 | 44.0 km | 14 December 2020 | 73.6 km |
| 30 April 2020 | 28.0 km | 14 May 2020 | 123.7 km | 31 December 2020 | 151.0 km | 14 January 2021 | 136 km |
| 31 May 2020 | 240.0 km | 14 June 2020 | 108.4 km | 31 January 2021 | 106.0 km | 14 February 2021 | 101.6 km |
| 30 June 2020 | 126.8 km | 14 July 2020 | 80.5 km | 28 February 2021 | 151 km | 14 March 2021 | 247.5 km |
| 31 July 2020 | 116.7 km | 14 August 2020 | 133.8 km | 31 March 2021 | 76 km | 14 April 2021 | 121 km |
| 31 August 2020 | 102.0 km | 14 September 2020 | 53.0 km | 30 April 2021 | 84.8 km | 14 May 2021 | 84.4 km |
| 30 September 2020 | 87.6 km | 14 October 2020 | 104.8 km | 31 May 2021 | 355.9 km | 14 June 2021 | 272.2 km |
| 31 October 2020 | 104.4 km | 14 November 2020 | 72.8 km | 30 June 2021 | 74.3 km | - | - |
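To illustrate what the ranges in Table 7 mean, the sketch below evaluates a spherical semivariogram, in which the semivariance rises with separation distance and levels off at the sill once the distance reaches the range. The spherical form, the zero nugget and the unit partial sill are assumptions made for illustration; the table does not state which model family was fitted. Only the 75.4 km range (31 March 2020) is taken from the table, and the function name is hypothetical.

```python
import numpy as np


def spherical_semivariogram(h, nugget, partial_sill, range_km):
    """Spherical semivariogram evaluated at separation distances h (in km)."""
    h = np.asarray(h, dtype=float)
    ratio = np.clip(h / range_km, 0.0, 1.0)  # flat beyond the range
    gamma = nugget + partial_sill * (1.5 * ratio - 0.5 * ratio**3)
    return np.where(h == 0.0, 0.0, gamma)  # by convention gamma(0) = 0


# Range from Table 7 for 31 March 2020; nugget and sill are placeholder values.
distances = np.array([0.0, 25.0, 50.0, 75.4, 150.0])
print(spherical_semivariogram(distances, nugget=0.0, partial_sill=1.0, range_km=75.4))
```

Under this model, two locations farther apart than the range are treated as spatially uncorrelated, so a larger range in Table 7 corresponds to infection risk that is correlated over longer distances at that date.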
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Alvo, M.; Mu, J. COVID-19 Data Analysis Using Bayesian Models and Nonparametric Geostatistical Models. Mathematics 2023, 11, 1359. https://doi.org/10.3390/math11061359
