A Review of Spatiotemporal Models for Count Data in R Packages. A Case Study of COVID-19 Data

Several R-packages to deal with spatiotemporal count areal data are currently available. Each package is based on different models and/or statistical methodologies. Unfortunately, results generated by these models are rarely comparable due to differences in notation and methods which ultimately hinder the robustness and effectiveness of such models. These models have become crucial nowadays, because their ability to analyze COVID-19 related data. This prompted us to revisit three spatiotemporal count models and monitor their performance under the same conditions (on a same data set and under a consensus notation). The three different models assessed are representative of the main two extended methodologies used to analyse spatiotemporal count data i.e., the classical approach (based on Penalised Likelihood meanwhile and based to Estimating Equations) and the Bayesian point of view. As a case study, we have used these packages to analyze COVID-19 data corresponding to 24 health regions within the Valencian Community, Spain. In particular, daily COVID-19 positive individuals are used to predict daily hospitalisations. Data corresponds to the period from 28 of June to 13 of December 2020.


Introduction
Since December 2019, when the first cases of COVID-19 were reported in Wuhan, China, the SARS-CoV-2 virus has spread worldwide.Up to now the virus has caused more than 100 million infections and 2,16 millions of deaths around the world and, currently, it is impossible to predict how many people will be affected by it.
An effective way to control the spread of the infection is to understand and predict not only the number of cases and COVID-19 severity in a given area at a given time but also where and when the spread may become uncontrollable bringing health care systems to the brink of collapse due to an increase of the number of patients needing hospitalisation.Robust prediction models are therefore vital to support decisions on population and community-level interventions to control the spread of the virus and to prevent the collapse of health services.
The robustness of a model is intrinsically linked to the data quality, that is, the availability of reliable and comparable count data from across the different countries world-wide.Moreover, given the contribution of people's motility to the spread of the virus, these models should be able to adjust for people's motility within and between neighbouring health regions or constituencies as well as nation-wide and/or worldwide.
Epidemic models to study the spread of infectious diseases date back to the beginning of the twentieth century.The susceptible-infected-recovered (SIR) models developed by Kermack and McKendrick ([Kermack and McKendrick(1927)]) were the first mathematical models developed to study the transmission dynamics of infectious diseases.The SIR and SEIR (Susceptible, Exposed, Infectious and Removed) models have been improved and used for analysing and characterising the COVID-19 epidemic ([Fang et al.(2020)Fang, Nie, and Penny, Kucharski et al.(2020) Kucharski, Russe Tang et al.(2020)Tang, Wang, Li, Bragazzi, Tang, Xiao, and Wu]) also in Spain ([Guirao(2020), López and Rodo(2020)]).Predicting the future course of epidemics has been another great challenge through time and have become particularly challenging with the rise of new infectious diseases [Held and Meyer(2019)].
In addition to the crucial role played by those epidemic models described above, other models are also being adapted to examine different aspects of the COVID-19 pandemic [Fronterre et al.(2020) Fronterre, Read, Rowlingson, Bridgen, Alderton, Diggle, and Jewel Dunbar and Held(2020)].In particular, mathematical modelling of patient hospitalisation is essential as it may help raising awareness of a possible collapse of the health care systems.Models for this type of data are rather different from the epidemic models (SIR and SEIR) as the prediction of hospitalisations requires previously obtained COVID-19 data such as the number of people tested and/or infected, that is, count data.
It is well known that for count data modelling, the linear model with normal assumption is not appropriate and generalised linear models with Poisson, binomial or negative binomial distributions should be used ( [Nelder and Wedderburn(1972)]).
The reasons to model these data are diverse and can range from estimating the effect of a risk factor to a response, identifying clusters of contiguous areal units or to forecasting future observations.To deal with this array of scenarios comprising spatiotemporal data different modelling strategies have been proposed.Strategies become more complex when the aim is to build a spatiotemporal model for the joint analysis of different 58 variables which include specific and shared spatial and temporal effects [Gómez-Rubio et al.(2019)Gómez-Rubio, Palmí-Perales, López-Abente, Ram A common challenge when modelling count data is that of spatiotemporal autocorrelation, namely that observations from geographically close areal units and temporally close time periods tend to have more similar values than units and time periods that are further apart.In addition, due to the complicated correlation structure, the parameter estimation is not straightforward and different approaches are needed.
In the context of COVID-19 data we have to manage observations over time in several geographical areas.In particular, this review deals with areal unit data, which is a type of data where observations are related to a set of K contiguous but non-overlapping spatial areal units, such as health departments.Each observation relates to an entire areal unit and in this case referred as aggregated count measures for the entire unit.So we have a spatiotemporal count data (number of new infected people in a health department) to predict the number of COVID-19 hospitalisations.
The diversity of applications, data types and conceptual approaches leads us to a broad spatiotemporal modelling literature.Two excellent books that provide a gradual entry to the methodological aspects of spatiotemporal statistics and outline some of the standard techniques used in this area are [Cressie and Wikle(2015)] and [Wikle et al.(2019)Wikle, Zammit-Mangion, and Cressie].An overview of different spatiotemporal modelling approaches can also be found in [Anderson and Ryan(2017)].
Here we revisit three different models which are representative of the three more extended methodologies to analyse this type of data.The two first follow the classic statistical paradigm and the last the Bayesian point of view (See [Berger(2013)] for a survey of the hierarchical Bayesian approach).Of the two classical approaches one is based on Penalised Likelihood meanwhile and the other one on Estimating Equations [Liang and Zeger(1986)].
This work does not claim to provide a comprehensive coverage of all existing methods to deal with count data but to describe and compare the tree most relevant approaches that can be found implemented R packages ([R Core Team(2020)]).
The performance of the existing models to predict hospitalisation can only be fully assessed and compared when each model is tested against the same data set under a unique consensus.So, we applied new daily positive cases and daily hospitalisations on the 24 health department where Valencian Community, Spain, is divided to the models above explained.
The article is organised as follows: in Section ?? a descriptive analysis of the data is perform.Following by the mathematical details of the three models, Section 2. Then the application to COVID-19 data of the three models, Section 3. Finally the conclusions are discussed in Section 4.
In this section, we describe a spatiotemporal approach to monitoring COVID-19 (daily new cases and total number of people hospitalised) on Valencia Community, Spain, we explore some possible temporal biases that data may have.This section begins by describing the data sources, before describing the analytic model and fitting process.
The data set comprises the number of daily new positive cases of COVID-19 (tested via PCR (polymerase chain reaction) or the antigen test), and the daily number of people at hospital by COVID-19 (daily hospitalisations) in the Valencian Community, that comprises a big area in the eastern part of Spain, during 533 days.This area is organised in K = 24 different health departments, and data set is conformed with both temporal series (hospitalised/new positive cases) for all departments.The spatial distribution of the health departments so as their population (× 100000 people) are shown in Figure 1, as can be observed the populations across regions is very heterogeneous.However, the number of daily hospitalisations is public only at the aggregated level of the entire Valencian Community, so, as a result we cannot show maps nor identifiable information of this variable.As an illustration, in this work we will show the data related to eight of the 24 health departments.These departments have been anonymized, and will be used to illustrate all the steps of the different analysis.However, the models are fitted using the whole data set and the estimations of the parameters and the goodness of fit measurements showed in the paper are related to all of them.
Although these data may be subject to temporal biases due to changing testing regimes, among other problems, the spatial weekly case prevalence (number of cases divided by population size) for weeks 27, 38, 49 plotted in Figure 2 shows strong variation over Valencian Community.The goal of this paper is to predict the number of hospitalisation in time given the number of verified infected people in each of the 24 regions.Figure 3 a) and b) shows time series plots for both variables of interest; the number of daily new positive cases and the number of daily hospitalisations per health department.From now on, with daily hospitalizations we will mean total number of people at hospital by COVID-19 each day.Figure 3 a) shows a peak of new cases in late August, and another in mid-November, two and a half moths apart, and the same temporal pattern is showed in the series of hospitalisations.Figure 3 c) shows the total number of new positive cases and hospitalisations per day (adding the values of the 24 health departments).From Figure 3 c) we can see that there is a time lag between both time series.To estimate the delay between both time series a cross-correlogram (see Figure 4) have been used.It plots a measure of similarity of both time series (in these case Pearson correlation), as a function of the displacement (days) of daily positives relative to the daily hospitalisations.Figure 4 shows that there are two maximums in the correlation at the time lag of 9 and 5 days.
These data may be subject to temporal biases due to under reporting on weekend and/or non-working days.Figure 5 a) shows a great variability in number of positives, depending on day of the week.As we can see in Figure 5 b) this effect does not hold for number of daily hospitalisations.So, this effect is an artefact, due to when the official data is reported, rather than a real effect of the virus.To minimise this effect, henceforth, we will take as a daily hospitalisations, the mean of the last 4 days.By doing this smoothing we reduce this effect, as we can see from Figure 5 c).
Finally, both effects are corrected.Figure 6 shows the smoothed number of positive daily cases (in red) jointly with the number of people hospitalised by COVID-19 (in blue) for the eight illustrative health departments, with a time delay of 9 days between both series.The spatial adjacence matrix among these departments is shown in table 1.In this table values equal to 1 signal neighbouring regions, i.e. those that share a geographical border.
To conclude, when looking for the best model, we explored the relationship between the mean and the variance of the hospitalisation data collected at each instant of time.Figure 7, shows that there exist a potential relationship between mean and variance.This relationship is crucial for the modelling as we will see in the following sections.
On its current formulation and implementation in the R package surveillance ( [Meyer et al.(2016)Meyer, Held, and Höhle]), the EE framework uses incidence from the preceding week, t − 1, to explain the incidence in week t.So, the counts, Y kt |Y t−1 , are assumed to be Poisson or Negative Binomial distributed with conditional mean: and overdispersion parameter, in the Negative Binomial case, ψ k > 0.
The first component of the summation is called endemic component and captures information not directly linked to observed cases from the previous day.This component can cover exogenous factors such as temporal trends, seasonality, sociodemography, and/or population.As an example, in spatial applications, e kt can refer to the fraction of the population living in region k at time t.The remaining terms in Eq. 1 constitute the epidemic component and describe how the incidence in region k is linked to previous cases in the same and in the adjacent regions.The two terms of this epidemic component are usually denoted as 'autoregressive' and 'spatiotemporal' component, respectively.
The parameters ν kt , λ k and φ k are constrained to be non-negative and can be modelled by allowing for log-linear predictors in all three components, as sinecosine terms to account for seasonality [Held and Paul(2012)], long-term temporal trends or/and covariates [Bauer and Wakefield(2018), Cheng et al.(2016) Cheng, Lu, Wu, Liu, and Huang].
This form allows for fixed intercepts α (.) , region-specific intercepts b k , can be treated as fixed effects or as random effects accounting for heterogeneity between the regions.When they are treated as random effects, they are assumed to be independent and identically distributed across k, but can be correlated across the model components, following a Gaussian distribution: We will see this part in a more detailed way in Section 3.1.
Mamimun likelihood (ML) estimates are obtained using penalised likelihood approaches.
This basic model, has been extended to cover other different aspects of disease modelling (see [Dunbar and Held(2020)] for references there in).Recent extensions include methodology to adjust for under reporting ( [Dunbar and Held(2020), Bracher(2019a)]), or to allow different lags in the auto-regressive part of the model (package hhh4addon ([Bracher(2019b), Bracher and Held(2017) where D is the maximum lag considered.
In [Meyer and Held(2017)], authors extend the basic endemic-epidemic spatiotemporal model to fit multivariate time series of counts y gkt stratified by (age) groups in addition to spatial regions.So they define a contact matrix C = (c g g ), where c g g ≥ 0 quantifies the average number of contacts of an individual of group g with individuals of group g, and the spatiotemporal model is now modellized as: where both the endemic and epidemic predictors may gain group-spacific effects.This model is implemented in the R-package hhh4contacts ([Bracher(2019c)]).

Forecast
Package surveillance uses the function hhh4 to fit the models, and implements the function oneStepAhead() that computes successive one-step-ahead predictions for the fitted model, providing also confident intervals for the predictions and plot methods.The associated scores-method computes a number of (strictly) proper scoring rules based on such one-step-ahead predictions; see [Paul and Held(2011)] for details.
A discussion of suitable measures to evaluate the quality of a point forecast can be found in [Gneiting(2011)] and several scoring rules based on the onestep-ahead predictions [Paul and Held(2011)] are implemented in the function scores, although we will consider the root mean squared error (RMSE).Other function implemented in the package related with the function oneStepAhead() is the function calibrationTest that implements calibration tests for Poisson or Negative Binomial predictions of count data are based on proper scoring rules and described in detail in [Wei and Held(2014)].
Long term predictions do not have much sense in our context because we do not know the long term evolution of the covariates.

Multivariate covariance generalised linear models. R package Mcglm
Under the same previous assumption of predicting Given Σ t the K × K covariance matrix within the response variable Y t for t = 1, • • • , N and Σ b be the N × N correlation matrix whose components denote the correlation between outcomes, the McGLM as proposed by [Bonat and Jørgensen(2016)] is given by where g t are monotonic differentiable link functions, X t denotes an K ×k t design matrix, β t is a regression parameter vector to be estimated, and is the generalised Kronecker product [Martinez-Beneito(2013)].The matrix Σt denotes the lower triangular matrix of the Cholesky decomposition of Σ t .The operator Bdiag denotes a block diagonal matrix, and I denotes an K × K identity matrix.
A key point for specifying McGLMs for mixed types of outcomes is the specification of the covariance matrix within outcomes Σ k .Following [Bonat and Jørgensen(2016)] we define the covariance within outcomes by for modelling count outcomes they propose to adopt the Poisson-Tweedie dispersion function [Jørgensen and Kokonendji(2016)] so that is a diagonal matrix whose main entries are given by the power variance function.
The Poisson-Tweedie family of distributions provide a rich class of models to deal with count outcomes, since many important distributions appear as special cases, examples include the Hermite (p = 0), Neyman Type A (p = 1), Negative Binomial (p = 2) and Poisson-inverse Gaussian (p = 3).The dispersion matrix Ω(τ t ) describes the part of the covariance within outcomes that does not depend on the mean structure.Jorgensen et al. [Jørgensen and Kokonendji(2016)] among others, propose to model the dispersion matrix using a matrix linear predictor combined with a covariance link function, i.e.
where h is the covariance link function, Z td with d = 0, • • • , D are known matrices reflecting the covariance structure within the response variable Y t , and McGLMs are fitted based on the estimating function approach described in detail by [Bonat and Jørgensen(2016)]and [Jørgensen and Knudsen(2004)].A general overview of the algorithm and the asymptotic distribution of the estimating function estimators can be found in [Bonat(2018)].As a method to select the components of the matrix linear predictor (variable selection), the score information criterion (SIC) is proposed.This is an important tool to assist with the selection of the linear and matrix linear predictors components, but unfortunately it is less useful for comparing models fitted using different link, variance or covariance functions.

Forecasting
Unfortunately the mcglm package does not have implemented any function to predict future observations.If we don't know to implement any more sophisticated routine, once estimated the model, we can use the function mc link function that returns the inverse of the link function applied on the linear predictor i.e. µ = g −1 (Xβ), as an approximation of the predictions sought.

Bayesian Hierarchical generalised linear models. CARst package
A great variety of spatiotemporal models for count data using generalised linear models (GLM) can be found on the Bayesian literature.To modelling these data a hierarchical model with spatiotemporal structured prior distributions is used.The spatiotemporal structure is modelled via sets of autocorrelated random effects with conditional autoregressive and its spatiotemporal extensions priors.
An excellent review can be found in The general Bayesian Hierarchical model for spatiotemporal count data is as following: The probability function f is in the exponential family (not necessarily a Gaussian distribution), β is the vector of covariate regression parameters, and a multi-variate Gaussian prior is assumed.g can be any monotonic differentiable link function, and ψ kt is a latent component for areal unit k and time period t encompassing one or more sets of spatiotemporally autocorrelated, we denote ψ t = (ψ 1t , . . ., ψ Kt ).In this paper we are just focusing on the models implemented in the CAR-BayesST package.In this package binomial, Gaussian and Poisson data models can be used for the first level of the model, f in Eq. 10, and different spatiotemporal structures for ψ kt in Eq.11 are given.
All models implemented in this package use random effects to introduce spatial autocorrelation into the response variable.For this purpose, CAR-type prior distributions and their space-time extensions are used.Spatial autocorrelation is induced via a non-negative symmetric matrix of adjacency W = (w kj ), where w kj represents the spatial closeness between units (S k , S j ).Larger valued elements represent spatial closeness between the two areas in question, and spatially autocorrelated random effects whereas zero values correspond to areas that are not spatially close are conditionally independent random effects given the remaining.The models are outlined in Table 2 extracted of [Lee et al.(2018) Lee, Rushworth, and Napier] In all cases inference is based on Markov chain Monte Carlo (MCMC) simulation.

Forecasting
In order to predict future observations of the response variable, simulations from the posterior predictive distribution (the distribution of possible unobserved values conditional on the observed values) have to be obtained.
This predictive density can be approximated by Monte Carlo integration.If we denote by θ the vector with all the parameters of the model: If a representative value is wanted, the mean of the predictive density can be obtained taking into account the property that E θ E(Y N +h | θ)) and approxi-mating again the mean with respect to θ: Unfortunately the CARBayesST package has not implemented a function to predict future observations.for h=1 is easily implemented using the samples of the posterior distributions of the parameters.For h > 1 the simulation of the posterior distribution of the random effects should be implemented.For non experts in R programming users just a approximations can be obtained approximating the value of the random effect by the that of the previous prediction.This approximation will be used in Section 3.3.

Application
In this section all the models reviewed in Section 2 are applied to COVID-19 data described in Section ??.Following with the notation introduced, Y t = (Y 1t , . . ., Y Kt ) refers to the observed data of hospitalisations at time t in the K health departments of Valencian Community, X t = (X 1t , . . ., X Kt ) refers to the positive cases at time t − lag, where lag is the lag previously determined and p k is the population in risk in the k-th health department.
Since our primary goal is forecasting, the mean squared errors of the predictions (RMSE) up to a five-day horizon are calculated.This is a classical approach to measure their performance.If we want to use real data of positive cases the horizon of prediction is limited, taking a horizon of five is considered enough, within our possibilities, to prevent the collapse of hospitals.
Each model is adjusted using the corresponding statistical methodology included in its corresponding R package.Additional specific measures of goodness of fit are given, these other measures will be useful to compare different models within the same R package.

Endemic-epidemic models
As stated in Section 2.1, let us consider that the counts of daily hospitalisations at the k−th health department, the t−th day, Y kt , follows a Poisson distribution with mean as in Eq.1.
If p k denotes the population of the k−th health department, we assume in Eq. 1 known population fractions and that w qk = I(q ∼ k), i.e, w qk = 1 if both health districts have a common geographic border (assuming that epidemic only arrives from adjacent health areas) and 0 otherwise.Weights w qk = w qk q w qk are normalised and restricted to be positive.
Covariates such as number of positive cases, can be added to the model in different ways ([Herzog et al.(2011)Herzog, Paul, andHeld, Meyer et al.(2016)Meyer, Held, andHöhle]).The simplest is to include the covariates x kt , in the formulation in the endemic part of the model, for example considering: • Model 1: We are going to consider two possibilities regarding to the covariates.Case 1: consider as covariate the smoothed number of new positive cases at a lag 9. Case 2: consider as covariates the smoothed number of positive new cases at a lag 9 and the smoothed number of new positive cases at a lag 5.Many works include a seasonal effect in the model of the parameters log(ν kt ), log(λ k ) and/or log(φ k ) but we would expect that this seasonal effect be included in the covariates (time series of positive cases), so we include it only in the model of log(λ k ) and log(φ k ).
All unknown parameters are estimated directly by maximising the corresponding log-likelihoods using numerical optimisation routines (see [Paul et al.(2008)Paul, Held, and Toschke]).The estimates obtained so as several goodness of fit measurements for this model are shown in Table 3.
Once estimated the parameters of the model, the fitted mean can be compared with the observed counts in order to check the goodness of the fit, but additionally, we can see the contribution to this fitted mean of the endemic and autorregressive components.The average of the proportions of the mean explained by the different components are also shown in Table 3.Note that the proportion explained by the epidemic component is in both cases around 97% , being with difference the component with a greater influence in the value of the total fit.So, there is a high influence of the within -health area autoregressive component with very little contribution of adjacent areas and a rather small endemic incidence.
We have assumed a Poisson distribution to model the observations but as seen in Figure 7 there is a clear overdispersion in the data set.To account this overdispersion, the Poisson distribution may be replaced by two alternatives included in the hhh4 function: 'NegBin1', a Negative Binomial model with a common overdispersion parameter ψ for all districts and 'NegBinM' with different overdispersion parameters (ψ i ) for the different health areas, but these distributions does not improve the fit (see This basic model can be refined.Paul et al. [Paul and Held(2011)] introduced random effects for endemic-epidemic models, which are useful if the districts exhibit heterogeneous incidence levels not explained by observed covariates allowing for district-specific intercepts in the endemic or epidemic components.Due to the heterogeneity shown in the different health departments, Model 2 considers a baseline specific district incidence, α ν k ; the population fraction e k has been included as a multiplicative offset and α φ k reflects the mean spatial force of influence of the neighbouring health districts.
Both α ν k and α φ k have been modelled as fixed effects.
• Model 2: So, considering a Poisson distribution for the observations, goodness of fit measures of the Model 2 are shown in Table 5 so as the contribution of the three components of the mean in the global fit.Once again we show the results considering as covariates the smoothed number of positive new cases at a lag 9 (Model 2.1) and the smoothed number of new positive cases at a lags 9 and 5 (Model 2.2).Individual estimates for the parameters are not shown now, because the quantity of parameters to estimate.
As can be seen in Tables 3 and 5, there are no much differences between the four models.In all of them, the largest portion of the fitted mean results from the within-district autoregressive component (among 94 and 97%) with very little contribution of cases from adjacent districts and a rather small endemic incidence.There are also no great differences in the goodness of fit parameters provided by the Likelihood inference (Log-likelihood, Akaike Information Criteria (AIC), Bayesian Information Criteria (BIC)) or in the values of the RMSE.
Figure 8 shows 5-days predictions obtained with Model 2.2, jointly with the fitted and the true observed values.

Multivariate Covariance generalised Linear Models
In this section we apply the MCGLM approach to analyse the multivariate count data set that was presented in Section ??, Eq. 5.
As seen in Eq. 5, MCGLM take non-normality into account defining a variance function, and modelling the mean structure by means of a link function and a linear predictor.In this application we have daily observations on 24 health departments (K = 24) on N = 533 consecutive days.
Despite the previous model, now we are not dealing with an autoregressive model, but this autorregressive effect can be included in the linear predictor expression together with other covariates.
with g() being in this case the log-link function, and e kt the offset.
A key point for specifying McGLMs for mixed types of outcomes is the specification of the covariance matrix within outcomes Σ k .Following [Bonat and Jørgensen(2016)] we define the covariance within outcomes by The matrix linear predictor is defined as: h(Ω(τ t )) = τ 0 I n×n + τ 1 Z 1 + τ 2 Z 2 , where n denotes the total number of observations in the data set and I n×n is the n × n identity matrix.τ 0 is the intercept of the covariance linear model.If n t denotes the number of observations in time in each spatial region, Z 1 = I 24×24 G Γ t , where G denotes the Kronecker product and Γ t = (γ(i, j)) i,j∈{1,••• ,nt} with γ(i, j) = 1 if j ∈ {i − 1, i + 1} and 0 otherwise.τ 1 measures the effect of the 'time'.Finally, Z 2 = W G I nt×nt with W being the spatial adjacency matrix between the 24 health departments.
In this case we have obtained the best fits, when using as h() the exponential covariance link function.
To define V (µ t ; p t ), we will adopt the Poisson-Tweedie dispersion function ([Jørgensen and Kokonendji(2016)]) so that in agreement with the plot of Figure 7.
We employed a step-wise procedure for selecting the components of the linear predictor.As in the previous modelling, we are going to consider as potential covariates in Eq. 14 the number of new positive cases at a lag 9, the number of new positive cases at a lag 5, the number of hospitalizations at lag 1, and an additional categorical covariate 'Health Department' to allow different intercepts (β 0k instead of β 0 ) in Eq. 14.The population of each health district will also be used as an offset.The SIC using penalty δ = 2 and the Wald test were used in the forward and backward steps, respectively.We defined a stopping criterion for the selection procedure as SIC > 0, since in that case the penalty is larger than the score statistics.
The SIC is an important tool to assist with the selection of the linear and matrix linear predictors components, but it is less useful for comparing models fitted using different link, variance or covariance functions.The mcglm package implements the SIC for selection of the linear and matrix linear predictors components, by the functions mc sic and mc sic covariance.In this case, the SIC values indicate that all components, excepting the value of the daily hospitalizations with lag 1 at neighboring regions should be included in the model (SIC < 0).10444.49 30 20948.98 20954.1 21135.62Table 6: Goodness of non-nested models, defined from different matrix linear predictors h(Ω(τ t )).
To compare the goodness of non-nested models, the function gof provides the pseudo Akaike information criterion (pAIC), the pseudo Bayesian information criterion (pBIC) and the pseudo Kullback-Leibler information criterion (pKLIC).We are going to use it to select the bet structure for the matrix linear predictor.Results can be seen in Table 6.We will fit the model considering Figure 9 shows 5-days predictions obtained with the resulting model, jointly with the fitted and the true observed values.In this case we have obtained a RMSE equal to 5.78.Although the RMSE obtained is similar to the obtained with the other models/packages, we can see in Fig. 9 that in this case there are health departments where the model provides neither a good fit of the observations nor precise predictions, while the fits and predictions of others health departments are very accurate.With the information available, we have not found a model with better fits and predictions for all the health departments.

Bayesian spatiotemporal models
In this section we apply different models included in CARBayesST package to COVID-19 data given in Section 3.1.For the reasons explained in this section we use the Poisson log-linear model for Y kt , Eq. 10.Overdispersion cannot be controlled with this package.Although that could be regarded as a handicap, as we will see in the results sections the adjusted and forecasted values are quite good.
In order to induce spatial smoothness between the random effects, we use the binary adjacency matrix W used in previous sections.Element w ik =1 if areas i and k share a common border and w ik =0 otherwise, whereas w ii =0 for all i.
Taking into account again the characteristics of our data, the plots shown in Section 3.1 and the fact that our main objective is the prediction of future observations, just two of the spatiotemporal correlation structures included in this package (see Table 2 have been fitted for ψ kt in Eq. 11: the CARar and CAR adaptative structures.Neither spatially varying linear time trends model and spatiotemporal clustering models are appropriated for our data and CARanova and CARsepspatial strucutres assume a symmetric temporal correlation that not allow to obtain future predictions.In both cases the spatiotemporal structure is modelled with a multivariate first order autoregressive process with a spatially correlated precision matrix: . ., φ Kt ) is the vector of random effects for time period t, the precision Q(W, ρ S ) corresponds to the CAR models proposed in [Leroux et al.(2000)Leroux, Lei, and Breslow] and has the expression: 1 is the K × 1 vector of ones and I the K × K identity matrix.(ρ S , ρ T ) respectively control the levels of spatial and temporal autocorrelation, with values of 0 corresponding to independence while a value of 1 corresponds to strong autocorrelation.
The random effects from CARar have a single level of spatial dependence that is controlled by the parameter ρ S .That means that all pairs of adjacent areal units will have the same degree of autocorrelation: strongly if ρ S is close to one, while no spatial dependence will exist if ρ S is close to zero.
The CARadaptative model allows for localized spatial autocorrelation, that is, it allows to be stronger in some parts of the study region.This fact could be adequate for our data because it would be possible that spatial correlation between adjacent health departments to be correlated or conditionally independent, depending on, for example, whether these departments are in the same big city or not or according to the socioeconomic characteristics of its inhabitants.
The CARadaptative model allows this spatial autocorrelation heterogeneity by allowing spatially neighbouring random effects which is achieved by modelling the non-zero elements of the neighbourhood matrix W as unknown parameters, rather than assuming they are fixed constants.
With respect to covariates to be included on the mean, for the same reasons explained in the previous sections, we tried again two possibilities.Case 1: the smoothed number of new positive cases at a lag 9. Case 2: the smoothed number of positive new cases at a lag 9 plus the smoothed number of new positive cases at a lag 5.
Additionally, as a basic district specific measure of disease incidence, the population fraction e k has been also included as an offset.
Inference for all models is based on thinning (by 10) 60000 posterior samples including a burn-in period of a further 1000 samples.Plots of convergence assured that it was reached in all cases.
The Table 7 displays the overall fit of each model by presenting the deviance information criterion (DIC) and the effective number of parameters (pd).It Table 8 shows that the adaptive model fits the data better than the pure AR model, with reductions in DIC in both cases.
The medians of the posterior distribution of each parameter and its 95 % credible intervals are displayed in Table 8.
As can be seen, the estimated parameters of spatial and temporal correlations shows in all cases a strong spatial and temporal correlation.Regarding the covariates both are significative.
Finally, with respect to the number of step-changes between two spatially adjacent areas detected in the CARadaptative models only one change step is detected in case 1 meanwhile no changes are detected in case 2.
From all these data we can conclude that there are no large differences between the four fitted models.Since our main objective is the prediction, we calculate the mean square error of the prediction up to 5 days for the four cases.They can be found in the last row of Table 7 and again not great differences are observed.In this case the best result is obtained with the CARadaptative model with both covariates: positive cases at lag 9 and 5. Adjusted, observed and predicted values of this models can be seen in Figure 10.

Discussion
The aim of this work is to review three different spatiotemporal models for count data, implemented in R packages, and test their performance on an actual case For three models tested, significative positive parameters for the covariates are obtained.Those parameters that indicate temporal correlation show high values for all models.Regarding to the spatial correlation, there is difference between the models depending on how the spatial neighbourhood has been included in the model.
Due to these different statistical methodologies, the different packages provide different goodness of fit measures, there being no measure in common between them.So, as the final aim of our case study has been the short term prediction of the evolution of the hospitalisations in the different spatial areas, the mean squared prediction error (RMSE) has been obtained in all cases.We can achieve really good results using each of the packages reviewed.The RMSE of the best model in each package ranges from 5.4 (using CARBayesST package) until 5.78 (using mcglm package).There is no much difference between them.These results are very promising and can promote the use of these models to help in short-term government decision making regarding preventive measures against the hospital collapse.Additionally, they can provide tools to know in advance if it is necessary to expand hospital capacity, in terms of beds and/or workers.
Our objective in this work has not been to say that one package or one type of model is better than others, but to show possibilities that can be used in practice to analyse this type of data.The choice of the type of model to use will depend on the application at hand.

Figure 1 :
Figure 1: Spatial distribution of the 24 health departments with their population, unit 100.000

Figure 2 :
Figure 2: Spatial distribution of mean daily incidence per 100.000people in Valencian Community for weeks 27,38,49, since the pandemics begins.

Figure 3 :
Figure 3: Time series plot for (a) new positive cases; (b) daily hospitalisations, one time point for each of the 24 health department.Fig (c) shows the sum of the 24 health departments each day.

Figure 4 :
Figure 4: Cross-correlogram, between daily new positives and daily hospitalisations.It shows the Pearson correlation between both series, as a function of the displacement (days) of daily positives relative to the daily hospitalisations.

Figure 5 :
Figure 5: Data dependency from the day of the week: a) Total of daily positives vs day of the week, b) Daily Hospitalisations vs day of the week.c) Mean of daily positives in the last 4 days vs day of the week.
Spatial adjacency matrix of the health departments used to illustrate the paper.

Figure 6 :
Figure 6: Time series of new positives (in red) vs hospitalisations (in blue) in eight different health departments, with a time lag of 9 days in the series of new positive cases, and a smooth of 4 days.
kt in each model compartment.Population fraction, population density, border effects, etc, can be used as covariates.The region-specific intercepts b (.) in terms of spatiotemporal correlations and X t = (X 1t , . . ., X Kt ) covariates we can use the multivariate covariance generalised linear model (McGLMs) introduced in [Bonat and Jørgensen(2016)].This model is a general and flexible statistical model to deal with multivariate count data that explicitly models the marginal covariance matrix combining a covariance link function and a matrix linear predictor composed of known matrices.Let Y K×N = {Y 1 , • • • , Y N } be the outcome matrix and let M K×N = {µ 1 , • • • , µ N } denote the corresponding matrix of expected values.
[Lee et al.(2018)Lee, Rushworth, and Napier].These methods have had a remarkable development, especially in disease mapping, thanks to the availability of estimation methods based on Monte Carlo Markov Chain.With respect to the software packages for implementing these models, although a great quantity of software packages can be found for implementing purely spatial models, such as BUGS[Lunn et al.(2009)Lunn, Spiegelhalter, Thomas, and Best] and R-INLA [Rue et al.(2009)Rue, Martino, and Chopin], software for spatiotemporal modelling is much less well developed, and mainly focused on geostatistical data.This was the motivation to developing the R package CAR-BayesST [Lee et al.(2018)Lee, Rushworth, and Napier].This package can fit several models for count data with different spatiotemporal structures.A useful tutorial is provided by [Lee(2020)].CARBayesST package has been recently used to study the case-fatality risk by COVID-19 in Colombia ([Polo et al.(2020)Polo, Acosta, Soler-Tovar, Villa ST.CARlinear ([Bernardinelli et al.(1995)Bernardinelli, Clayton, Pascutto, Montomoli, Ghislandi, and Songi   ST.CARanova ([Knorr-Held(2000)]) ST.CARsepspatial ([Napier et al.(2016)Napier, Lee, Robertson, Lawson, and Pollock]) ST.CARar ([Rushworth et al.(2014)Rushworth, Lee, and Mitchell]) ST.CARadaptive ([Rushworth et al.(2017)Rushworth, Lee, and Sarran]) ST.CARlocalised [Lee and Lawson(2016)])Table 2: Summary of the models available in the CARBayesST package.

Figure 8 :
Figure 8: Observed values (in blue) jointly with fitted values (in brown) and predictions (in green).

Figure 9 :
Figure 9: Observed values (in blue) jointly with fitted values (in brown) and predictions (in green).

Figure 10 :
Figure 10: Fitted and predicted values in brown and green along with the observed counts in blue.

Table 3 :
Estimations, goodness of fit measurements and contribution of the endemic and autoregressive components to the global fit.

Table 4 :
Goodness of fit comparison between Poisson and two Negative Binomial distributions.

Table 5 :
Goodness of fit measurements and contribution of the endemic and autoregressive components to the global fit.

Table 7 :
Deviance information criterion (DIC) and the effective number of parameters (pD) for each model and scenario.