Modeling the Spatial and Temporal Spread of COVID-19 in Poland Based on a Spatial Interaction Model

: This article describes an original methodology for integrating global SIR-like epidemic models with spatial interaction models, which enables the forecasting of COVID-19 dynamics in Poland through time and space. Mobility level, estimated by the regional population density and distances among inhabitants, was the determining variable in the spatial interaction model. The spatiotemporal diffusion model, which allows the temporal prediction of case counts and the possibility of determining their spatial distribution, made it possible to forecast the dynamics of the COVID-19 pandemic at a regional level in Poland. This model was used to predict incidence in 380 counties in Poland, which represents a much more detailed modeling than NUTS 3 according to the widely used geocoding standard Nomenclature of Territorial Units for Statistics. The research covered the entire territory of Poland in seven weeks of early 2021, just before the start of vaccination in Poland. The results were veriﬁed using ofﬁcial epidemiological data collected by sanitary and epidemiological stations. As the conducted analyses show, the application of the approach proposed in the article, integrating epidemiological models with spatial interaction models, especially unconstrained gravity models and destination (attraction) constrained models, leads to obtaining almost 90% of the coefﬁcient of determination, which reﬂects the quality of the model’s ﬁt with the spatiotemporal distribution of the validation data.


Introduction
COVID-19 is an infectious disease caused by the newly discovered coronavirus, SARS-CoV-2. The virus was first discovered in Wuhan, China, and has quickly spread all around the world, infecting populations in most countries. The spread of COVID-19 is inherently a spatial process; therefore, attempts to combat its spread would benefit from modeling spatial interactions. This article expands on a previous study using gravity models to model the spatial development of the COVID-19 pandemic in Poland [1]. The present study describes the parameterization method of epidemiological models, and the development of spatial interaction models to simulate the pandemic development and determine the spatial (and temporal) distribution of the number of cases in Poland. Reference data on the number of cases in individual powiats (county-level administrative districts) provided by sanitary and epidemiological stations have confirmed the reliability of the prediction. However, the complexity of factors influencing the dissemination of SARS-CoV-2 poses COVID-19 is far more transmissible, spreading from a single city to the entire country in just 30 days [28]. The causes of this difference are thought to be manifold and may include social and demographic factors, local government containment strategies, and differences in the dynamics of transmission between these two coronaviruses [29]. There have been many attempts to build multivariate regression models to explain the number of cases in each county in ongoing studies. Population density, road network density, industrial and residential buildings, have been among the independent variables studied in such models. Typically, three groups of risk factors that shape the volume, intensity and spatial dispersion of diseases are taken into account when creating models: individual risk, natural environment (climatic) risk, and social and demographic risk. However, there is no convincing evidence regarding the rank-order of which factors are most important.
The aim of this study was to construct a model of the geographic dissemination of COVID-19 in Poland based on spatial interaction models. We put forward the research hypothesis that it is possible to achieve a good-quality model, assuming a single independent variable: the mass of the population in administrative units. This approach allows for synergies between the models, enabling the time forecast of case counts (e.g., the classical SIR epidemiological model) and models of spatial interactions. The combined application of these approaches enables a reliable prediction of the number of infected people in administrative divisions. Our model is intended as a support tool that includes a geographic spread to the conceptualization of disease spread. It could be used as an a priori tool for estimating the spatial locations of infection outbreaks, as well as a way of evaluating future infection rates. An important advantage of the constructed model is its spatial aspect; i.e., it enables the evaluation of the potential spatial differentiation of the infected number of people within the set of observed spatial units (for instance, counties in Poland), which is especially important in the case of COVID-19 and similar diseases.
This approach provides useful insights simulating spatial networks in a flexible way, reducing the burden of individual simulations and predictions for each individual-level observation. While it represents a simplification of the spatial spread of the infection of pandemics, it nevertheless provides the possibility to extend into metapopulation models and spatial numerical simulations of epidemics in complex networks [30,31]. In this way, models can include distances. All types of pre-existing mathematical models can be expanded to be solved, maximized, or estimated using the adopted spatial analysis approach, reducing the uncertainty of empirical data [32].

SIR and SIR-F Models
Well-known deterministic compartment susceptible-infected-recovered [33] epidemic models based on viral transmission (in this case SARS-CoV-2) within a population (N) have been used since the beginning of the 20th century, and are described using a set of ODE (ordinary differential) equations: S + I + R = N for all t (2) where S, I, and R are the numbers of susceptible, infected, and recovered individuals, respectively; N is the size of the population; β is the contact rate capturing the 'interaction frequency', the rate of flow from susceptible to infected; γ is the recovery rate, the rate of flow from infected to recovered or dead; and t denotes time [34]. There are several variations of models derived from the basic SIR model. In this paper, we apply the SIR-F model, which partitions the R (recovered) population into recovered and failed (deceased) subpopulations.

Gravity Model
The general formula of population gravity between two certain points is described as T ij = f (O i ,D j , d ij ), where T ij is the volume of interactions (contacts), O i is the flow emanating from the origin i and representing the push factor (e.g., emissiveness), D j is the flow terminating at the destination point j and representing the pull factor (e.g., attraction), and d ij is the distance between i and j representing the spatial separation (e.g., the Euclidean or road distance, time or cost function). Given one county, summing the individual gravity volumes over all columns or rows gives the population potential for the origin or destination, respectively (Table 1, O: origins of contacts, D: destination of contacts). Additional parameters, such as the distance decay function (d ij β , where β is the exponent) between the origin (A) and destination (B), serve to optimize gravity models. Considering these parameters and the entropy-maximizing framework allows one to arrive at the following formulas defining the influence of independent variables on "emissiveness" (V) and attractiveness (W) (Equations (3)-(6) [35]): Unconstrained gravity model: Production (origin) constrained model: Destination (attraction) constrained model: Doubly constrained model: The formula must be calibrated based on the interplay coefficients resulting from the gravity concept. For each pair of counties i and j, we determined the interplay coefficient I ij as follows: While Equation (7) is the typical gravity equation assuming two counties located at distance d ij , in Equation (8), we assumed that the interplay coefficient I ii is the number of inhabitants divided by the square of the radius of a circle with an area equal to the area of the county. Spatial interaction model fit statistics utilize a replacement of the coefficient of determination (pseudo R 2 ) based on the likelihood function and its counterpart and an adjusted coefficient of determination (adjusted pseudo R 2 ), which can be interpreted similarlya maximum value close to 1 denotes better model fit [36]. Model complexity is assessed using the Akaike [37] information criterion (AIC), derived for information theory. Lower AIC values indicate better model fit as an evaluation of the volume of information lost [37]. The typically utilized statistical parameter is the standardized root-mean-square error (SRMSE) of the observed and simulated volume of interactions (in this case, infections). A higher value of SRMSE indicates decreased model assessment (starting from zero as the best fit). The SpInt library utilizes the Sorensen similarity index (SSI) in a similar way to the SRMSE. The values of the SSI can reach a maximum of one, denoting perfect model fit, while values close to zero are the worst [35].
To automate the calculation process, an authorial Python script was developed, which enabled all calculations to be performed for 380 counties in Poland.
The open-source PySal SpInt Python library [38] and official registered COVID-19 data were used within the framework of the proposed approach to trace the COVID-19 pandemic at the county level in Poland [39].
The operationalization allows for any spatial resolution and temporal resolution, as well as demographic stratification.

The Adopted Methodology
Classic epidemic models, such as SIR, SIR-F, and SEIR, enable the reliable forecasting of the number of susceptible individuals, infected individuals, and recovered individuals, at specific time intervals and predetermined model parameters (the key parameter is the basic reproductive rate). However, the assumptions and limitations of SIR-type epidemic models are well known [7]. While they enable the reliable estimation of temporal changes, SIR-type models do not allow for the analysis of the spatial distribution of the number of cases, deaths, or severe cases.
The COVID-19 pandemic outbreak can be studied using methods of spatial diffusion, as well as potential models (spatial interaction modelling) and visualizing using geographic information systems software, due to the following statistical characteristics to be analyzed: population, metapopulation models, spatial distances, and interactions (contacts) [40][41][42]. The inventor of spatial interaction modeling (A. Wilson) also introduced, as independent variables, the probability of contacts and spatial dimension (distance or its functional measure) [43].
The spatial interaction model that we apply may be regarded as an analogy to Newton's theory of gravitational interaction. In the "gravitational" approach, two administrative units interact directly, proportional to the product of their "masses", and inversely proportional to the square of their "distances." However, the concepts of "mass" and "distance" are abstract and require clarification in the context of epidemic models. In general, the distance can be interpreted as a road distance or time distance that considers various methods of transportation (car, bus, train) and different road qualities or speed limits on roads or tracks. For the purpose of this work, we assume that the "distance" between counties is defined by the length of the segment connecting the centroids (centers of gravity) of counties in Poland. Similarly, there are many ways to interpret the concept of the "mass" of individual administrative units. We tested several measures of "mass", including population density, relative population size, and economic attractiveness expressed as the number of jobs or the budget of individual administrative units. However, it has become apparent that the population volume of a county is the variable resulting in the highest coefficient of determination for the quality of the holistic model, explaining the spatiotemporal distribution of the number of COVID-19 cases in individual counties.
Based on the gravity concept, we determined the interplay coefficient of each pair of counties as the product of their population numbers divided by the square of the distance between them. The data used encompassed 380 counties and their populations according to the Statistical Yearbook issued by Statistics Poland, which results in 72,390 (triangular matrix with diagonal) distances between the centroids of individual populations and corresponding interplay coefficients. The collection and initial processing of data, including the calculation of interplay coefficients, was implemented using Python in the ArcGIS environment.
Then, analogous to the notion of contingency tables in statistics, we estimated the table of probabilities of new cases occurring in county j, due to interactions with county i for elements in row i and column j of the table. In general, such a table is unknown, and the aim of model calibration is to recreate the contingency table based on available statistics. Depending on the available historical data used for model calibration, two possible cases can be considered: (1) only the total number of cases in the country is available and (2) the number of cases in each county is known. These two cases correspond to special cases of spatial interaction models, a so-called unconstrained gravity model and a production (destination) attraction model, respectively. We considered both cases in calibrating two models, again based on the statistical data gathered by sanitary and epidemiological stations in 380 counties. We applied the calibration framework SpInt in the PySal python library.
First, we verified the hypothesis that the population size in each county is a proper explanatory variable. For this purpose, we calibrated the destination-constrained model that utilizes detailed data for each county. Since such detailed data may not be known, in the next step, we calibrated an unconstrained model that involves only a cumulative number of cases.
We applied the SIR-F epidemic model to forecast the course of the epidemic at the country level. The SIR-F model was calibrated and tested on the basis of data gathered by sanitary and epidemiological stations, and published on government websites [30]. The calibration of the SIR-F and spatial interaction models was performed for data from the 11th week of 2020 to the 6th week of 2021, inclusively. Then, we applied the resulting models to obtain the temporal and spatial prediction of the number of susceptible individuals, infected individuals, and recovered individuals in the 6th and 7th weeks of 2021.

Data
The sources of the number of registered infections, recoveries and deaths are three sets of data collected by the Polish Ministry of Health. The cumulative numbers of infections (stacked) by county in Poland from weeks 11 to 41 of 2020 are presented on the chart and maps (Figures 1-3; map classification using geometrical interval).
These figures provide some spatial trends of COVID-19 diffusion during the observed time. Social events such as presidential elections, holidays for students and children, etc., could locally influence the rate of infections in different counties. Moreover, the study period used in this research was marked by major changes in intervention measures and testing strategies. Data analysis reveals changes in the number of cases over time and spatial trends, illustrating the government's policy on introducing restrictions and limitations. Data showing the government's measures are available on the Ministry of Health's website and Twitter [44]. Figure 3 shows changes in the epidemic in Poland in weeks 48-52 (the so-called second wave). The concentration of the number of cases in large cities (capitals of voivodeships and powiats) is visible; thus, the spatial differentiation of the development of the COVID-19 pandemic in Poland is evident. These figures provide some spatial trends of COVID-19 diffusion during the observed time. Social events such as presidential elections, holidays for students and children, etc., could locally influence the rate of infections in different counties. Moreover, the study period used in this research was marked by major changes in intervention measures and testing strategies. Data analysis reveals changes in the number of cases over time and spatial trends, illustrating the government's policy on introducing restrictions and limitations. Data showing the government's measures are available on the Ministry of Health's website and Twitter [46]. Figure 3 shows changes in the epidemic in Poland in weeks 48-52 (the so-called second wave). The concentration of the number of cases in large cities (capitals of voivodeships and powiats) is visible; thus, the spatial differentiation of the development of the COVID-19 pandemic in Poland is evident. Figure 2 presents the starting time (March 2020) and first (Spring) wave of the COVID-19 pandemic in Poland. Starting from patient 0 in Poland (the county Słubice, Western Poland, near German-Polish border), the pandemic spread to the most densely populated areas of Silesia and Lesser Poland (Małopolska), as well as in the largest cities and almost all counties with border crossings. During the summer holiday months, the pandemic slowed down modestly (see Figure 1), and people generally understated the hazard of the virus and were more mobile, behaving as they usually would during summer, i.e., visiting the north of Poland, its lake districts, and the beaches on the Baltic Sea. Figure 3 presents the situation during the second wave (Autumn 2020). All urbanized regions and suburban areas of cities were foci of pandemic outbreaks and, moreover, Northern areas of Poland, although less populated, but wetter and cooler, are also touched by the spread of COVID-19. Other social and political phenomena (anti-and pro-abortion demonstrations) also likely influenced the disease dynamics in Poland's largest cities in October and November 2020, despite the official "DDM" policies (distance, disinfection, masks). Modeling pandemic spread required the collection of spatial data (admin divisions of the country) and demographic data, characterizing the population level of population mobility. These data were obtained from the Head Office of G and Cartography [45] and Statistics Poland [46]. Both standard GIS tools (ArcG QGIS) and dedicated programming libraries developed in Python were used analysis. Modeling pandemic spread required the collection of spatial data (admin divisions of the country) and demographic data, characterizing the population level of population mobility. These data were obtained from the Head Office of G and Cartography [45] and Statistics Poland [46]. Both standard GIS tools (ArcG QGIS) and dedicated programming libraries developed in Python were used analysis.  , as well as in the largest cities and almost all counties with border crossings. During the summer holiday months, the pandemic slowed down modestly (see Figure 1), and people generally understated the hazard of the virus and were more mobile, behaving as they usually would during summer, i.e., visiting the north of Poland, its lake districts, and the beaches on the Baltic Sea. Figure 3 presents the situation during the second wave (Autumn 2020). All urbanized regions and suburban areas of cities were foci of pandemic outbreaks and, moreover, Northern areas of Poland, although less populated, but wetter and cooler, are also touched by the spread of COVID-19. Other social and political phenomena (anti-and pro-abortion demonstrations) also likely influenced the disease dynamics in Poland's largest cities in October and November 2020, despite the official "DDM" policies (distance, disinfection, masks).
Modeling pandemic spread required the collection of spatial data (administrative divisions of the country) and demographic data, characterizing the population and the level of population mobility. These data were obtained from the Head Office of Geodesy and Cartography [45] and Statistics Poland [46]. Both standard GIS tools (ArcGIS Pro, QGIS) and dedicated programming libraries developed in Python were used for data analysis.

SIR-F Model Calibration
Using COVID-19 data from Poland in subsequent weeks, which were made available on government websites and in the CovsirPhy library [47,48], the values of the parameters of the SIR-F epidemiological model were estimated [49,50]. Information on the number of COVID-19 cases in Poland at the national level is presented in Figure 4.

SIR-F model Calibration
Using COVID-19 data from Poland in subsequent weeks, which were made available on government websites and in the CovsirPhy library [47,48], the values of the parameters of the SIR-F epidemiological model were estimated [49,50]. Information on the number of COVID-19 cases in Poland at the national level is presented in Figure 4. It should be noted that data concerning the infected population involve only a fraction of the whole infected population, and only the registered fraction of the tested population is disclosed statistically. According to an officially published daily report (as of 7 February 2021), a total of 204,007 people were actually infected (1,550,255 since 4 March 2020), 1,307,161 recovered, and 39,087 failed (died, Figure 5).
Example simulation results obtained from the calibrated SIR-F model are presented in Figures 5 and 6. It should be noted that data concerning the infected population involve only a fraction of the whole infected population, and only the registered fraction of the tested population is disclosed statistically. According to an officially published daily report (as of 7 February 2021), a total of 204,007 people were actually infected (1,550,255 since 4 March 2020), 1,307,161 recovered, and 39,087 failed (died, Figure 5).

Model Calibration-Spatial Interaction
The general spatial interaction models used in this article were calibrated by determining parameters based on the analysis of data made available by district (powiat) sanitary and epidemiological stations. The conducted analyses have shown a reliable (in a statistical sense) correlation between the number of inhabitants of individual powiats and the number of cases.

Model Calibration-Spatial Interaction
The general spatial interaction models used in this article were calibrated by determining parameters based on the analysis of data made available by district (powiat) sanitary and epidemiological stations. The conducted analyses have shown a reliable (in a statistical sense) correlation between the number of inhabitants of individual powiats and the number of cases.
The spatial interaction models were calibrated using the generalized linear model, obtaining the logarithmic Poisson model. The authors compared the results with reference data, i.e., the number of cases in individual weeks in individual powiats in Poland. With this approach, it was possible to assess the reliability of the regression model parameters. The authors followed a similar procedure for the unconstrained gravity (Table 1, model I in Table 2) and destination (Table 1, attraction model III in Table 2) models, using the population statistics as the explanatory variables for a sequence of randomly chosen weeks in 2020.

Model Verification
A verification of the destination-constrained spatial interaction models was performed for the 6th and 7th weeks of 2021 (1-7 February) using the acquired empirical aggregated data of COVID-19 infections by county. The general formula of the destination-constrained model (III) for 6th and 7th weeks of 2021 (Equation (3a,b) Table 2. Spatial interaction model fit statistics (* spatial verification; ** spatial prediction).

Model Unconstrained Gravity Model (I) Destination (Attraction) Constrained Model (III)
Week

Model Verification
A verification of the destination-constrained spatial interaction models was performed for the 6th and 7th weeks of 2021 (1-7 February) using the acquired empirical aggregated data of COVID-19 infections by county. The general formula of the destination-constrained model (III) for 6th and 7th weeks of 2021 (Equation (3a,b) Table 2. Spatial interaction model fit statistics (* spatial verification; ** spatial prediction).

Model Unconstrained Gravity Model (I) Destination (Attraction) Constrained Model (III)
Week

Model Verification
A verification of the destination-constrained spatial interaction models was performed for the 6th and 7th weeks of 2021 (1-7 February) using the acquired empirical aggregated data of COVID-19 infections by county. The general formula of the destination-constrained model (III) for 6th and 7th weeks of 2021 (Equation (3a,b) Table 2. Spatial interaction model fit statistics (* spatial verification; ** spatial prediction).

Model Unconstrained Gravity Model (I) Destination (Attraction) Constrained Model (III)
Week

Model Verification
A verification of the destination-constrained spatial interaction models was performed for the 6th and 7th weeks of 2021 (1-7 February) using the acquired empirical aggregated data of COVID-19 infections by county. The general formula of the destination-constrained model (III) for 6th and 7th weeks of 2021 (Equation (3a,b) Table 2. Spatial interaction model fit statistics (* spatial verification; ** spatial prediction).

Model Unconstrained Gravity Model (I) Destination (Attraction) Constrained Model (III)
Week

Model Verification
A verification of the destination-constrained spatial interaction models was performed for the 6th and 7th weeks of 2021 (1-7 February) using the acquired empirical aggregated data of COVID-19 infections by county. The general formula of the destination-constrained model (III) for 6th and 7th weeks of 2021 (Equation (3a,b) Table 2. Spatial interaction model fit statistics (* spatial verification; ** spatial prediction).

Model Unconstrained Gravity Model (I) Destination (Attraction) Constrained Model (III)
Week

Model Verification
A verification of the destination-constrained spatial interaction models was performed for the 6th and 7th weeks of 2021 (1-7 February) using the acquired empirical aggregated data of COVID-19 infections by county. The general formula of the destination-constrained model (III) for 6th and 7th weeks of 2021 (Equation (3a,b)

Model Verification
A verification of the destination-constrained spatial interaction models was performed for the 6th and 7th weeks of 2021 (1-7 February) using the acquired empirical aggregated data of COVID-19 infections by county. The general formula of the destination-constrained model (III) for 6th and 7th weeks of 2021 (Equations (9) and (10)): T ij = exp −9.71866096 + 0.63624388 ln ln (V i ) − 0.12195776 − (−0.1101434) ln d ij (10) where T ij is the modeled number of infections in count j due to interaction with county i, V i is the population of county i, and d ij is the distance between county i and j; for the county itself, the distance taken into account was the radius of a circle of equal area to the county area. The coefficient of determination R2 equaled 0.85, meaning that the developed model explains 85% of the variance of the phenomenon and the actual number of cases. Based on the resultant model, the authors developed a map of the residuals of regression (Figure 7a-d indicating a proportion of the number of cases not explained by the model. Table 2 provides data on the quality of the model's fit. The determined parameter β = −0.11 shows that, in a given powiat, the average increase in the incidence will amount to 10.4%, with a population increase per unit of distance (per 1 km). area.
The coefficient of determination R2 equaled 0.85, meaning that the developed model explains 85% of the variance of the phenomenon and the actual number of cases. Based on the resultant model, the authors developed a map of the residuals of regression (Figure 7a-d indicating a proportion of the number of cases not explained by the model. Table 2 provides data on the quality of the model's fit. The determined parameter β = -0.11 shows that, in a given powiat, the average increase in the incidence will amount to 10.4%, with a population increase per unit of distance (per 1 km). The results (Figure 7a) indicate that the developed model underestimates the actual number of infections. However, the obtained statistical data indicate the similar reliability estimates of unconstrained gravity and destination-constrained spatial interaction models.
These models fit the statistics for different time intervals, to prove that the outbreak of the COVID-19 pandemic targeted the most populated areas over time. These values are The results (Figure 7a) indicate that the developed model underestimates the actual number of infections. However, the obtained statistical data indicate the similar reliability estimates of unconstrained gravity and destination-constrained spatial interaction models.
These models fit the statistics for different time intervals, to prove that the outbreak of the COVID-19 pandemic targeted the most populated areas over time. These values are more reliable than the data concerning one-week intervals (Table 3).

Simulation
Our model can be combined with the SIR-F model for spatial predictions. The approach is to forecast the total number of cases and then disaggregate the total number of cases to obtain the spatial distribution. This leads us to the new concept of combining the classic epidemic models with spatial interaction models proposed and developed by O'Kelly [47] and Oshan [35]. The authors used the results of the predictive simulation obtained by applying the SIR-F model to the territory of Poland to estimate the number of infections in individual powiats. Due to the unknown spatial differentiation of infections within powiats, the authors used the unconstrained gravity model for this analysis The total numbers of simulated (absolute) susceptible, infected, failed, and recovered cases in Poland from the SIR-F model are presented in Table 4. The simulation of the SIR-F model was performed on 7 February 2021; the estimation of the unconstrained gravity model was completed at the end of the week. The unconstrained gravity models took different forms for the first weeks (sixth and seventh weeks of 2021, Equations (11) and (12) and for the seven weeks ahead (Equation (13)). The modeling results are presented in the maps (Figure 8a-d).
T ij = exp −9.64087617 + 0.62260227 ln (V i ) + 0.62260227 ln ln W j − (−0.11015654) ln ln d ij (11) T ij = exp −9.71866096 + 0.62811812 ln (V i ) + 0.62811812 ln ln W j − (−0.1101434) ln ln d ij (12) T ij = exp −6.87909269 + 0.60225244 ln (V i ) + 0.60225244 ln ln W j − (−0.11531647) ln ln d ij (13) where T ij is the modeled number of infections, V i is the population of county i, W j is the population of county j, and d ij is the distance between counties i and j; for the county itself, the distance taken into account was the radius of a circle of area equal to the county area.

Discussion
The article aimed to develop a spatial interaction model using the "gravitational" approach, and test whether it adequately estimates the COVID-19 pandemic spread in Poland. The gravity model assumed the population size of individual powiats and their mutual distance as explanatory variables. The model that we developed has the advantage of combining the epidemic SIR model with the model of spatial interactions. Classical epidemic models such as SIR models provide a reliable estimate of changes over time, but do not allow for the spatial analysis of morbidity, death, or severe cases.

Discussion
The article aimed to develop a spatial interaction model using the "gravitational" approach, and test whether it adequately estimates the COVID-19 pandemic spread in Poland. The gravity model assumed the population size of individual powiats and their mutual distance as explanatory variables. The model that we developed has the advantage of combining the epidemic SIR model with the model of spatial interactions. Classical epidemic models such as SIR models provide a reliable estimate of changes over time, but do not allow for the spatial analysis of morbidity, death, or severe cases.
The results indicate that the unconstrained gravity model enables the prediction of the actual number of infections, as confirmed by the analysis carried out for the seventh week of 2021. The values obtained in the prediction are much higher than the official reports provided by district sanitary and epidemiological stations. This suggests that the actual number of cases is several times (even ten times) higher than the number of officially reported cases. The Interdisciplinary Center for Mathematical and Computational Modeling of the University of Warsaw (ICM UW) has obtained similar results [50]. Thus, the model presented in this study was calibrated in proportion to the estimated values obtained using the SIR-F model with proven reliability, while maintaining the spatial differentiation provided by the unconstrained gravity model.
Of note, the expected total number of infections in Poland was 42,206 cases (Table 4, Figure 8d) in the seventh week of 2020. The model calibrated using official data from the Ministry of Health was less reliable (Equation (13), Figure 8a,b), as it assumed that the epidemiological situation in Poland in the analyzed period was not changing rapidly.
There are at least two ways to verify the prediction of the number of infections in individual powiats. The SIR or SIR-F models were used at both the national and the district (powiat) levels, and the results were verified using a destination-constrained spatial interaction model at the district level, and an unconstrained gravity model at the national level. The promising results of this combined approach permit the hypothesis that the number of cured cases and the number of vaccinated cases should be considered additional variables in further studies. Taking these values into account would make it possible to reduce the study population susceptible to infection by the number of partially or permanently immunized. It is also possible to consider other social and demographic variables, such as age groups, employment status, and the occupational mobility of residents, among others. Including occupational mobility may be particularly crucial as it indicates the number of work-related contacts between individual administrative units. It would also be useful to consider more extended time series and to assess the credibility of the official epidemiological data provided by district sanitary and epidemiological stations to improve the prediction's quality and credibility. One limitation of this study is that the model does not include pre-existing health problems, such as cardiovascular disease or diabetes, both of which are linked to COVID-19 infections, and in particular, to the worst outcomes. Incorporating such risk factors into COVID-19 research could be beneficial [51,52].
Moreover, the implementation of our model allows the assessment of the actual number of infections, and therefore, it can be used to predict future COVID-19 infection volumes, which provides good forecasts of an overview of future pandemic conditions [53,54]. In particular, the model can be used by central and local government administration to plan and manage efficient activities to counter the spread of COVID-19 infections. It can help with decision-making and public health resource allocation by evaluating and visualizing spatial data on infection outbreaks.

Conclusions
Our study has shown that spatial interaction models are an adequate tool for modeling the spatial-temporal diffusion of the COVID-19 pandemic in Poland, and for determining the number of infections. The results can also be compared with research conducted in other countries. Moreover, numerical simulations have shown that, to determine the degree of mutual influence among administrative districts in terms of pandemic spread, data on the number of inhabitants and the distances between these units are sufficient. The model developed on the basis of these assumptions explains almost 90% of the temporal and spatial variability of the phenomenon.
The data set of COVID-19 infections used included the first year of the pandemic in Poland (12 months, i.e., from March 2020 to February 2021), and there was no additional information differentiating variants of SARS-CoV-2 virus infections. The modeling was aimed only at the confirmation of the simple heuristic empirical evidence of the association between the potential (maximized, simulated) number of possible contacts and the revealed number of infections. To our knowledge, this was the first systematic approach aimed only at finding the spatial regularities of the COVID-19 pandemic in Poland. The confirmation of tested spatial trends will make it possible to introduce additional dependent (mortality or recoveries) and independent variables (e.g., mean radiative temperatures, MRT). It seems that, the shorter the period of the simulation, the greater the fit of dependent variables simulated. On the other hand, as mentioned above, some variables, such as MRT, reveal their impact during longer simulations, and there is no significance for the spatial heterogeneity of infections for short-time simulations in Poland (in fact, MRT is a confounding variable in these short periods).
Our model includes a spatial component, which allowed for the assessment of the potential spatial differentiation of the infected number of persons within a set of observed spatial units (for example, counties in Poland), which is essential in the case of COVID-19 surveillance. Hence, our model could help to assess the spatial locations of infection outbreaks, and determine the regularity of the spatial and temporal evolution of the pandemic. Therefore, the model we lay out here could be valuable to policymakers who need the geographic predictions of disease spread to make policy decisions. For example, the knowledge may provide a basis for implementing restrictions limited to specific geographic areas, based on the forecasts made with the model.
In future work, the authors aim to include the vaccination policy model as well as data on the occupational mobility of residents during the pandemic.