Human-Altered Landscapes and Climate to Predict Human Infectious Disease Hotspots

Background: Zoonotic diseases account for more than 70% of emerging infectious diseases (EIDs). Due to their increasing incidence and impact on global health and the economy, the emergence of zoonoses is a major public health challenge. Here, we use a biogeographic approach to predict future hotspots and determine the factors influencing disease emergence. We focused on the following three viral disease groups of concern: Filoviridae, Coronaviridae, and Henipaviruses. Methods: We modelled presence–absence data in spatially explicit binomial and zero-inflated binomial logistic regressions with and without autoregression. Presence data were extracted from published studies for the three EID groups. Various environmental and demographic rasters were used to explain the distribution of the EIDs. True Skill Statistic and deviance parameters were used to compare the accuracy of the different models. Results: For each group of viruses, we were able to identify and map areas at high risk of disease emergence based on the spatial distribution of the disease reservoirs and hosts of the three viral groups. Common influencing factors of disease emergence were climatic covariates (minimum temperature and rainfall) and human-induced land modifications. Conclusions: Using topographical and climatic data together with previous disease outbreak reports, we can identify and predict future high-risk areas for disease emergence and their specific underlying human and environmental drivers. We suggest that such a predictive approach to EIDs should be carefully considered in the development of active surveillance systems for pathogen emergence and epidemics at local and global scales.


Introduction
Shifting geographical footprints of pathogens and/or infected hosts due to ecosystem disruption can lead to emerging infectious diseases (EIDs) [1], of which COVID-19 is a current example at the center of international attention. Infectious diseases of animal origin (zoonoses) account for more than 70% of the emerging infectious diseases of recent decades [2,3]. With the onset of SARS-CoV-2, anticipating the emergence of new pathogens has become the major public health challenge of our time. The spatial dynamics of zoonotic diseases make hotspots difficult to study and detect, as they depend on the spatial distribution of mammalian hosts and reservoirs and their interactions with humans [4]. Studies show that disease emergence is closely linked to human-modified landscapes, such as fragmented peri-urban forests, which disrupt the human-animal-environment interface [5][6][7]. Therefore, ecological processes and landscape alterations, especially agricultural development, changes in water ecosystems, deforestation and reforestation [4,8], together with climate change [9], are the main drivers of EIDs.
Natural landscape attributes, such as elevation, and human-modified landscape factors, such as deforestation and agricultural expansion, influence the spatial extent of the hosts and reservoirs. For example, high elevation and water bodies could function as geographical barriers by preventing host movement [10], while rapid landscape changes can increase the likelihood of contact with a reservoir host and, thus, promote the emergence of micro-organisms previously unknown to humans [11]. Implementing a biogeographic approach to detect EID risks requires using geographic information systems (GISs) to integrate the spatial complexity of the distribution of EIDs and their immediate environment [12]. Mathematical models help to integrate spatial data to measure the predictive risk of disease emergence. Recent work has shown that applying a spatial Bayesian framework to species distribution modeling (SDM) produces more accurate results when dealing with limited and clumped data and, because it takes random effects into account, gives better estimates of the landscape factors influencing risk [13,14]. Hierarchical Bayesian SDM allows the observations to be interpreted as the result of ecological processes, such as climate change and human-altered landscapes.
Here, we have mapped and compared the predictive risk of the following three groups of viral infectious diseases transmissible to humans that are under surveillance: Filoviridae, including Ebola and Marburg viral diseases (EVD & MVD), Coronaviridae, such as SARS, MERS, and COVID-19, and Henipaviruses (Nipah & Hendra diseases) of the Paramyxoviridae family. We also identify potential hotspots and quantify the importance of environmental factors, such as climate and human-induced landscape changes, in the emergence of viral infectious diseases.

Methods
Using a Bayesian framework, we modeled the presence-absence data using a two-stage spatially explicit hierarchical logistic regression [15]. First, we modeled the potential presence of EID occurrence in each grid cell (local scale) as a function of bioclimatic and population density variables, using disease-level coefficients and a spatial random effect. Once the models were fitted, we compared the different models based on parameter summaries and model deviance.

Data
When considering the occurrence data for SDM, the most common biases arise from the assumption of perfect detection and stationary hosts. Disease occurrence depends on the spatial distribution of the disease reservoir and intermediate hosts. We used zero-inflated binomial models [16] to account for the imperfect detection of occurrences. Autocorrelation and non-stationarity of mammalian hosts were accounted for using intrinsic conditional autoregressive models (iCAR) to avoid overestimating the spatial inference and prediction in the models. We extracted data on the global occurrences of Filoviridae, Coronaviridae, and Henipavirus human disease outbreaks over time from WHO archives and published studies (Supplementary Table S1). In cases where the origin of the outbreaks was unclear, we restricted the analysis to the region or district of origin. Laboratory outbreaks, outbreaks resulting in asymptomatic diseases (Reston Ebola disease in the Philippines), and domestic (Hendra outbreaks in horses) and wildlife (Ebola in gorilla populations) outbreaks were also excluded. We excluded the recent SARS-CoV-2 outbreak, as the origin of the infection remains controversial. The analysis and coordinates of the suspected origin of SARS-CoV-2 are included in the Supplementary Materials. We geo-referenced the sites of origin and constructed spatial buffers of 10 km around the geographical coordinates. For each group of viruses, we generated 500 random spatial points in the spatial buffers to constitute presence points. Pseudo-absences were randomly generated in the spatial extents of the reservoirs and intermediate hosts of each virus group in a 1:2 ratio, leading to 1000 absence points.
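The buffer-based sampling of presence points described above can be illustrated with a minimal sketch. It is not the authors' code: the site coordinates are hypothetical, a single outbreak site is used instead of the full set of sites, and a flat-Earth approximation is assumed for converting kilometres to degrees (reasonable for a 10 km buffer away from the poles). The function names are invented for this example.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption for this sketch)

# Hypothetical outbreak origin; not a coordinate taken from the study.
SITE_LAT, SITE_LON = 0.5, 30.0


def sample_in_buffer(lat, lon, radius_km, n, rng):
    """Draw n points uniformly at random inside a circular buffer around (lat, lon)."""
    points = []
    for _ in range(n):
        # The square root makes the draw uniform over the disc area
        # instead of clustering points near the centre.
        dist = radius_km * math.sqrt(rng.random())
        bearing = rng.uniform(0.0, 2.0 * math.pi)
        # Small-distance approximation: convert km offsets to angular offsets.
        dlat = dist * math.cos(bearing) / EARTH_RADIUS_KM
        dlon = dist * math.sin(bearing) / (EARTH_RADIUS_KM * math.cos(math.radians(lat)))
        points.append((lat + math.degrees(dlat), lon + math.degrees(dlon)))
    return points


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, used here only to check the buffer radius."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


rng = random.Random(42)  # fixed seed so the sketch is reproducible
presence = sample_in_buffer(SITE_LAT, SITE_LON, 10.0, 500, rng)

# 1:2 presence to pseudo-absence ratio, as stated in the text.
n_pseudo_absences = 2 * len(presence)
```

In the study, the pseudo-absences are drawn from the spatial extents of the reservoir and intermediate host ranges, which would require the host range polygons; the sketch only computes how many such points the 1:2 ratio implies.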

Bioclimatic and Population Predictors
We extracted climatic and elevation covariates, such as monthly maximum and minimum temperatures, rainfall, and elevation, from global Bioclim data [17] at a spatial resolution of 2.5 min, or about 4.5 km at the equator. We used Moderate Resolution Imaging Spectroradiometer (MODIS) Land Cover Type (MCD12Q1) data [18] and land-use changes attributable to human activities from the Global Human Modification of Terrestrial Systems dataset [19]. Finally, we used the Gridded Population of the World (GPWv4) for the human population density raster [20]. The raster layers were resampled to a fixed resolution of 4.5 km and stacked into a raster brick. We obtained the geographical distribution and spatial extent of the primary hosts and reservoir mammals from the IUCN Red List [21]; the list of mammals is included in the Supplementary Materials.

Model Fitting and Model Prediction
The models were fitted using the R package "hSDM", which uses a hierarchical Bayesian approach incorporating spatial dependency into the analysis by accounting for geographical clumping, which can be explained by biological (reservoir and host movement) or bioclimatic variables. In our study, we analyzed the data using hierarchical binomial and zero-inflated binomial (ZIB) SDMs with and without spatial autoregression. The ZIB models combine a binomial process for observability and a Bernoulli process for habitat suitability [22,23]. To model the spatial autocorrelation, we used an SDM with an intrinsic conditional autoregressive model (iCAR) [24]. The model is fitted using a Bayesian framework that allows the use of pre-validated predictors and the generation of parameter uncertainties. A mixture of topographical, climatic, landscape, and human-dependent predictors was used. The effect of a predictor was considered significant if the 95% credible interval of its posterior distribution did not include zero. We used non-informative priors with a large variance of 10e6 (mean = 0), except for the spatial random effects, for which a weakly informative prior, Uniform (min = 0, max = 10), was used. Two parallel MCMC chains were run for each parameter, and the convergence of the chains was checked visually using traceplots (Supplementary Information) and the Gelman and Rubin convergence diagnostic. High-risk areas or hotspots for each viral EID group were predicted using a maximum sensitivity + specificity threshold selection, and the accuracy of the model was determined by the True Skill Statistic (TSS) [25].
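The threshold selection and accuracy measure mentioned above can be illustrated as follows. This is a minimal sketch on invented toy data, not the study's predictions; it assumes the usual definition TSS = sensitivity + specificity - 1, under which maximizing sensitivity + specificity is the same as maximizing TSS. The function names are hypothetical.

```python
def confusion_counts(observed, predicted_prob, threshold):
    """Count true/false positives and negatives when cells with p >= threshold are called presences."""
    tp = fp = tn = fn = 0
    for y, p in zip(observed, predicted_prob):
        predicted_presence = p >= threshold
        if y == 1 and predicted_presence:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_presence:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def true_skill_statistic(observed, predicted_prob, threshold):
    """TSS = sensitivity + specificity - 1; needs at least one presence and one absence."""
    tp, fp, tn, fn = confusion_counts(observed, predicted_prob, threshold)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


def max_sens_spec_threshold(observed, predicted_prob):
    """Return the candidate threshold that maximizes sensitivity + specificity."""
    candidates = sorted(set(predicted_prob))
    return max(candidates, key=lambda t: true_skill_statistic(observed, predicted_prob, t))


# Toy data (hypothetical): 3 presences and 6 pseudo-absences, mirroring a 1:2 ratio.
observed = [1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted_prob = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.2, 0.1, 0.1]

best_threshold = max_sens_spec_threshold(observed, predicted_prob)
best_tss = true_skill_statistic(observed, predicted_prob, best_threshold)
```

Cells whose predicted probability is at or above `best_threshold` would then be mapped as high-risk. A TSS of 1 indicates perfect discrimination, and values near 0 indicate performance no better than chance.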

Model Comparison
We used hierarchical SDM binomial, ZIB, binomial iCAR, and ZIB iCAR models to map the predictive risks of viral EIDs. To compare the models with respect to deviance, we constructed a geographical null model and then calculated the percentage of the null deviance explained by each model. The spatial autoregression models performed better than their non-spatial counterparts in the three groups (Table 1). In the Filoviridae prediction model, we found that 69% of the null deviance could be explained by the bioclimatic and population predictors using the ZIB model, which does not allow for an accurate identification of hotspots. In contrast, the inclusion of the random effects through iCAR allowed us to explain 100% of the null deviance, resulting in a perfect or saturated model. Similarly, ZIB models with imperfect detection performed slightly better for Coronaviridae, with 74% and 100% of the null deviance explained without and with iCAR, respectively. For the Henipavirus EID models, however, the binomial iCAR model was slightly better, but we chose to summarize the ZIB with iCAR, as the possibility of imperfect detection of outbreak events remains a concern. In addition, the ZIB model was able to explain 71% of the null deviance using predictors for Henipavirus events, which is superior to the binomial model and facilitates model standardization and comparison. Spatial autocorrelation, MCMC traceplots, and TSS evolution for each model are available in the Supplementary Information.
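The notion of "percentage of null deviance explained" can be made concrete with a short sketch. It makes several assumptions that the text does not confirm: the null model is taken to be an intercept-only model predicting the overall prevalence everywhere, the outcome is treated as a single Bernoulli trial per point (so the saturated deviance is zero), and the data are invented. The function names are hypothetical.

```python
import math


def bernoulli_deviance(observed, predicted_prob, eps=1e-12):
    """Deviance = -2 * log-likelihood for 0/1 outcomes (saturated deviance is 0)."""
    log_lik = 0.0
    for y, p in zip(observed, predicted_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        log_lik += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -2.0 * log_lik


def percent_null_deviance_explained(observed, predicted_prob):
    """100 * (1 - D_model / D_null), with an intercept-only null model (assumption)."""
    prevalence = sum(observed) / len(observed)
    null_deviance = bernoulli_deviance(observed, [prevalence] * len(observed))
    model_deviance = bernoulli_deviance(observed, predicted_prob)
    return 100.0 * (1.0 - model_deviance / null_deviance)


# Toy data (hypothetical): 2 presences, 4 pseudo-absences.
observed = [1, 1, 0, 0, 0, 0]

# A model that only reproduces the prevalence explains none of the null deviance.
pct_null = percent_null_deviance_explained(observed, [2 / 6] * 6)

# A model with informative but imperfect predictions explains part of it.
pct_partial = percent_null_deviance_explained(observed, [0.8, 0.7, 0.3, 0.2, 0.1, 0.1])

# Predictions that match every observation approach 100% (a saturated fit).
pct_saturated = percent_null_deviance_explained(observed, [1, 1, 0, 0, 0, 0])
```

Under these assumptions, a value of 100% means the fitted probabilities reproduce every observation, which is why such a fit is described as saturated.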

Detection of EID Hotspots
The hotspots for filovirus diseases, EVDs and MVDs, were found in the forest regions of Uganda, Southern Sudan, and eastern parts of the Democratic Republic of Congo, with smaller areas in West and Central Africa, as far as Angola (Figure 1). The ZIB iCAR model had a high TSS of 0.99 due to the addition of spatial autoregression. High-risk regions for EIDs caused by Coronaviridae predominate across the Indian subcontinent, with some areas in China and Southeast Asia (

Significant Environmental Predictors
We observed that minimum temperature [

Discussion
Here, we show that ecological, climatic, and landscape factors could predict future hotspots of human viral disease emergence on a global scale and could, thus, serve as a basis for surveillance and early warning systems. For the three groups of viral diseases studied, we were able to map areas at high risk of disease emergence based on the spatial distribution of disease reservoirs and hosts, as well as WHO data on the distribution of each disease. We found that human-related factors, particularly the impact of population growth on human-modified landscapes, were a common predictor of disease emergence. Filoviridae and Henipavirus outbreaks were also linked to rainfall, while Filoviridae and Coronaviridae emergences were favored by increases in the minimum nighttime temperatures. In addition, for Filoviridae, we noted the potential involvement of "unknown" variables (variables not used in this study). In Africa, these variables could relate to human behaviors, such as bushmeat consumption, that are often associated with EVD outbreaks [26], biodiversity loss, or even other bioclimatic covariates. Interestingly, coronavirus diseases are the only ones to be positively impacted by human population densities. Similarly, the hotspots of Henipaviruses depended on areas of low elevation and low rainfall.
Recent research has shown that the increased surface temperature and unpredictable seasonal rainfall due to climate change have an indirect effect on disease emergence through sudden ecological changes affecting reservoirs, loss of biodiversity, and migration of small mammal hosts [27,28]. For example, minimum temperature is the limiting factor for parasite development and vector distribution in malaria transmission [29] and in other vector-borne disease epidemics, such as Crimean-Congo Hemorrhagic Fever and Zika [5,18]. Unfortunately, research outside of vector-borne diseases is limited. However, this direct spatial dependence of disease emergence on minimum temperatures is worrying. Indeed, with climate change increasing the nighttime minimum temperatures and lengthening the frost-free season in most mid- and high-latitude regions [30], the latitudinal extent of infectious disease emergence could increase.
We also found that low elevation and high rainfall have a significant influence on the distribution of Henipavirus outbreaks. Consistent with our results, studies have hypothesized that the emergence of Nipah in the lower Gangetic plains and low-lying marshes could be attributed to flooding, which leads to the destruction of mammalian habitats [31]. Rapid changes in ecological habitats due to human land-use changes lead to the starvation and migration of fruit bat species (Family Pteropodidae), reservoirs of Nipah virus, with contamination of fruit trees near human habitations and increased exposure to the pathogen [31][32][33]. Our results support this hypothesis of the increased risk of Nipah outbreaks associated with lowland plains, flooding, and rapid human-induced habitat changes.
EVDs and coronaviral diseases have also been found to be associated with human-modified landscapes. EVDs have long been linked to landscape alterations, such as deforestation, mining, population growth, and land fragmentation [34][35][36]. Our results show that EVD outbreaks are not directly related to population densities, contrary to a recent study [36], but rather to the effects of population increases on human-modified landscapes, such as urbanization, deforestation, mining, and hunting. In contrast, population density was significantly related to coronavirus hotspots. Whether high population density leads to observer bias and, thus, to increased reporting of outbreaks needs to be examined in detail. The report of a SARS-like pneumonia in 2012 in miners in Tongguan, Mojiang [37] raises the issue of potentially unreported sporadic outbreaks in regions with limited populations. Studies show that the emergence of coronaviral diseases, such as SARS [38] and MERS [39], is directly related to exposure to body fluids from mammals raised in confined spaces for bushmeat and recreational activities, respectively. "Wild flavor" bushmeat restaurants and markets are often located in densely populated cities, where the demand for exotic proteins is high [26,40], and cases are, therefore, more likely to be reported in densely populated areas. In the case of MERS, reporting is higher in large cities because camel owners seek treatment for respiratory distress in tertiary hospitals located there, where cases are more likely to be reported [39]. The effect of population density is, however, crucial in the spread of epidemics and, therefore, remains an important factor in the detection of hotspots and active surveillance.
We suggest here that there is an urgent need for alternatives to rapid land-use changes, such as deforestation and land fragmentation for agriculture and livestock, and for changes in the cultural practices of bushmeat consumption. More importantly, the results highlight the major impact of increasing populations and human activities on land alterations and ecological changes, as well as the dependence of viral disease emergence on bioclimatic changes (minimum temperature, rainfall, low elevation, and flooded areas) at the global scale. We show the potential of using climatic, topographic, and population data to identify and predict areas at high risk of disease emergence. Although our study focused on three viral diseases of concern, we suggest that such a biogeographic approach to predicting disease emergence should be considered and tested for other diseases under surveillance in a global active surveillance context.

Conclusions
Landscape (deforestation, urbanization, elevation, flooded areas, etc.), climatic (rainfall, temperature, etc.), and epidemic data were used in combination to estimate the potential role of human-induced landscape and climate changes in the emergence of Filoviridae, Coronaviridae, and Henipaviruses at a global scale. Such a predictive approach for identifying regions at high risk of disease emergence should be considered for monitoring local and global pathogen emergence and for identifying potential future epidemics. By recognizing the influence of predictive environmental factors on EIDs and adopting a predictive approach to disease emergence, unprecedented EID outbreaks could be made predictable.

Conflicts of Interest:
The authors declare no conflict of interest.