Forest Connectivity , Host Assemblage Characteristics of Local and Neighboring Counties , and Temperature Jointly Shape the Spatial Expansion of Lyme Disease in United States

Understanding risk factors for the spread of infectious diseases over time and across the landscape is critical for managing disease risk. While habitat connectivity and characteristics of local and neighboring animal (i.e., host) assemblages are known to influence the spread of diseases, the interactions among these factors remain poorly understood. In this study, we conducted a county-level analysis to test the effects of forest connectivity, together with the suitability of local assemblage (measured by the similarity of local host assemblage with neighboring assemblages) and the infection intensity of neighboring counties on the spatial expansion of Lyme disease in the United States. Our results suggested that both the similarity of local host assemblage and the infection intensity of neighboring counties were positively correlated with the probability of disease spread. Moreover, we found that increasing forest connectivity could facilitate the positive effect of neighbor infection intensity. In contrast, the effect size of the host assemblage similarity decreased with increasing connectivity, suggesting that host assemblage similarity was less effective in well-connected habitats. Our results thus indicate that habitat connectivity can indirectly influence disease spread by mediating the effects of other risk factors.


Introduction
Understanding the spatial expansion of infectious disease is important in predicting the new geographic areas that would have disease presence and is of importance in disease management [1].The spreading rate of disease is not the same over different directions in space, which indicates the variation in spatial suitability for both host and vector [2].This raises a question of what factor influences the nonrandom spread of disease over space.
In general, landscape structure can potentially affect the range expansion of disease due to its important role in determining host-pathogen interactions [3].For example, it has been considered that habitat connectivity is able to considerably influence the process of disease spread via its effect on the dispersal and distributions of hosts and vectors [2,4,5] and host community compositions [6,7].An increase in connectivity generally relates to an increase in the disease incidence or severity due to increased host movements and contact rates between host populations [8].When studying the spread of plague epizootics in Colorado, Stapp and co-workers [9] found that increasing landscape connectivity promoted plague outbreaks in prairie dog colonies.However, high connectivity may sometimes reduce disease risk by increasing host diversity, which can exert a dilution effect [7], or by generating large epidemics that deplete susceptible individuals [10].Thus, the role of habitat connectivity in disease spread still needs more investigations in the context of increasing habitat fragmentation worldwide.
Based on land cover data, we here investigated the effect of forest connectivity on the spread of Lyme disease in the United States.Lyme disease, caused by Borrelia burgdorferi and transmitted by Ixodes ticks [11], is recognized as an important emerging infectious disease [12].Lyme disease in the United States undergoes rapid range expansions from its primary foci in the Upper Midwest and Northeast [13].However, the rate of expansion varied within and among states.For example, some parts of states (i.e., counties) are rapidly invaded, whereas others remain disease-free for years [14].Not only the distribution of Lyme disease, but also that of tick vector are expanding [15,16].For example, Ixodes scapularis has moved northward along the eastern shores of Lake Michigan [16] and has also spread across the state from west to east in Wisconsin [17].The large geographic range of Lyme disease and I. scapularis combined with the nonrandom spatial patterns in colonization raise important questions about the mechanisms driving expansion of this emerging infectious disease.
The dispersal and establishment of I. scapularis in new areas partly depend on the possibilities of dispersal of the tick hosts (carrying ticks) which rely on both host movements and landscape characteristics [14].The former relates to the inherent capacities of host species to move (i.e., some species can more easily move over larger distances than others); the latter relates to parameters that help (e.g., corridors) or hinder (e.g., barriers) these movements.Because the dispersal probability of species that host I. scapularis partly drives the spread of Lyme disease [18,19], we predicted a positive relationship between forest connectivity and the probability of transition from disease-free to infected.
Besides landscape structure, the nonrandom expansion of disease via hosts or vectors can also be explained by characteristics of both the local and the neighboring host assemblages (e.g., all host species in a particular habitat/area), such as the suitability of local host assemblage (in disease-free areas) and the intensities of infected neighbors [20].Due to host specificity of pathogens or vectors [21], the suitability of local disease-free assemblage could be partly explained by its similarity with neighboring infected (and infectious) assemblages.Greater ecological similarity can translate to less resistance of local disease-free assemblage to pathogen invasion, thus indicates high suitability to disease spread [20].In addition, the infection intensity of infected neighbors might be positively related to the probability of disease spread.This is because high intensities of infected neighbors indicate high prevalence rates of the Lyme pathogen in ticks and hosts, which would enhance the chances that infected host or ticks invade to the disease-free county.In this study, we also explored the effects of these two factors on Lyme disease spread.We predict a positive relationship between the similarity of host assemblage in local disease-free area with neighboring infected areas and the probability that the disease-free area will become infected.We also predict that the infection intensity of neighboring infected areas relates positively to the probability of transition from disease-free to infected.
In addition, it is likely that the three factors described above (suitability of the assemblage in a disease-free area, the infection intensity of neighboring infected area, and the connectivity between disease-free and infected areas) also interact.For example, the effect of the intensity of infection in an area where the disease is present may depend on the connectivity level between disease-free and infected areas, so interactions between these factors are also expected to influence the geographical expansion of infectious disease.However, these interactions between connectivity and host assemblages similarity or infection intensity have rarely been studied.Therefore, in addition to the effects of these three factors, we also explicitly examined the importance of their two-way interactions to the transition in disease status (from disease-free to infected).Our results suggested that forest connectivity can indirectly influence disease spread by mediating the effects of assemblage suitability and neighbor infection intensity.

Lyme Disease Data
We first obtained the annual number of human Lyme disease cases in each county in the United States during 2000-2016 from the Centers for Disease Control and Prevention (CDC, Figure 1).We then limited our dataset to only those counties with established or reported I. scapularis populations, according to [11].As we aim to identify factors related to a county's transition of Lyme disease status, we targeted, per year, only those disease-free counties that shared a border with at least one infected county (at least one case was reported) in the same year.We then determined, also on an annual basis, whether a disease-free county remained disease-free or became infected in the next year.This binary classification served as the dependent variable in our analyses.

Lyme Disease Data
We first obtained the annual number of human Lyme disease cases in each county in the United States during 2000-2016 from the Centers for Disease Control and Prevention (CDC, Figure 1).We then limited our dataset to only those counties with established or reported I. scapularis populations, according to [11].As we aim to identify factors related to a county's transition of Lyme disease status, we targeted, per year, only those disease-free counties that shared a border with at least one infected county (at least one case was reported) in the same year.We then determined, also on an annual basis, whether a disease-free county remained disease-free or became infected in the next year.This binary classification served as the dependent variable in our analyses.

Explanatory Variables
Remote sensing (RS) has been recognized as an important tool for studying the ecology and epidemiology of infectious diseases [22].It is often used to identify environmental (including habitatrelated) and climatic risk factors, especially at regional level [23,24].Using the National Land Cover Database (https://www.usgs.gov/centers/eros/science/national-land-cover-database) in United States Geological Survey, we calculated forest habitat connectivity.This parameter was calculated as the minimum percentage of forest over 1 km wide buffer zones on both sides of a shared border between two counties.For each county, forest connectivity was weighted by the length of shared borders (1): where n is the number of infected neighbors that shared borders with disease-free county; li = length of shared borders between focal disease-free county and the ith infected county.Moreover, as Ixodes ticks are highly susceptible to desiccation, dry springs and summers may reduce the abundance of nymphal ticks in subsequent seasons [25][26][27].Therefore, we extracted several climatic variables from the WorldClim database, version 2.0 [28], including Bio01 (mean annual temperature), Bio05 (maximum temperature of warmest month), Bio10 (mean temperature of warmest quarter), Bio12 (annual precipitation), Bio13 (precipitation of wettest month), Bio16 (precipitation of wettest quarter), and Bio18 (precipitation of warmest quarter).These variables were evaluated for possible inclusion as covariates in the model (Table 1).Habitat connectivity and climatic variables were calculated in ArcGIS (Version 10.5) Similarity of local host assemblage for disease-free counties with neighboring infected counties (hereafter local assemblage similarity) was measured as β diversity using the Jaccard similarity index [29], based on the list of mammal host species of Ixodes scapularis from [11] which contains 39 mammal species.The distributions of these host species were obtained from the IUCN (International Union for Conservation of Nature) [30].Similar to forest connectivity, host assemblage similarity, calculated in R 3.5.0with vegan package [31], was weighted by the length of shared borders:

Explanatory Variables
Remote sensing (RS) has been recognized as an important tool for studying the ecology and epidemiology of infectious diseases [22].It is often used to identify environmental (including habitat-related) and climatic risk factors, especially at regional level [23,24].Using the National Land Cover Database (https://www.usgs.gov/centers/eros/science/national-land-cover-database) in United States Geological Survey, we calculated forest habitat connectivity.This parameter was calculated as the minimum percentage of forest over 1 km wide buffer zones on both sides of a shared border between two counties.For each county, forest connectivity was weighted by the length of shared borders (1): where n is the number of infected neighbors that shared borders with disease-free county; l i = length of shared borders between focal disease-free county and the ith infected county.
Moreover, as Ixodes ticks are highly susceptible to desiccation, dry springs and summers may reduce the abundance of nymphal ticks in subsequent seasons [25][26][27].Therefore, we extracted several climatic variables from the WorldClim database, version 2.0 [28], including Bio01 (mean annual temperature), Bio05 (maximum temperature of warmest month), Bio10 (mean temperature of warmest quarter), Bio12 (annual precipitation), Bio13 (precipitation of wettest month), Bio16 (precipitation of wettest quarter), and Bio18 (precipitation of warmest quarter).These variables were evaluated for possible inclusion as covariates in the model (Table 1).Habitat connectivity and climatic variables were calculated in ArcGIS (Version 10.5) Similarity of local host assemblage for disease-free counties with neighboring infected counties (hereafter local assemblage similarity) was measured as β diversity using the Jaccard similarity index [29], based on the list of mammal host species of Ixodes scapularis from [11] which contains 39 mammal species.The distributions of these host species were obtained from the IUCN (International Union for Conservation of Nature) [30].Similar to forest connectivity, host assemblage similarity, calculated in R 3.5.0with vegan package [31], was weighted by the length of shared borders: The intensities of disease in infected neighboring counties (hereafter neighbor infection intensity) were calculated based on the numbers of reported Lyme disease cases and were weighted using the following formula: where P i is the length of shared border between a disease-free focal county and the ith infected neighbor divided by the total border length of the focal disease-free county, thus, an infected neighboring county with a larger percentage of shared border was given greater weight.

Statistical Analyses
To explore the factors related to the probability of Lyme infection at the county level (i.e., the transition from disease-free to infected), we constructed Generalized Linear Mixed Models (GLMMs) with a binary response.State and year were included as random factors.Local assemblage similarity, neighbor infection intensity, forest connectivity, their two-way interactions, and climatic variables were included as fixed factors.We also included human population size of focal counties as an offset.
With GLMMs, we first performed univariate analyses to identify the potential risk factors.Variables with a p-value of less than 0.05 were identified as potential risk factors and were used to construct multiple regression models.Before constructing multiple models, we checked for multicollinearity by examining the correlation coefficients (r) between predictor variables.For those highly correlated (r > 0.7), only the variable with the smaller p-value was used to construct multiple regression models [32].We performed backward selection to build a multiple model with only main effects, and then the interaction terms were included into the model.Main terms were maintained in the model if they were included in a significant interaction term.Before fitting models, all predictors were scaled.We reported the ROC (receiver operating characteristic) curve and AUC (Area under ROC curve) value to estimate the performance of the final model.All analyses were conducted using R 3.5.0with lme4 package.

Local assemblage similarity The similarity of local host assemblage with neighboring counties
Counties with high similarity of host assemblage with infected neighbors would have a higher risk of disease spread

Forest connectivity Forest habitat connectivity between local county and neighboring county
Counties with high connectivity with infected neighbors would have a higher risk.

Neighbor infection intensity Number of Lyme cases reported in neighboring counties in the previous year
Counties close to heavily infected neighbors would have a higher risk.
Bio01, Bio05, Bio101 Temperature-related variables [28] Higher temperature and precipitation support tick establishment, tick survival, and contribute to the expansion of the distribution range of ticks [33,34]; however, a higher temperature can also decrease tick density by dehydrating ticks [26,27].
Bio12, Bio13, Bio14, Bio16, Bio18 1 Precipitation-related variables [28] Human population Human population size Counties with high population density may have a high disease risk.

Results
For each year from 2000 to 2015, about 484 ± 65 counties were disease-free, and about 25% of these disease-free counties got infected in the next year (Table 2).Based on the results of univariate analyses (Table S1) and the correlations between predictors (Table S2), seven variables were included in the final model (Table S1).The results from the final regression model (Table 3) suggested that both neighbor infection intensity and local assemblage similarity were positively correlated with transition in disease status from disease-free to infected.Although forest connectivity did not have a significant effect on its own, its interaction with neighbor infection intensity had a positive effect: increasing forest connectivity boosted the positive effect of neighbor infection intensity on disease status transition (Table 3; Figure 2a).In addition, the interaction between forest connectivity and local assemblage similarity related negatively to disease status transition: local assemblage similarity was more important when counties were poorly connected (Table 3; Figure 2b).

Discussion
Previous studies have demonstrated that landscape configuration, such as habitat connectivity, can potentially influence the risk of infectious diseases [35,36].However, these studies sometimes showed contradictory results about the role of habitat connectivity.In this study, we examined the effects of forest connectivity, as well as the effects of the host assemblage characteristics of local and neighboring counties, on the spatial spread of Lyme disease in the United States.We found that both the infection intensity of neighbors and local assemblage similarity had positive effects on the risk of disease spread.These two effects also depended on the level of forest connectivity, though forest connectivity itself did not play a significant role.
The spread of Lyme disease involves two processes: the dispersal potential of infected hosts/vectors and the establishment of infection [37].Generally, the dispersal potential of infected hosts/vectors largely depends on the abundance of infected hosts/vectors [38].Also, the probability of disease establishment seems to be positively correlated with the abundance of infected hosts [39].In the case of the climatic factors, disease status transition was negatively correlated with annual mean temperature (Bio01) and positively correlated with precipitation in the warmest quarter (Bio18; Table 3).The AUC value for the final multiple regression model was 0.751 (Figure 3), indicating a moderate predictive power.

Discussion
Previous studies have demonstrated that landscape configuration, such as habitat connectivity, can potentially influence the risk of infectious diseases [35,36].However, these studies sometimes showed contradictory results about the role of habitat connectivity.In this study, we examined the effects of forest connectivity, as well as the effects of the host assemblage characteristics of local and neighboring counties, on the spatial spread of Lyme disease in the United States.We found that both the infection intensity of neighbors and local assemblage similarity had positive effects on the risk of disease spread.These two effects also depended on the level of forest connectivity, though forest connectivity itself did not play a significant role.
The spread of Lyme disease involves two processes: the dispersal potential of infected hosts/vectors and the establishment of infection [37].Generally, the dispersal potential of infected hosts/vectors largely depends on the abundance of infected hosts/vectors [38].Also, the probability of disease establishment seems to be positively correlated with the abundance of infected hosts [39].

Discussion
Previous studies have demonstrated that landscape configuration, such as habitat connectivity, can potentially influence the risk of infectious diseases [35,36].However, these studies sometimes showed contradictory results about the role of habitat connectivity.In this study, we examined the effects of forest connectivity, as well as the effects of the host assemblage characteristics of local and neighboring counties, on the spatial spread of Lyme disease in the United States.We found that both the infection intensity of neighbors and local assemblage similarity had positive effects on the risk of disease spread.These two effects also depended on the level of forest connectivity, though forest connectivity itself did not play a significant role.
The spread of Lyme disease involves two processes: the dispersal potential of infected hosts/vectors and the establishment of infection [37].Generally, the dispersal potential of infected hosts/vectors largely depends on the abundance of infected hosts/vectors [38].Also, the probability of disease establishment seems to be positively correlated with the abundance of infected hosts [39].Consistent with these statements, we found a strong positive relationship between the infection intensity of neighbors and disease spread.In heavily infected counties, the number of both infected ticks and hosts are expected to be high, which is able to enhance the chances that infected hosts or ticks move to disease-free counties or facilitate disease establishment.In addition, we found the effect of neighbor infection intensity depended on forest connectivity.With increasing connectivity, the effect of neighbor infection intensity became greater, increasing the risk of disease spread from neighboring counties.
Disease invasion and establishment may also depend on the suitability of new areas [38].We found that the risk of disease spread positively correlated with the similarity between local and neighboring host assemblages.Due to host specificity of vectors and pathogens, these organisms are more likely to infect hosts that are or phylogenetically similar [21].Therefore, disease is more likely to spread between assemblages that are similar and share common host species [40].Additionally, similar host assemblages might indicate high similarity in habitat and resource availability, which can facilitate the adaptation of hosts and vectors to the new environments [38,41], thereby promoting disease establishment.Moreover, we found that the relationship between host assemblage similarity and disease spread also depends on forest connectivity.The effect size of the host assemblage similarity decreased with increasing forest connectivity Thus, host assemblage similarity is apparently less important when host species can move easily across county borders in well-connected habitats, and the transition in disease status can occur even when the host assemblage similarity is not very high.
For climatic factors, we found a negative correlation between annual mean temperature and the risk of Lyme spread, which is consistent with many previous studies [26,27].However, as the mean temperature was highly correlated with other temperature-related variables (such as the maximum temperature of warmest month and the mean temperature in the warmest quarter), we cannot determine which variable is mechanistically most important.A general mechanism underlying this relationship may be desiccation, to which ticks, especially larvae and nymphs, are susceptible [42].As temperature increases, the tick density may decrease due to desiccation, leading to a reduced risk of Lyme disease.The annual precipitation did not show a significant correlation with the risk of Lyme spread, while the precipitation in the warmest quarter (generally summer in the Northern Hemisphere) showed a positive relationship.Thus, summer precipitation might be a better predictor then annual precipitation.Indeed, summer precipitation is positively related to the abundance of blacklegged ticks in northwest Illinois [43].In this case, the proposed mechanism is through a negative effect of summer drought on understory plants and leaf litter layer: more exposed and drier soils are expected to reduce tick survival and reproduction [43].
It is important to acknowledge some limitations associated with our study.For example, we used the data of reported human cases of Lyme disease, which might be underreported [26].Additionally, the data we used offered only a "global" snapshot per county.Consequently, it was impossible for us to analyze if new cases in a previously uninfected county occurred along a border with an infected county.It was similarly impossible to take into account the mobility of infected people due to the lack of these data.Finally, the host distributions were obtained from the IUCN and did not take into account any possible changes (e.g., range shifts, expansions, or contractions) that occurred during the study period.Even though these data limitations may partly restrict our conclusions, our study still offers important new insights into the factors driving the spread of such a zoonotic disease.We are among the first to explore the interactions between habitat connectivity and characteristics of local and neighboring assemblages on disease expansion, and the results of our analyses demonstrate how important these interactions can be.

Conclusions
In this study, we explored the role of host assemblage characteristics and forest connectivity of local and neighboring counties in shaping Lyme disease expansion in the United States.Our results showed that the similarity between local and neighboring host assemblages and the infection intensity of neighbors had positive effects on the risk of disease expansion.Moreover, higher forest connectivity promoted the role of neighbor infection intensity and negated the role of host assemblage similarity.Our results, therefore, provide valuable new insights into the underlying interacting mechanisms that influence the spread of disease.Since factors related to disease spread do not function in isolation, future studies in this area should continue analyzing these interactions between ecologically relevant parameters.

Figure 1 .
Figure 1.The distribution of Lyme disease cases at the county level in (a) 2000 and (b) 2016 in the 35 states of the United States with established or reported Ixodes scapularis populations.

Figure 1 .
Figure 1.The distribution of Lyme disease cases at the county level in (a) 2000 and (b) 2016 in the 35 states of the United States with established or reported Ixodes scapularis populations.

Figure 2 .
Figure 2. The interaction effects of forest connectivity with (a) the infection intensity of neighboring counties and (b) the similarity of local host assemblage on the predicted probability that a diseasefree county shifted to infected.Red indicates low forest connectivity; blue indicates high forest connectivity.

Figure 3 .
Figure 3. ROC (receiver operating characteristic) curve and AUC (area under ROC curve) value for the selected multiple regression model explaining the probability of county-level disease status transition from Lyme disease free to infected.

Figure 2 .
Figure 2. The interaction effects of forest connectivity with (a) the infection intensity of neighboring counties and (b) the similarity of local host assemblage on the predicted probability that a disease-free county shifted to infected.Red indicates low forest connectivity; blue indicates high forest connectivity.

10 Figure 2 .
Figure 2. The interaction effects of forest connectivity with (a) the infection intensity of neighboring counties and (b) the similarity of local host assemblage on the predicted probability that a diseasefree county shifted to infected.Red indicates low forest connectivity; blue indicates high forest connectivity.

Figure 3 .
Figure 3. ROC (receiver operating characteristic) curve and AUC (area under ROC curve) value for the selected multiple regression model explaining the probability of county-level disease status transition from Lyme disease free to infected.

Figure 3 .
Figure 3. ROC (receiver operating characteristic) curve and AUC (area under ROC curve) value for the selected multiple regression model explaining the probability of county-level disease status transition from Lyme disease free to infected.

Table 1 .
Hypotheses and explanatory variables used in the analyses modelling the spatial spread of Lyme cases in the United States.

Table 2 .
Summary of numbers and ratio of transition from disease-free to infected county from 2000~2016.

Table 3 .
Summary of results (Adjusted Odds Ratio, AOR, with their 95% confidence intervals (CI) and P values) of the final regression model, correlated with the probability of county-level disease status transition from Lyme disease free to Lyme disease present.