Predicting the Geographic Range of an Invasive Livestock Disease across the Contiguous USA under Current and Future Climate Conditions

: Vesicular stomatitis (VS) is the most common vesicular livestock disease in North America. Transmitted by direct contact and by several biting insect species, this disease results in quarantines and animal movement restrictions in horses, cattle and swine. As changes in climate drive shifts in geographic distributions of vectors and the viruses they transmit, there is considerable need to improve understanding of relationships among environmental drivers and patterns of disease occurrence. Multidisciplinary approaches integrating pathology, ecology, climatology, and biogeophysics are increasingly relied upon to disentangle complex relationships governing disease. We used a big data model integration approach combined with machine learning to estimate the potential geographic range of VS across the continental United States (CONUS) under long-term mean climate conditions over the past 30 years. The current extent of VS is conﬁned to the western portion of the US and is related to summer and winter precipitation, winter maximum temperature, elevation, fall vegetation biomass, horse density, and proximity to water. Comparison with a climate-only model illustrates the importance of current processes-based parameters and identiﬁes regions where uncertainty is likely to be greatest if mechanistic processes change. We then forecast shifts in the range of VS using climate change projections selected from CMIP5 climate models that most realistically simulate seasonal temperature and precipitation. Climate change scenarios that altered climatic conditions resulted in greater changes to potential range of VS, generally had non-uniform impacts in core areas of the current potential range of VS and expanded the range north and east. We expect that the heterogeneous impacts of climate change across the CONUS will be exacerbated with additional changes in land use and land cover affecting biodiversity and hydrological cycles that are connected to the ecology of insect vectors involved in VS transmission.


Introduction
Changes in climate, land use and land cover are primary contributors to the expansion of vector-borne diseases at regional and continental scales [1][2][3][4][5][6][7][8]. Disentangling how different environmental factors are related to patterns in vector-borne disease occurrence at local scales can inform the spread of disease at broader spatial extents [9,10]. However, most studies of vector-borne disease have focused on fine-scale relationships among pathogen vectors and aspects of their local environment (e.g., [3,11]). A more comprehensive analysis of the complex relationships among the viruses, hosts, and the large suite of environmental drivers that can potentially affect disease dynamics is needed to predict changes in the geographic distribution of disease [10,12,13]. Because vector-borne disease can destabilize health, economic, social, and environmental systems [4,14], there is a critical need to understand both the underlying drivers governing disease occurrence at local scales and the change in geographic distribution of disease as the drivers change [15][16][17][18].
Multidisciplinary approaches integrating virus pathology, climate, and land surface information are increasingly used to understand complex systems and to predict dynamics of disease spread [12,19,20]. These approaches often combine large and diverse types of data with emerging technologies in machine learning to provide new insight into the drivers of disease occurrence across large spatial extents [9]. Combining diverse data types from multiple environmental drivers can reveal fine-scale dynamics and provide a basis for predicting shifts in geographic distribution and variability in occurrence and prevalence [21]. Although driver data are readily available for parts of the globe (e.g., North America), a major limitation of these approaches is the availability of disease occurrence data required to test the models, and then conduct analyses to predict future geographic distributions under alternative climate scenarios [22].
A large number of climate change scenarios are available that result in geographic variability in model output with consequences for predictions of future disease spread [23]. Uncertainty in climate model projections arises from natural variability in climate, emission scenario uncertainty, and the modeling process itself. Natural variability produces uncertainty due to the free-running nature of climate models, meaning that the models are not initialized or forced with observed conditions. Instead, they are spun up for a few hundred years to a quasi-equilibrium state using a plausible pre-industrial initialization. Then, this state becomes the new initialization for running the simulations forward through the present using certain observed time-varying atmospheric and land surface conditions [24,25]. Because of this free-running nature, modeled natural variability is not necessarily in concurrence temporally with observed natural variability. Emissions scenario uncertainty plays a role because there is no way to know which, if any, of the greenhouse gas emission forcing scenarios used to run climate models will align with reality. Uncertainty arising from the modeling process itself is due to limited theoretical or observational understanding of physical processes, difficulty in mathematically representing known processes, assumptions used in representing sub-grid scale processes (including ecological and vegetative processes), and missing or approximated processes such as dynamic vegetation [26,27].
The selection of climate projections for ecological prediction, which includes the choice of data source, specific climate models, and future emissions scenarios, is often based on reasons other than objective analysis of the available projections. These reasons may include availability of data, ease of use, or familiarity with the data provider [28] among others, and is a known problem in the application of climate projections in many fields [26,29,30]. For research on climate-driven ecological systems, climate model selection based on simulation performance is a critical part of the research process. However, while realistic model simulation of present-day climate is necessary, it is likely not sufficient for ensuring realistic simulations of climate under future conditions. Model performance in simulating present-day climate has been shown to correlate to model projection similarities only for certain variables on regional and global scales [31], but most straightforward metrics of assessing model simulations of present-day climate do not correlate with future climate projections [29]. Defining holistic performance metrics that relate to predictive skill is a largely unsolved problem. Regardless, use of models with demonstrably unrealistic simulations of present-day climate is unwarranted. Therefore, selecting projections from models with the most realistic present-day simulations, despite the caveats, is currently the most robust way to use model projections of future climate in a continental-scale ecological study like ours.
Our goal was to evaluate the environmental drivers of the geographic distribution of the vector-borne disease, vesicular stomatitis (VS), and to predict shifts in this distribution as a consequence of changes in climate using an objective analysis of output from multiple global climate models. VS is the most common vesicular livestock disease affecting horses, cattle, and swine in North America [32]. The causative agent of VS is vesicular stomatitis virus (VSV), an RNA virus that is endemic from northern South America to southern Mexico. The virus spreads from southern Mexico to northern latitudes in the US to result in incursion and expansion events facilitated by contact from biological and/or mechanical arthropod vector transmission [9,12,33]. While infections are rarely fatal to humans or livestock, VS clinical disease diagnosis in livestock is difficult to distinguish from footand-mouth disease, a devastating, highly contagious viral disease of livestock that was eradicated from the US in 1929 [34,35]. In addition, although wildlife in the current geographic range of VSV have not been implicated as hosts, as the range of VSV shifts to new locations, important wildlife species may become infected and warrant conservation measures to limit the negative effects of this disease. Mandatory reporting to the US Department of Agriculture of diagnosis and quarantine periods for premises for VS has resulted in the availability of multi-year occurrence data [9].
Recent studies identified interannual variability among environmental factors associated with VSV infection patterns at landscape and regional scales [9]. However, efforts have not explored long-term normal environmental relationships to estimate the potential geographic range of VS infections (e.g., [36,37]). Because climate is an important driver of inter-annual spatio-temporal occurrence variability, it is expected that the geographic range will also be affected by climate, and this distribution will shift with forecast changes in climate.
This study explored the continental-scale environmental relationships for VS occurrence across the entire contiguous United States (CONUS). While recent VS outbreaks have been limited to the western portion of the US [9], our model simulations for CONUS enable prediction of the potential current range of VS and allow for an increase in geographic extent of the disease under predicted changes in climate. Our objectives were two-fold: (1) to evaluate the continental-scale environmental and biotic factors related to the epidemic range of VS in the US under long-term normal environmental conditions (Current Climate), and (2) to predict shifts in the geographic distribution of VS using a suite of different climate change projections (Climate Change). We used a big data model integration (BDMI) approach, which has been used to explore and analyze of complex systems by coupling contemporary data science and analytical tools to existing knowledge and data, and to perform a machine learning analysis supervised by a transdisciplinary team that effectively identified environmental and biotic processes related to patterns in VS occurrence at local and regional scales [12]. We first harmonized publicly available data characterizing the expected biologically meaningful environment and constructed a model of VS geographic distribution of occurrences across the US using a database of known occurrences. Second, we simulated potential shifts in this geographic distribution of VS under future climate alternatives using downscaled climate projections selected from the general circulation models that most realistically simulate present-day climate according to our climate model performance analysis.

VS Occurrence Data
A total of 1963 records of VS New Jersey serotype (VS-NJ) from 2004 to 2016 were available. Another serotype not detected during the time frame of this analysis, VSV-Indiana (VS-IN) occurs infrequently, and it is unknown how the two serotypes differ in their response to environment variability. Therefore, we only included data for the NJ serotype occurrences. In-situ evaluation of infected animals by veterinarians followed by laboratory serum and antigen diagnostic analyses provided confirmation of VS infection, onset date, serotype, and premises location. Records of occurrences prior to 2004 were not included in this analysis due to unavailability of specific location (i.e., GPS coordinates) for the affected premises. In addition, historic data collection and management of affected premises differ from current approaches, and these differences would likely affect the results. Prior to the 1980's, there was no standardized quarantine approach, and the quarantine approach from 1980 to 2000 was ultimately revised and replaced with the current quarantine process. Consequently, the 2004-2016 data best represent the current natural range of VS spread by native vectors and limit confounding factors.

Approach
We followed the approach of Peters et al. [12] by using a machine learning process supervised by a transdisciplinary team to identify important factors and construct a model to estimate the potential range of VS occurrence across the CONUS (Figure 1). Disparate online data sources representing soil, climate, land use, host, and vector properties were synthesized (Table 1) using an information-theoretic approach [38] to guide model construction and favor model parsimony under current climate. We used mean long-term values of variables to minimize between-year effects of environmental variability on current VS processes. To simulate changes in VS distribution resulting from a long-term change in climate, climate parameters in the model were replaced using a suite of available climate emission scenarios. Our comparative efforts were focused on current versus climate change alternatives and did not consider transient dynamics of change in climate.

VS Occurrence Data
A total of 1963 records of VS New Jersey serotype (VS-NJ) from 2004 to 2016 were available. Another serotype not detected during the time frame of this analysis, VSV-Indiana (VS-IN) occurs infrequently, and it is unknown how the two serotypes differ in their response to environment variability. Therefore, we only included data for the NJ serotype occurrences. In-situ evaluation of infected animals by veterinarians followed by laboratory serum and antigen diagnostic analyses provided confirmation of VS infection, onset date, serotype, and premises location. Records of occurrences prior to 2004 were not included in this analysis due to unavailability of specific location (i.e., GPS coordinates) for the affected premises. In addition, historic data collection and management of affected premises differ from current approaches, and these differences would likely affect the results. Prior to the 1980's, there was no standardized quarantine approach, and the quarantine approach from 1980 to 2000 was ultimately revised and replaced with the current quarantine process. Consequently, the 2004-2016 data best represent the current natural range of VS spread by native vectors and limit confounding factors.

Approach
We followed the approach of Peters et al. [12] by using a machine learning process supervised by a transdisciplinary team to identify important factors and construct a model to estimate the potential range of VS occurrence across the CONUS ( Figure 1). Disparate online data sources representing soil, climate, land use, host, and vector properties were synthesized (Table 1) using an information-theoretic approach [38] to guide model construction and favor model parsimony under current climate. We used mean long-term values of variables to minimize between-year effects of environmental variability on current VS processes. To simulate changes in VS distribution resulting from a long-term change in climate, climate parameters in the model were replaced using a suite of available climate emission scenarios. Our comparative efforts were focused on current versus climate change alternatives and did not consider transient dynamics of change in climate.

Data Sources
We selected 34 environmental predictors for analysis based on previous investigations of patterns in VS that were expected to govern ecosystem level processes, including hosts and vectors, at continental scales [9,12,39] (Table 1). Our interdisciplinary approach prioritized variables related to biotic and abiotic processes expected to influence VS occurrence. The use of mechanistic details was expected to minimize spurious relationships and improve the quality of our model and its ability to project into alternative climates [40].
All spatial data were manipulated with ArcGIS v. 10.3 (Environmental Systems Research Institute, Redlands, CA, USA) and R v. 3.5.3 (Rstudio, Inc., Boston, MA, USA)). All spatial data were rasterized, when necessary, resampled, and projected to a uniform geographic dimension as part of our harmonization procedure (cell size approx. 4 km × 4 km) prior to analysis. Long-term normal climate data (precipitation and temperature) were standardized to seasonal 30- [42], transmission rates, overall vectorial capacity [43][44][45][46], and therefore, the temporal-spatial occurrence of vector-borne diseases [47][48][49]. For each season, mean monthly total precipitation and mean minimum and maximum temperature were summed or averaged across each season, and then averaged for the 30-year period to result in a total of 12 variables.

Hydrology Variables
Soil Moisture. This has been shown to influence vector abundance and vector geographic distribution and has been an important variable for predicting VS occurrence [9,50,51]. Soil moisture data were accessed through the GIOVANNI online data system [52] and downloaded as 30-year seasonal means.
Surface Runoff. This is expected to be an important factor for vector abundance and vector geographic distribution [39] and has also been included in several scales of VS investigations [9]. To capture variability in surface runoff, we accessed non-infiltrating overland flow data [53] through NASA's GIOVANNI online data system [52] to extract 30-year seasonal means.
Distance to nearest water. This was an important variable in previous analyses [9], and represents the behavior of Culicoides, a genus of biting midges and a known vector of VS, to travel only a few hundred meters and not more than 2-3 km from breeding locations [54,55]. Fine-scale investigations of other potential vectors (e.g., Culicoides spp.) have also found similar relationships between abundance and proximity to water [51]. Proximity to water was used in this analysis to represent distance to water sources associated with vector reproduction and abundance and was quantified by calculating the Euclidean distance to North American rivers and lakes [56].

Land Surface Variables
Land use. This classification provides a measure of the heterogeneity of the land surface and may be useful in identifying areas where land use practices can interact with the biotic environment to influence livestock disease prevalence [57]. Land use classifications were represented using 16 categories of land cover/land use from the GAP/LANDFIRE National Terrestrial Ecosystems data set [58].
Normalized Difference Vegetation Index (NDVI). This variable estimates the vegetative biomass and photosynthetic activity which has been linked with vector abundance [59,60], and was an important factor in several scales of VS and vector investigations [9,39,61]. We used the Google Earth Engine API [62] to construct seasonal mean NDVI using Landsat-5, 7, and 8 imagery (courtesy of the US Geological Survey) for the 30-year time period.

Biotic Variables
Host density. This is expected to be an important factor in VS occurrence [9,39]. While little is known about the breadth of potential wildlife hosts, both equine (horses) and bovine (cattle) are susceptible to VS infection and the most common species affected by the sporadic outbreaks of VS in the CONUS [63]. While VS antibody titers have been documented in wildlife, we are not aware of any observations of lesioned wildlife. The absence of significant viremia in most species suggests that wildlife other than feral swine are unlikely to play a major role as reservoirs. To represent variability in host density, we downloaded totals for horses, cattle, and calves for all US counties for the years 1997, 2002, 2007, and 2012 [64]. We averaged all years for horse, cattle and calf values by county separately, then rasterized into two separate layers representing mean horse and combined cattle and calf density. Additionally, means of total livestock were calculated as the sum of horse and combined cattle and calf mean densities. Unfortunately, accurate population estimates of feral swine were unavailable for these periods and are not included in this analysis.

Future Climate Alternatives
The effect of changing climate on the spatial distribution of VS was evaluated by substituting the current precipitation and temperature parameters with 30-year mean projections averaged across 2071-2100 from two emissions scenarios or representative concentration pathways (RCP) from the Coupled Model Intercomparison Project 5 (CMIP5): RCP 4.5, a mitigation scenario, which simulates the effects of greenhouse gas emissions that peak around 2040 and stabilize by 2100, and RCP 8.5, a business as usual scenario, which simulates continued increases in emissions throughout the 21st century. Respectively, the scenarios provide an intermediate and worse case perspective of future climate change scenarios [65].
Downscaled climate projections were selected for both RCP emissions scenarios based on whether the general circulation models (GCMs) that these data are derived from can simulate the mean, trend, and variability in winter (DJF) and summer (JJA) minimum and maximum surface air temperature and precipitation over the western US ( Figure A1). We conducted this model performance analysis for two reasons: (1) there were no GCM performance benchmarks implemented prior to the downscaling process, and (2) bias correction during the downscaling process cannot account for all model biases. Ten metrics (Table A1) were used to assess the performance of 34 CMIP5 [25] GCMs (Table A2; accessed through the Earth System Grid Federation CMIP5 archives https://esgf-node.llnl.gov/ search/cmip5/ accessed on 1 February 2021) by comparing the historical experiment simulations to Berkeley Earth temperature [66,67] and GPCC v2018 precipitation [68,69] observations over the 100-year period 1906-2005. GCMs were resampled to a 1-degree latitude by 1-degree longitude grid to facilitate comparison to the two observational datasets and an elevation-based temperature correction was applied to the model data based on an environmental lapse rate of 6.5 C/km.
The GCM performance analysis indicated that eight models passed our performance thresholds. We then selected downscaled projections for these models from the NASA Earth Exchange 800 m Downscaled Climate Projections dataset (NEX-DCP30) [70,71]. We chose this particular dataset because it contained downscaled projections for seven out of eight GCMs that passed our performance checks, whereas other commonly used downscaled datasets included fewer of these models. Mean climate conditions for the end of the 21st century were computed by averaging each projection over the years 2071-2100. The 7 selected projections from both RCP scenarios each represent a potential climate future. Over the western US region where most VS cases have occurred in the past ( Figure A1), our suite of selected projections included varied changes to summer precipitation (approximately ± 100 mm), decreased winter precipitation (between −100 to 0 mm), and increased temperatures (between 2 to 5 • C) ( Figures A2-A4). Further detailed and extended analyses of the biases in precipitation and temperature are available from Geil et al. (in prep).

Analysis
We used maximum entropy modeling (MaxEnt) [72][73][74], a machine learning algorithm to model the environmental suitability for VS across the CONUS under the 30 year normal climatic conditions (Current Climate, Table 1), and to project a change in geographic distribution under future climate conditions (Climate Change) using projected climate data (Table A2). MaxEnt has been extensively used across a broad range of biological applications to create spatially explicit species distribution models using presence only data [75]. The ability of MaxEnt to fit complex responses to predictors provides a robust platform that performs competitively with other distribution modeling approaches [76]. To reduce spatial autocorrelation, which can negatively impact MaxEnt analyses [77], 1963 occurrence records from 2002-2016 were spatially thinned using the R package spThin [78] to 10 km resulting in 544 occurrence records used to train the MaxEnt models.
We used the R variable selection package GIMVS [79] to select variables and tune MaxEnt parameters by varying the regularization parameter (β) from 0-10 at 0.5 intervals to reduce overfitting, removing variables contributing <5% of a model's information, and preventing co-occurrence of correlated variables when r 2 > 0.7. The performance of each modelwas assessed using the corrected Akaike Information Criteria (AICc) [80] which provide a relative measure of model quality by evaluating model fit and parsimony and has been shown to better identify biologically meaningful variables than MaxEnt's area under the receiver operating characteristic curve (AUC) [81]. Models with ∆AICc < 2 were considered to have a high level of empirical support [82] and would be considered in the development of the final model. Extrapolation beyond the environmental range of training data was limited by enabling the 'fade by clamping' option [83]. We ran 10 replicates for the current model and averaged the results. Since occurrence data did not indicate the number of infected individuals, the output was interpreted as the relative occurrence rate (ROR) for VS [84]. For a detailed information on MaxEnt and input parameters, see [84].
Current climate. The ROR was predicted for the CONUS by a multi-disciplinary team that supervised the comparison of 50 models, each a unique set of environmental factors and MaxEnt tunings. The importance of individual factors included in the best performing model, in terms of AICc, was evaluated using jackknife resampling plots and permutation importance (PI), which represents the contribution of each variable to the model, measures calculated in MaxEnt.
Model evaluation. We used an analysis of variance to compare the predicted occurrence rates using the selected model at 544 training points; the remaining 1419 points were used as test data, and 10,000 randomly located points were used to represent background locations without VSV. We performed pairwise comparisons of the predicted ROR values at training and test locations against the ROR at random points using Dunnett's test for multiple comparisons using the R package PMCMRplus to determine which of the ROR at training and test points was significantly greater. Model spatial transferability was evaluated by comparing the ROR at training and test poinst. Some key non-climatic parameters are fixed inputs based on historical means yet depend on climate. These include NDVI (as a measure of vegetation biomass) and horse density, both of which reflect important controls over VS but which may change under future climates. To assess how model design impacts variability and to determine if dynamic variables impact temporal transferability, we also constructed a model using only the static variables (e.g., topography) and climate parameters in the full model (hereinafter referred to as the 'climate-only' model). Comparisons between the two models provide insight into the degree and spatial distribution of variability between the more parameterized full model and a model that only incorporated dynamic climate and static broad-scale drivers (i.e., climate-only model). The climate-only model was evaluated by comparing the ROR at test points against the full model ROR at background and test points. In all comparisons, significance was set at 0.05.
Alternative climate scenarios. To generate predictions of future VS geographic range, we used our current climate MaxEnt model but replaced the temperature and precipitation variables with RCP 4.5 and 8.5 climate projections from the NEX-DCP30 downscaled climate projections dataset. The current climate MaxEnt model was run seven times for each RCP emissions scenario where each run included a different NEX-DCP30 climate projection as input. The resulting ROR predictions were then averaged together for each RCP emissions scenario. We also calculated the agreement between the seven different ROR predictions for both RCP 4.5 and 8.5 by calculating at each grid cell the number of predictions with ROR greater than 0.2 and 0.5. To quantify the degree of variability between the full model and climate-only model, we also ran the climate-only model for each RCP scenario. We calculated the difference for each RCP scenario and the RCP overall standard deviation of potential ROR to quantify the spatial variation across the CONUS.

Model Evaluation
Post-hoc evaluation of the best performing full model with reserved occurrence data (test data) and random background locations showed that our model under current conditions performed well with regard to fit of both training and test data ( Figure 2). The predicted ROR at occurrence locations used to train our model were significantly greater (mean ROR = 0.53, std = 0.20, and p < 0.001) than the ROR for 10,000 random background locations (mean ROR = 0.08 and std = 0.14). Similarly, the ROR of reserved test data were also significantly greater than that of background locations (mean ROR = 0.66, std = 0.16, and p < 0.001) and greater than the ROR of training data (p < 0.001). These results provide support that the MaxEnt full model can predict VS occurrences beyond the locations used in parameterizing and training the model. ROR predictions for both RCP 4.5 and 8.5 by calculating at each grid cell the number of predictions with ROR greater than 0.2 and 0.5. To quantify the degree of variability between the full model and climate-only model, we also ran the climate-only model for each RCP scenario. We calculated the difference for each RCP scenario and the RCP overall standard deviation of potential ROR to quantify the spatial variation across the CONUS.

Model Evaluation
Post-hoc evaluation of the best performing full model with reserved occurrence data (test data) and random background locations showed that our model under current conditions performed well with regard to fit of both training and test data (Figure 2). The predicted ROR at occurrence locations used to train our model were significantly greater (mean ROR = 0.53, std = 0.20, and p < 0.001) than the ROR for 10,000 random background locations (mean ROR = 0.08 and std = 0.14). Similarly, the ROR of reserved test data were also significantly greater than that of background locations (mean ROR = 0.66, std = 0.16, and p < 0.001) and greater than the ROR of training data (p < 0.001). These results provide support that the MaxEnt full model can predict VS occurrences beyond the locations used in parameterizing and training the model. Boxplots of the relative occurrence rates (ROR), shown as red circles, from the full model at each of the 10,000 random background data points (background; mean ROR = 0.08), the VS locations used to train the model under current climate (train; mean ROR = 0.53), and VS locations withheld from the training analysis (test; mean ROR = 0.66). The ROR of training data were significantly higher than background locations (p < 0.001). The ROR of test locations was also significantly higher than background locations (p < 0.001). The ROR of training data were significantly higher than the ROR of test data (p < 0.001). The ROR of test occurrences using the climate-only (Climate) model (mean ROR = 0.48) was significantly more than the background points (p < 0.001) and less than the training and test points (both p < 0.001).
Comparisons between the full model and the climate-only model, which included produced the greatest variation in the Western region of the CONUS ( Figure A5) where increases and decreases in ROR were >0.4. The predicted ROR using the climate-only model at test occurrence locations was significantly lower than the full model's ROR Figure 2. Boxplots of the relative occurrence rates (ROR), shown as red circles, from the full model at each of the 10,000 random background data points (background; mean ROR = 0.08), the VS locations used to train the model under current climate (train; mean ROR = 0.53), and VS locations withheld from the training analysis (test; mean ROR = 0.66). The ROR of training data were significantly higher than background locations (p < 0.001). The ROR of test locations was also significantly higher than background locations (p < 0.001). The ROR of training data were significantly higher than the ROR of test data (p < 0.001). The ROR of test occurrences using the climate-only (Climate) model (mean ROR = 0.48) was significantly more than the background points (p < 0.001) and less than the training and test points (both p < 0.001).
Comparisons between the full model and the climate-only model, which included produced the greatest variation in the Western region of the CONUS ( Figure A5) where increases and decreases in ROR were >0.4. The predicted ROR using the climate-only model at test occurrence locations was significantly lower than the full model's ROR (mean ROR = 0.48, std = 0.16, p < 0.001), especially in core areas of past infection. Differences within scenarios were also non-uniform and highest in the western portion of the CONUS. The overall standard deviation among RCP 4.5 and 8.5 scenarios was generally greatest (>0.35) in core areas of the current potential range (Figures A6 and A7).

VS Potential Distribution under Current Climate
The predicted ROR under current climate conditions varied across the CONUS with the highest estimates concentrated in the western region, including Arizona, Colorado, Idaho, Montana, New Mexico, Wyoming, Texas, and Utah ( Figure 3). This predicted spatial variability is similar to VS occurrences since 2004 as 95.4% of the total number of occurrences have been located in these states [63]. CONUS. The overall standard deviation among RCP 4.5 and 8.5 scenarios was generally greatest (>0.35) in core areas of the current potential range (Figures A6 and A7).

VS Potential Distribution under Current Climate
The predicted ROR under current climate conditions varied across the CONUS with the highest estimates concentrated in the western region, including Arizona, Colorado, Idaho, Montana, New Mexico, Wyoming, Texas, and Utah ( Figure 3). This predicted spatial variability is similar to VS occurrences since 2004 as 95.4% of the total number of occurrences have been located in these states [63].  Table 1. Recorded incidences of VS from 2004 through 2016 that were used as presence data in this analysis are shown as cyan '+'.
Our highest RORs correspond with locations with high observed rates of VS occurrence along the Front Range of the Rocky Mountains in Colorado, northern New Mexico, and southern Wyoming. However, the model over-predicted the spatial extent of VS in the northwest (central Oregon), the Midwest (southeastern South Dakota), the southwest (central Arizona and Utah), and parts of western Texas. The non-uniform distribution of VS occurrence and predicted RORs reflect spatial heterogeneity in environmental factors ( Figure 4).
Seven variables representing long-term mean climate, land cover, topography, hydrology, and host density factors provided the best performing full model under current climate (β = 1, AUC = 0.94, and a ΔAICc of 35.5 compared to the next best model). Overall, climate inputs were among the most influential factors in our model. A jackknife test of variable importance and permutation importance (PI) identified long-term mean summer precipitation (Jun-Aug) as the most important variable related to VS occurrence (PI = 28.2). VS was predicted to occur between 50 and 250 mm summer rainfall per year that occurs over 44% of the CONUS. Low amounts of winter (DJF) precipitation was the second most important variable (PI = 27.2); VS was predicted to occur in areas with less than 250 mm for these three months. Elevation (PI = 13.8) between 600 and 3500 m, and winter maximum temperature (PI = 10.1) between −2.5 and 25 °C, fall NDVI (PI = 8.8) between 0 and 0.5, and horse density (PI = 8.6) above 0.1 animals per hectare were also positively associated with higher predicted VS occurrence. The last variable in the model, distance to water (PI = 3.4), was negatively related to VS occurrence; highest RORs were found near water and ROR decreased as the distance increased from 0 to 1.4 degrees (ca. 155 km).  Table 1. Recorded incidences of VS from 2004 through 2016 that were used as presence data in this analysis are shown as cyan '+'.
Our highest RORs correspond with locations with high observed rates of VS occurrence along the Front Range of the Rocky Mountains in Colorado, northern New Mexico, and southern Wyoming. However, the model over-predicted the spatial extent of VS in the northwest (central Oregon), the Midwest (southeastern South Dakota), the southwest (central Arizona and Utah), and parts of western Texas. The non-uniform distribution of VS occurrence and predicted RORs reflect spatial heterogeneity in environmental factors ( Figure 4).
Seven variables representing long-term mean climate, land cover, topography, hydrology, and host density factors provided the best performing full model under current climate (β = 1, AUC = 0.94, and a ∆AICc of 35.5 compared to the next best model). Overall, climate inputs were among the most influential factors in our model. A jackknife test of variable importance and permutation importance (PI) identified long-term mean summer precipitation (Jun-Aug) as the most important variable related to VS occurrence (PI = 28.2). VS was predicted to occur between 50 and 250 mm summer rainfall per year that occurs over 44% of the CONUS. Low amounts of winter (DJF) precipitation was the second most important variable (PI = 27.2); VS was predicted to occur in areas with less than 250 mm for these three months. Elevation (PI = 13.8) between 600 and 3500 m, and winter maximum temperature (PI = 10.1) between −2.5 and 25 • C, fall NDVI (PI = 8.8) between 0 and 0.5, and horse density (PI = 8.6) above 0.1 animals per hectare were also positively associated with higher predicted VS occurrence. The last variable in the model, distance to water (PI = 3.4), was negatively related to VS occurrence; highest RORs were found near water and ROR decreased as the distance increased from 0 to 1.4 degrees (ca. 155 km).

Changes in Geographic Range under Future Climate Conditions
Each climate projection used as input to our MaxEnt model modified the range of VS estimates across the CONUS. For the mitigation scenario RCP 4.5, although there was variability between predictions of ROR using different climate projections, there was a general agreement on the areas with an increase in simulated potential ROR ( Figure 5).

Changes in Geographic Range under Future Climate Conditions
Each climate projection used as input to our MaxEnt model modified the range of VS estimates across the CONUS. For the mitigation scenario RCP 4.5, although there was variability between predictions of ROR using different climate projections, there was a general agreement on the areas with an increase in simulated potential ROR ( Figure 5).   For the business-as-usual scenario RCP 8.5, similar patterns to RCP 4.5 were found for ROR, although a larger spatial extent and greater infilling were typically found fo each MaxEnt model run (Figure 7). The greatest ROR increases were adjacent to the north ern and eastern regions surrounding current VS occurrence locations. A large portion o north Texas and Oklahoma experienced increases in ROR (>50%) when compared to th RCP 4.5 predictions. In addition, localized ROR increases (>50%) were found in Idaho Montana, North Dakota, and South Dakota. This pattern was most pronounced using th ACCESS1.0, CESM1.CAM5, HadGEM2.ES, and NorESM1.M climate projections. Result using these projections also show stronger agreement in decreases in ROR prediction (>50%) in southern Texas and Arizona, and throughout the Rocky Mountains when com pared to the RCP 4.5 predictions. RCP 8.5 MaxEnt results were in general agreement fo For the business-as-usual scenario RCP 8.5, similar patterns to RCP 4.5 were found for ROR, although a larger spatial extent and greater infilling were typically found for each MaxEnt model run (Figure 7). The greatest ROR increases were adjacent to the northern and eastern regions surrounding current VS occurrence locations. A large portion of north Texas and Oklahoma experienced increases in ROR (>50%) when compared to the RCP 4.5 predictions. In addition, localized ROR increases (>50%) were found in Idaho, Montana, North Dakota, and South Dakota. This pattern was most pronounced using the ACCESS1.0, CESM1.CAM5, HadGEM2.ES, and NorESM1.M climate projections. Results using these projections also show stronger agreement in decreases in ROR predictions (>50%) in southern Texas and Arizona, and throughout the Rocky Mountains when compared to the RCP 4.5 predictions. RCP 8.5 MaxEnt results were in general agreement for most areas where ROR was estimated to increase by >0.5 (Figure 8b). However, RCP 8.5 MaxEnt results indicate low agreement in predicted ROR along the eastern extent of the VS current range in Texas, Oklahoma, Kansas, Nebraska, and South Dakota (Figure 8a). We also note that there is much more disagreement in our RCP 8.5 MaxEnt results compared to our RCP 4.5 results (Figures 6a,b and 8a,b).  However, RCP 8.5 MaxEnt results indicate low agreement in predicted ROR along the eastern extent of the VS current range in Texas, Oklahoma, Kansas, Nebraska, and South Dakota (Figure 8a). We also note that there is much more disagreement in our RCP 8.5 MaxEnt results compared to our RCP 4.5 results (Figure 6a,b and Figure 8a,b).

Discussion
Our human-guided machine learning approach distilled complex information across multiple disciplines, and identified plausible and important long-term predictors of the geographic range of VS under current climate at the continental scale. The use of ecological niche modeling, which has successfully been used to predict disease and vector distributions [85,86], provided a robust analysis of nearly three decades of VS observations with publicly available environmental datasets. Broad-scale environmental factors identified as related to VS occurrence through interactions with hosts, vectors, and the pathogen were used to map the potential geographic distribution of VS across the CONUS, and

Discussion
Our human-guided machine learning approach distilled complex information across multiple disciplines, and identified plausible and important long-term predictors of the geographic range of VS under current climate at the continental scale. The use of ecological niche modeling, which has successfully been used to predict disease and vector distributions [85,86], provided a robust analysis of nearly three decades of VS observations with publicly available environmental datasets. Broad-scale environmental factors identified as related to VS occurrence through interactions with hosts, vectors, and the pathogen were used to map the potential geographic distribution of VS across the CONUS, and provide insights into the spatial uncertainty under future scenarios. While fine-scale processes can increase variability, these results are important in defining priority areas for research, monitoring, and mitigation efforts [87]. Furthermore, this approach, using current climatic conditions in a machine learning framework to predict the geographic extent of disease threats, can be applied to other vector-borne diseases, such as West Nile, Rift Valley, Chikungunya, and dengue, where sufficient environmental data are available [88][89][90][91].
The potential geographic extent of VS under current climatic conditions included observed occurrence locations from 2004 to 2016, and expanded the range primarily to the northwest (Montana, Idaho, Oregon), southwest (Arizona, Utah), and large portions of Texas. This predicted range based on locations with similar long-term climate (reduced summer, reduced winter precipitation, and increased winter maximum temperature), elevation, fall vegetation biomass (NDVI), horse density, and proximity to water provides insight to the processes that lead to vector-borne disease spread across multiple years. This long-term perspective excludes transient effects beyond the scope of this research and identifies the locations to be avoided by livestock owners as predictable disease hot spots during a disease outbreak (e.g., the Front Range and Western Slope in Colorado) or alternatively, to select locations where VS is unlikely to occur based on mean current climatic conditions (e.g., much of the Midwest and east and west coasts).
Climate change simulations are important management tools for anticipating ecological and economic threats [92]. For VS, projections from a future scenario with moderate increases in greenhouse gas concentrations (RCP 4.5) predicted an eastern expansion of VS in Oklahoma and Texas, localized reductions in the Rocky Mountains, and with less confidence, some expansion northward in North Dakota, South Dakota, and Montana. Projections from a future scenario with continued increasing greenhouse gas concentrations (RCP 8.5) more strongly suggest an expansion of the range of VS northward (North Dakota, South Dakota, Montana) as well as eastward (Nebraska, Kansas, Oklahoma, and Texas). Future climate conditions that likely modify stream flow (i.e., high precipitation) and vector habitat are expected to affect the future range of VS the most. In addition, modification of temperature and precipitation can impact vectorial capacity [45,46]. Changes in climate that result in an expansion in the geographic distribution by VS to the north, west, and southwest and/or modification of vectors' ability to transmit disease would have important consequences for livestock owners in these regions. Because VS has not been found historically in these states, veterinarians and livestock owners would need specific educational outreach and materials to increase awareness for the possible incursion of the disease, identify clinical signs of VS, understand reporting requirements to animal health officials, and implement appropriate vector control and biosecurity practices in susceptible livestock herds [93].
Historically, for VS, the virus has spread from endemic regions in Mexico to locations in the CONUS over a two-to-three-year period with overwintering between years [32,33,94]. The occurrence and expansion of this vector-borne disease during incursions into the US requires the presence of the virus, competent vectors, and susceptible hosts in addition to suitable environmental conditions for vector population growth and survival. While currently exposed wildlife are unlikely to play a major role as host due to the absence of significant viremia and observations of lesions, a changing climate that affects existing VS processes or results in novel processes governing VS infection could result in wildlife being important hosts in the future. This modification has been demonstrated in an invasive species on Ossobaw Island (Georgia), where the eradication feral swine, which were implicated as reservoirs in an endemic cycle of VS-NJ, coincided with the disappearance of VS [95].
Similar to recent analyses for VS [9], a large suite of variables were needed in the best performing full model representing long-term mean climate, land cover, topography, hydrology, and host density. However, the specific variables in our analysis differed from previous analyses where incursion versus expansion years had a set of variables related to different host habitat (e.g., black flies in incursion and expansion years, biting midges in expansion years) [9]. Our results share more variables with the incursion years suggesting that black fly habitat more frequently governs VS occurrence compared to biting midge habitat. In addition, the predicted range maps and habitat of vectors known or hypothesized to be important to the spread of VS [96] cover much of this expanded geographic distribution of disease under current climate conditions. Based on our current climate model, the extent of several species of black fly and biting midges overlap with the predicted geographic distribution of VS [97][98][99][100][101][102][103]. These insect vectors are likely susceptible to multiple impacts of future climate change. In addition, future climatic conditions may alter the importance of minor or rare vector species through alteration of vectorhost interactions.
While our climate-only model produced lower ROR values at occurrence locations (i.e., less fit) compared with our full model, it did spatially quantify how, in the absence of other mechanistic elements, climate changes alone may potentially impact broad-scale environmental suitability. While it is difficult to anticipate how these processes may interact with changing climate and shifts in landscape structure, consideration of a less parameterized and potentially more temporally transferable climate-only model offers a coarser characterization of where VS could occur in the future.
Our results can be used to guide long-term mitigation efforts for reducing the risk of VS infection [93,104] when novel processes challenge the management of systems with livestock diseases. Current conservation efforts are limited in spatial scale to individual pastures or ranches where spatial processes of insect spread and animal movement are not important (Augustine et al. in press). Increasing the spatial scale of conservation efforts to the watershed or landscape unit scale will require inclusion of insect dispersal and animal transport processes that are not included in our current model, yet these processes can have consequences for the dynamics of insect vectors and spread of VS.
Because VSV can be transmitted by biting insects feeding on infected animals, by direct contact between infected animals, and by fomites (contaminated inanimate objects that can transfer disease) [96,105], it is expected that higher livestock densities would increase the occurrence or rate of spread of VS. Viral transmission is driven by the abundance of animal hosts, and the frequency of contacts between hosts or between hosts and vectors. Arthropod-borne viruses, such as VSV, that infect multiple animal and insect host species, have different population dynamics than those restricted by a single host [106]. This diversity results in a greater transmission potential that can impact disease outbreaks. Examples of these impacts have been found in several livestock associated arthropodborne virus systems, including but not limited to, Japanese encephalitis virus [107] and Rift Valley fever virus [108]. Changes to climate are also expected to modify VS occurrences since temperatures play a significant role in biting midge physiology (longevity, fecundity, fertility), ecology, distribution, and viral replication [48,[109][110][111][112][113]. In addition, temperatures can affect transmission dynamics by mediating gonotrophic cycles and oviposition rates [111,114], which affect the number of times vectors will contact hosts to blood feed thereby affecting rates of transmission [115]. Similar results have been reported for mosquito-borne viruses [116][117][118][119].

Conclusions
Multidisciplinary approaches integrating process-based information from multiple scales can improve understanding of the factors related to long-term occurrence patterns of VS and offer insight into how the geographic distribution may change in the future. Our findings suggests that the heterogeneous impacts of climate change across the CONUS will be intensified as they coincide with changes in land use and land cover that affect biodiversity and hydrological cycles tied to the ecology of insect vectors involved in VS transmission. While more detailed data and contemporary analytical tools will continue to provide improvements in mechanistic understanding of VS, these models also provide an evidence-based product that enables prediction of the geographic distribution of VS infections that can be used to guide research and mitigation efforts.   (Table A2).   (Table A2).  (Table A2). Figure A3. As Figure A2, except representing the change in winter precipitation (December through February). Figure A3. As Figure A2, except representing the change in winter precipitation (December through February). Climate 2021, 9, x FOR PEER REVIEW 20 of 28 Figure A4. The projected change in winter (December through February) average maximum temperature rates (°C) for each climate projection selected for analysis under RCP 4.5 described in Table A2.   Table A2. Figure A4. The projected change in winter (December through February) average maximum temperature rates (°C) for each climate projection selected for analysis under RCP 4.5 described in Table A2.    Table A2. The standard deviation of all model outputs is also shown.   Table A2. The standard deviation of all model outputs is also shown.  Table A2. The standard deviation of all model outputs is also shown. Figure A7. As Figure A6, except using climate projections from the RCP 8.5 emission scenario. Figure A7. As Figure A6, except using climate projections from the RCP 8.5 emission scenario. Table A1. The 10 metrics and metric threshold values used to determine GCM performance. Metrics were designed to test GCM performance in variables previously determined to be important to historical occurrences of the VS virus; namely winter (DJF) and summer (JJA) precipitation (pr), and minimum and maximum surface air temperature (tasmin, tasmax). All metrics were computed for winter (DJF) and summer (JJA) seasonal averages over the 100-year period 1906-2005 by comparing the GCM historical experiment to GPCC precipitation and Berkeley Earth temperature observations.

Metric
Metric Bias in the 100-yr trend with statistical significance at the 95% significance level computed on the trend of the model minus observations timeseries using a two-tailed students t test. % area computed as number of significant grids divided by total grids in the evaluation area. Performance thesholds chosen to eliminate only the worst performing models. DJF = 20% of study area JJA = 20% of study area % of study area with significant bias in 100-yr tasmin trend Bias in the 100-yr trend with statistical significance at the 95% significance level computed on the trend of the model minus observations timeseries using a two-tailed students t test. % area computed as number of significant grids divided by total grids in the evaluation area. Performance thesholds chosen to eliminate only the worst performing models. DJF = 33.3% of study area JJA = 33.3% of study area % of study area with significant bias in 100-yr tasmax trend Bias in the 100-yr trend with statistical significance at the 95% significance level computed on the trend of the model minus observations timeseries using a two-tailed students t test. % area computed as number of significant grids divided by total grids in the evaluation area. Performance thesholds chosen to eliminate only the worst performing models. DJF = 33.3% of study area JJA = 33.3% of study area Table A2. The 34 Coupled Model Intercomparison Project 5 (CMIP5) global circulation models evaluated in this study (accessible through the Earth System Grid Federation CMIP5 archives https://esgf-node.llnl.gov/search/cmip5/ accessed on 1 February 2021) and their associated modeling center and country. Monthly resolution historical experiment model output was examined for three variables: precipitation (pr), minimum surface air temperature (tasmin), and maximum surface air temperature (tasmax). Only one realization (r1i1p1) was examined for each model. Orography (orog) and land fraction (sftlf) from each model was also used. Note: many more models exist in the CMIP5 archive, but we chose only those that provided data for all of the aforementioned variables as well as data for the historical, rcp 4.5, and rcp 8.5 experiments. NASA Earth Exchange 800m Downscaled Climate Projections dataset (NEX-DCP30) models that passed all of our performance metrics and were used in this analysis are indicated with an asterisk.