Iterative Models for Early Detection of Invasive Species across Spread Pathways

Species distribution models can be used to direct early detection of invasive species, if they include proxies for invasion pathways. Due to the dynamic nature of invasion, these models violate assumptions of stationarity across space and time. To compensate for issues of stationarity, we iteratively update regionalized species distribution models annually for European gypsy moth (Lymantria dispar dispar) to target early detection surveys for the USDA APHIS gypsy moth program. We defined regions based on the distances from the invasion spread front where shifts in variable importance occurred and included models for the non-quarantine portion of the state of Maine, a short-range region, an intermediate region, and a long-range region. We considered variables that represented potential gypsy moth movement pathways within each region, including transportation networks, recreational activities, urban characteristics, and household movement data originating from gypsy moth infested areas (U.S. Postal Service address forwarding data). We updated the models annually, linked the models to an early detection survey design, and validated the models for the following year using predicted risk at new positive detection locations. Human-assisted pathways data, such as address forwarding, became increasingly important predictors of gypsy moth detection in the intermediate-range geographic model as more predictor data accumulated over time (relative importance = 5.9%, 17.36%, and 35.76% for 2015, 2016, and 2018, respectively). Receiver operating curves showed increasing performance for iterative annual models (area under the curve (AUC) = 0.63, 0.76, and 0.84 for 2014, 2015, and 2016 models, respectively), and boxplots of predicted risk each year showed increasing accuracy and precision of following year positive detection locations. The inclusion of human-assisted pathway predictors combined with the strategy of iterative modeling brings significant advantages to targeting early detection of invasive species. We present the first published example of iterative species distribution modeling for invasive species in an operational context.


Introduction
Effective targeting for early detection of invasive species requires knowledge of an organism's biology and mechanisms of invasion, as well as the ability to translate that information into an optimized surveillance program [1].A popular tool for invasive species managers is the use of species distribution models (SDMs) to define the geographic extent over which an invading species may occur [2,3].Factors that influence realized distributions (e.g., where a species is actually found) may be categorized by the biotic-abiotic-movement (or "BAM") framework [4].In practice, researchers most often define distributions by an organism's potential niche [5], estimated by the range of abiotic variability within which it can survive and reproduce.However, potential distributions may be further limited by biotic interactions that arise from other processes, such as competition [6,7], predation [8], and parasitism [9,10].The movement component of the BAM diagram describes habitat accessibility through dispersal, such that areas of the predicted niche are within reach of dispersing organisms [11].For invasive species distribution models (iSDMs), dispersal may be assisted by human movement, which can transport organisms across long distances [12][13][14][15].The inclusion of dispersal mechanisms or "pathways" within iSDMs may be an effective approach for optimizing early detection programs to areas with high potential for invasion [16][17][18].
Pathways can be defined as the mechanisms or routes by which species arrive at new regions or ecosystems [19].Invasion success is partially defined by the magnitude and spatio-temporal variability of propagule pressure across invasion pathways [20,21].Therefore, development of pathway predictors for iSDMs will be most informative when they are proximate to the mechanism of dispersal [2] and include information on origin, destination, and rate of movement per time step [22][23][24][25].Examples of pathway predictors in iSDMs are sparse and typically involve spatial kernels as dispersal predictors [18,26,27].Pathways models fitted to large geographic extents are particularly under-represented [28,29], and even fewer examples exist of volume-based pathways predictors [30].However, there is potential for using such pathway models [31] to guide early detection and rapid response, or EDRR [32].Pathways data in SDMs not only refines potential distribution to more targeted risk areas, but also more explicitly supports the operational use of SDMs for EDRR by targeting the mechanisms for how invasive species enter an ecosystem [22].The quantification of major pathways can guide policy programs [33] to target high risk pathways in outreach campaigns designed to prevent human-assisted spread [34].
However, there remain significant challenges [35][36][37] in the development of iSDMs, particularly to target early detection.The process of biological invasion is both spatially and temporally dynamic, which violates assumptions of stationarity in SDMs for invasive species [38].The assumption of a species in equilibrium with its environment (i.e., temporal stationarity) results in underprediction of risk area [39], especially in earlier stages of the invasion process [38,40].Spatial stationarity is often violated, especially over large geographic extents, because abiotic/biotic constraints may vary over space [39] or the organism exhibits characteristics of stratified dispersal [41,42].Predictive models following suggested guidelines [43][44][45] and iterated over time to test hypothetical processes of invasion [46] provide a more informative and adaptive framework for managing species distributions than single model development [47].Iteratively updating iSDMs improves the detection of new invasion hotspots [48,49], expands predicted geographic distributions [50], and increases the reliability of model predictions [51,52], particularly when paired with a targeted survey design [53].We apply these principles of handling non-stationarity and iterative modeling to the European gypsy moth (Lymantria dispar dispar Linnaeus 1758) to increase our understanding of its mechanisms for spread and how to increase effectiveness for early detection within operational contexts.
The European gypsy moth is a forest pest accidentally introduced to Medford, Massachusetts, USA. in 1869 that has steadily spread across the north-eastern United States, establishing as far north as Maine, west toward Wisconsin and Minnesota, and as far south as Virginia.Gypsy moth feed on more than 300 species of trees and shrubs, making it a generalist species [54].This destructive forest pest has impacted forests in invaded areas by reducing mast production [55], quality of timber products [56], affecting native species [57,58], nutrient cycling [59], and human health [60], with an economic impact estimated at >$250 million per year [61].Because of these impacts, the gypsy moth is a well-studied species with detailed information on population biology [62,63], climate suitability [64][65][66], pathways [67][68][69][70][71], detection [72,73], and optimal management strategies [74][75][76].Most of the United States is climatically suitable to gypsy moth [77].The combination of broad climatic suitability and extensive host list means that gypsy moth can potentially establish in most of the United States [78], making early detection of the spread into new areas difficult to target.
However, targeting surveillance to gypsy moth's pathways for spread may yield efficiency in limited program resources.Gypsy moth naturally disperses by larval ballooning [79].While male moths can fly, the females are incapable of flight.Dispersal by larval ballooning generally occurs over a distance of 1-3 km [80], but may extend over much greater distances depending on topography and wind [81].Humans also play a role in transporting the insect [80], because its sticky egg masses may be deposited on most outdoor objects.Bigsby et al. [67] developed an anthropogenic model to estimate the contribution of anthropogenic factors on gypsy moth spread for counties with a historical record of infestation.They found that locations with greater proximity to source populations, higher household income, and higher household consumption of firewood were correlated with a higher likelihood of gypsy moth presence.A second study on anthropogenic pathways for gypsy moth [71] used proxies, such as population density and road accessibility, but it lacked proximate predictors for spread mechanisms.A national model of spread pathways is not likely to be stationary across geographic space, which affects how management actions target different stages of invasion or pathways of spread occurring across the landscape.
Our goal is to develop a species distribution model to support the early detection of European gypsy moth that addresses geographic and temporal variability of spread mechanisms.Our specific objectives are to identify proximate predictors for spread mechanisms, evaluate regional differences in spread, and to demonstrate the utility of iteration.We developed origin-destination data as a predictor for human-assisted spread, developed regionalized models based on geographic changes in predictor strength, and analyzed the value of iterative model development by assessing model performance with following year survey data.

Study Area
Our study area consisted of the continuous U.S. Within the U.S., 11 states that cover the leading edge of the infestation cooperate with the US Forest Service in the Slow the Spread [82] program to monitor population levels and implement treatment programs to slow the natural spread rate of the gypsy moth.The United States Department of Agriculture (USDA) Animal and Plant Health Inspection Service, Plant Protection, and Quarantine (APHIS-PPQ) similarly coordinates with state departments of agriculture and forestry for detection and eradication efforts of isolated populations outside the Slow the Spread (STS) project area (Figure 1).In addition to operational duties, APHIS has authority to enact and enforce regulatory policy to limit interstate movement of gypsy moth life stages hitchhiking on private or commercial traffic, as well as to designate/terminate domestic quarantines to delineate the areas of general infestation under effect of limited interstate movement (7 C.F.R. §301.45).Nearly 225 million acres are currently under federal quarantine, which are considered generally infested.Therefore, active surveillance does not occur within the quarantine area.All analyses used data from the entire U.S., but final model predictions were masked to exclude the current federal quarantine and active Slow the Spread project zone.

Survey Data Acquisition and Preparation
The survey and detection data for European gypsy moth were collected from various sources, with the primary sources being the (1) national and state offices of the USDA APHIS-PPQ, (2) the Slow the Spread program, and (3) the National Agricultural Pest Information System (NAPIS).This is the first time that a national survey database has been compiled across data sources for gypsy moth in the U.S., resulting in 1.9 million records spanning the period of 1974-2017.Completeness of data reporting is not consistent over time, and trapping density varied tremendously from state-to-state.Records were tested for spatial quality (removing locations where coordinates did not occur within the recorded county or state) and screened for duplicate records.We also compiled historical gypsy moth eradication treatment polygons from the western U.S. (beyond the Slow the Spread zone), as these treatment areas were indicative of a population that was introduced and then established at a level requiring eradication.Most of these historical treatment areas were not present in the detection database, therefore, exact detection locations were unknown.We converted these polygons to a raster, matching the scale, projection, and snapped extent of a raster template (a 1-km national survey grid).We converted the raster centroid locations to presence point locations, resulting in additional potential presence locations.These locations were spatially thinned in later described steps to algorithmically choose the presence locations for model training.We assume that the benefits of adding potentially unique site characteristics for model training outweigh the cost of adding noise from inexact, but auto-correlated locations.
The response variable varied between presence-absence and count data, and were standardized to presence-absence.We defined 'presence' differently based on the region (i.e., short-range or longrange mechanisms).For the short-range geographic model, counts of ≥3 gypsy moths were considered an established, reproducing population (presence) based on a conservative estimate of the number of male moths caught in a trap and relative female mating success [62,[83][84][85].These presence locations were used to estimate distances to next year detections for a spread kernel, which

Survey Data Acquisition and Preparation
The survey and detection data for European gypsy moth were collected from various sources, with the primary sources being the (1) national and state offices of the USDA APHIS-PPQ, (2) the Slow the Spread program, and (3) the National Agricultural Pest Information System (NAPIS).This is the first time that a national survey database has been compiled across data sources for gypsy moth in the U.S., resulting in 1.9 million records spanning the period of 1974-2017.Completeness of data reporting is not consistent over time, and trapping density varied tremendously from state-to-state.Records were tested for spatial quality (removing locations where coordinates did not occur within the recorded county or state) and screened for duplicate records.We also compiled historical gypsy moth eradication treatment polygons from the western U.S. (beyond the Slow the Spread zone), as these treatment areas were indicative of a population that was introduced and then established at a level requiring eradication.Most of these historical treatment areas were not present in the detection database, therefore, exact detection locations were unknown.We converted these polygons to a raster, matching the scale, projection, and snapped extent of a raster template (a 1-km national survey grid).We converted the raster centroid locations to presence point locations, resulting in additional potential presence locations.These locations were spatially thinned in later described steps to algorithmically choose the presence locations for model training.We assume that the benefits of adding potentially unique site characteristics for model training outweigh the cost of adding noise from inexact, but auto-correlated locations.
The response variable varied between presence-absence and count data, and were standardized to presence-absence.We defined 'presence' differently based on the region (i.e., short-range or long-range mechanisms).For the short-range geographic model, counts of ≥3 gypsy moths were considered an established, reproducing population (presence) based on a conservative estimate of the number of male moths caught in a trap and relative female mating success [62,[83][84][85].These presence locations were used to estimate distances to next year detections for a spread kernel, which was computed as Observations inside treatment polygons within the short-range region were removed for the current and following survey years due to the effects of treatment applications such as mating disruption by the synthetic pheromone disparlure, Bacillus thuringiensis var.kurstaki (Btk), and/or pesticides [86].The other geographic models included any count >0 as a 'presence', because the purpose of early detection is to find moths before they become a reproducing population.Therefore the count data in the intermediate and long-range models do not represent reproducing populations.The short-range geographic model was temporally dynamic, and so was calculated from the prior year's 'presence' locations.Long-range geographic models collapsed data over time, due to the lower number of unique spatial observations.
California did not report absence data; therefore, some assumptions were made to generate background locations.We used a two-step method for creating a survey bias surface on which to base the location of background points: (1) We generated a kernel density surface for the clustering of survey efforts, and (2) applied a fitted function to detections along an accessibility surface to represent survey bias with respect to road/urban center accessibility.Both bias surfaces were standardized, 0-1, and multiplied together to create a combined survey bias surface (Supplementary S1).Due to the inclusion of background data, all absence locations were treated as background, which relaxed assumptions regarding the true detection value of that location.
We analyzed detection data using Moran's I, then spatially thinned the data within 5-km to reduce spatial autocorrelation between observations [87].We found that the data thinning process also reduced computation time and errors during numerical fitting.Remaining observations were then snapped and re-projected (Albers equidistant) to our 1-km 2 raster template.For the short-range geographic model, data was spatially thinned by year to allow variability in the annual dispersal range.For the intermediate and long-range models, all presence/absence data was collapsed over the historical period.

Predictor Variables
The selection and development of candidate predictors was based on input from the Interagency Gypsy Moth working group and guidelines from the USDA APHIS gypsy moth program manual [88], which categorizes urban and environmental factors suspected to have an associated risk of gypsy moth introduction.The selected predictors represent different pathways of potential movement for the pest, including proximity to infested areas, directional traffic volume on transportation networks, and household movement.Other predictors include potential point source locations (sawmills, rest stops, campgrounds, etc.) as well as urban predictors (population density, traffic volume, and household income).All predictors (Table 1) were standardized to our raster template in the Alber's Equidistant projection.Included predictors were identical across model iterations, aside from updating predictors with more recent data.The methods for model regionalization evolved across model iterations (see Supplementary S2).
It has long been recognized that egg masses and other gypsy moth lifestages may hitchhike on household items moving from the generally infested area within the federal quarantine area to other parts of the U.S. [80,[89][90][91].We obtained the number of total address forwarding movements originating from postal zip codes within the federal quarantine area to destination census tracts from the U.S. Postal Service.An address forwarding record is generated when an individual files a change of address form with the U.S. Postal Service when they move their household.Predictor maps of address forwarding data were represented as total counts of movements from the quarantine zone to destination areas.For the 2015 model, this included data between January 2012 and December 2014.Subsequent model iterations added more years of address forwarding data.These aggregate movements served as a proximate measure of propagule pressure from source populations in the northeast to new destinations in the west.

Model Development
In the 2014 model, we regionalized the study area (non-quarantine areas of the U.S.) based on program management zones: Slow the Spread [92] versus the APHIS survey area (Figure 1), so that we had two regional models: A short range model and a long range model.Beginning with the 2015 model, we regionalized the study area based on changes in the pathway's importance across space (Supplementary S2).We created regional divisions for model development, including a Maine region (remaining, non-quarantine area), short-range region (0-200 km from the spread front), intermediate region (201-500 km from the spread front), and long-range region (501+ km).We used the Vistrails Software for Assisted Habitat Modeling (SAHM) v3.3.1 module [93] to develop the distribution models for each region (model template available in Supplementary S4).The first model developed for 2014 included five statistical algorithms common in species distribution models: MaxEnt [94,95], Multivariate Adaptive Regression Splines (MARS) [96,97], Generalized Linear Models (GLM) [98], Boosted Regression Trees (BRT) [44,99], and Random Forest (RF) [100,101].We found that model performance (in terms of AUC), spatial continuity across regional models, and reduced model complexity [102] was best expressed using the MARS algorithm, so we limited future model iterations to MARS only.The MARS model is also robust to moderate levels of collinearity in model predictors [103].
For each regional model (Supplementary S2: Figure S1), we started with all predictors listed in Table 1.We evaluated the set of candidate predictors for collinearity (Pearson or Spearman correlation coefficient ≥0.70) [103].We dropped predictor variables that explained less variability in a univariate model than the identified collinear variable(s), were not retained in the MARS backward pruning process, or had nonsensical response curves (such as increasing risk with distance from an introduction source location).For example, we dropped "Distance from STS" as a predictor in the long-range regional model because detections along the Pacific coast resulted in a "bump" near the tail of the response curve.We determined that this was not a realistic response of propagule pressure from the spread front and would be explained by other predictors, so we dropped the predictor.The remaining predictors were used as input to the models.We repeated this predictor selection process for every regional model and annual iteration.Each regional model's variability was assessed with a random 10-fold cross validation.
After initial model fitting, we optimized the model fit via sensitivity analysis of the MARS model complexity parameters: Degree (number of interaction terms) and penalty (cost per freedom of degree).Final models were optimized by selecting parameter settings that maximized the area under the curve (AUC) and percent correctly classified (PCC) with the lowest MARS degree setting.We chose this approach to balance the model accuracy with generalizability across space.Once the models were optimized, we applied the fitted short range model to current year source populations of gypsy moth in order to predict the spread risk for the next survey year.We combined regional models by taking the maximum value from each regional raster, with the exception of the Maine model (other model values were ignored for that region).We assumed that the maximum risk for each cell was defined by the pathways defining that risk.A simple re-visualization of the overlay process confirmed that individual cell values tended to derive from the appropriate regional model.The final model's performance was assessed by combining the training data sets for all regional models, intersecting them with the final risk map, and running the data through the PresenceAbsence package (v1.1.9)in R (v3.4.3) [104] to generate an overall receiver operating curve.For each regional model, we assessed predictor importance by permutating predictor values between presence and absence data while holding other predictors constant and calculating the resulting change in AUC values.
Prior to survey application, we applied establishment masks (Supplementary S3) to limit introduction risk to areas where gypsy moth would be more likely establish.The final risk surface (masked by host availability, climate suitability, and outside the survey exclusion area) was used as an inclusion probability surface for a survey design tool to allocate spatial locations of next year survey traps.

Survey Design
The Forest Service Forest Health Assessment and Applied Sciences Team (USFS FHAAST) developed an Invasive Species Sample Design Tool for APHIS [105], using the "Create Spatially Balanced Points" tool in ArcGIS (versions 10.0 and higher).This was designed as a custom ArcGIS toolbox that requires information on the sample area, the probability surface, the number of sample locations to be allocated, and any exclusion areas.Survey locations are stochastically distributed within the sample area (minus any exclusion areas) in a spatially balanced design [106,107], reducing spatial autocorrelation of samples and maximizing information on survey detection within the landscape.The implementation of the sample design tool was optional with state-level operations, but was used by APHIS-PPQ's national program as one of many criteria to evaluate how national survey funding allocation could be distributed to states according to risk.

Model Validation
Gypsy moth detection data on gypsy moth is typically collected from all states at the end of each survey season (approximately October-December), although some states may be late or incomplete in reporting catch data in time for model validation/ next year development.Each survey year's data were intersected with the predicted risk model developed the prior year (starting with the 2014 model) before application of establishment masks.We evaluated the model performance on continuous risk values because the survey design tool uses a continuous risk surface to allocate sampling effort.Additionally, the gypsy moth program is interested in early detection of this species; therefore, correct prediction of positive survey detections was of primary interest.We generated receiver operating curves as well as boxplots of the predicted risk at positive detection locations by survey year.We compared this information to evaluate the overall model performance over continuous thresholds, and to evaluate the precision and accuracy of model predictions for positive detections over time.New detection data were integrated to the survey detection database and implemented in building the next year's risk model.Each model development year, we delivered a presentation on the model performance and the newest iteration of model development to the stakeholder community (state-level program managers and pest surveyors) in the spring before survey season.

Spatial Targeting
The spatial distribution of likelihood for gypsy moth detection varied the most between the 2014 and 2015 iterations (Figure 2), due to a change in the methodology for the model regionalization.The 2014 risk model predicted abrupt transitions in risk from the quarantine area to the APHIS survey area, and tight clusters of risk within urban areas.The application of a pathways-driven regionalization approach for the 2015 and later models resulted in a smoothing effect to the spatial distribution of risk between the invasion front and urban areas.Smoothing of the geographic risk also appeared to be a function of allowing predictor retention to be a function of the MARS backward pruning algorithm, rather than user-specified variable retention.
Another change between the 2014 versus the 2015 and later models was the addition of a Maine-only regional model.The spread dynamics for this spatial region were significantly different from the spread dynamics along the western edge (the STS program area).We can see from Figure 2 that risk is annually dynamic in Maine, compared to the slower changing dynamics near the STS program area.The relative differences in spread variability between Maine and the area near the STS zone may be due to northern weather variability affecting population establishment success and the lack of a population suppression program.This region in Maine lies near the edge of climate suitability for gypsy moth population establishment [77], so the effect of annual weather variability may be diminishing the predictor importance of the local spread kernel relative to the STS spread front (Supplementary S2).This dynamic is occurring at similar latitudes as affected provinces in Canada [108], and climate change may expand this pest's suitable range northward [109].Gypsy moth egg mass overwintering survival may be enhanced near the Great Lakes by lake effect snow providing thermal insulation [110].
Forests 2018, 9, x FOR PEER REVIEW 3 of 21 diminishing the predictor importance of the local spread kernel relative to the STS spread front (Supplementary S2).This dynamic is occurring at similar latitudes as affected provinces in Canada [108], and climate change may expand this pest's suitable range northward [109].Gypsy moth egg mass overwintering survival may be enhanced near the Great Lakes by lake effect snow providing thermal insulation [110].Nationally, the highest density of detections occurred in areas where the spread kernel interacted with urban areas near the spread front.While the majority of gypsy moth detections occurred within the Slow the Spread and short-range geographic areas, there were a large number of detections that occurred in the intermediate and long-range regions in 2015 (n = 162).That year, the risk model correctly predicted the spatial distribution of a population outbreak in the Pacific Northwest (Figure 3).Washington and Oregon experienced incursions in various locales, triggering eradication programs and follow-up surveys in subsequent years.Nationally, the highest density of detections occurred in areas where the spread kernel interacted with urban areas near the spread front.While the majority of gypsy moth detections occurred within the Slow the Spread and short-range geographic areas, there were a large number of detections that occurred in the intermediate and long-range regions in 2015 (n = 162).That year, the risk model correctly predicted the spatial distribution of a population outbreak in the Pacific Northwest (Figure 3).Washington and Oregon experienced incursions in various locales, triggering eradication programs and follow-up surveys in subsequent years.Predictor importance varied with each regional model, which aligns with the purpose for model regionalization (Supplementary S2).Distance from a population source was the most important in the short-range model, contributing 81% of the relative importance compared to 16% contributed by address forwarding in the 2018 model.Anthropogenic movement played a larger role in the intermediate model, such that address forwarding and distance from the STS action area were the most important predictors and had a similar influence (36% and 39% for the 2018 model, respectively).Anthropogenic variables were the most important predictors in the long-range model, with population density, traffic volume, and median household income as the top three predictors.These patterns were fairly consistent across the annual model iterations.The Maine model had the most annual variation, with both the spread kernel and anthropogenic factors (e.g., distance from sawmills, distance from rest stops, and distance from wood pallet manufacturers) being important.

Temporal Targeting
Our iterative approach to model development increased the overall predictive performance of the risk model over time (Figure 4).The first model iteration had poor performance, predicting worse than random for sensitivity thresholds less than 0.5.This was due to the discontinuity of modeled risk along the spread front (Figure 2, 2014 model), which under-predicted the detection likelihood near the spread front.Changing the methodology for regionalizing models resulted in a large gain in model performance for 2015, while the 2016 iteration showed improvements primarily in low risk locations.There were also gains within the regional models when we evaluated the predicted risk using the next year's survey's positive gypsy moth detections.Predicted risk increased in accuracy (higher mean), increased in precision (smaller interquartile range), or both for all regional models even though the regionalization methodology remained identical between the 2015 and 2016 models (Figure 5).We omitted a boxplot for the Maine-only geographic model because it had insufficient information for a temporal comparison.Maine had a static risk value of 0.51 in the 2014 model (see Supplementary S2), and the state did not report survey data for validation of the 2016 model.Predictor importance varied with each regional model, which aligns with the purpose for model regionalization (Supplementary S2).Distance from a population source was the most important in the short-range model, contributing 81% of the relative importance compared to 16% contributed by address forwarding in the 2018 model.Anthropogenic movement played a larger role in the intermediate model, such that address forwarding and distance from the STS action area were the most important predictors and had a similar influence (36% and 39% for the 2018 model, respectively).Anthropogenic variables were the most important predictors in the long-range model, with population density, traffic volume, and median household income as the top three predictors.These patterns were fairly consistent across the annual model iterations.The Maine model had the most annual variation, with both the spread kernel and anthropogenic factors (e.g., distance from sawmills, distance from rest stops, and distance from wood pallet manufacturers) being important.

Temporal Targeting
Our iterative approach to model development increased the overall predictive performance of the risk model over time (Figure 4).The first model iteration had poor performance, predicting worse than random for sensitivity thresholds less than 0.5.This was due to the discontinuity of modeled risk along the spread front (Figure 2, 2014 model), which under-predicted the detection likelihood near the spread front.Changing the methodology for regionalizing models resulted in a large gain in model performance for 2015, while the 2016 iteration showed improvements primarily in low risk locations.There were also gains within the regional models when we evaluated the predicted risk using the next year's survey's positive gypsy moth detections.Predicted risk increased in accuracy (higher mean), increased in precision (smaller interquartile range), or both for all regional models even though the regionalization methodology remained identical between the 2015 and 2016 models (Figure 5).We omitted a boxplot for the Maine-only geographic model because it had insufficient information for a temporal comparison.Maine had a static risk value of 0.51 in the 2014 model (see Supplementary S2), and the state did not report survey data for validation of the 2016 model.

Pathways Predictor Performance
Address forwarding data (cumulative number of people moving from within the quarantine area to a destination census tract) was used as a proxy for propagule pressure.Our assumption that more household movements resulted in more gypsy moth egg mass introductions appears to be supported by regional model outputs.In the intermediate geographic model, predictor strength for address forwarding data more than quadrupled between the 2015 and the 2018 model iterations, and was the highest ranked predictor of gypsy moth occurrence in the intermediate region in the most recent model iteration (Table 2).This increase in predictor strength was concurrent with the increase in available USPS data to inform the predictor.As our detection dataset is historical and the result of several years of detection data, we found increased predictor performance when the predictor also accumulated more data over time.While we did use address forwarding data in the 2014 model, it

Pathways Predictor Performance
Address forwarding data (cumulative number of people moving from within the quarantine area to a destination census tract) was used as a proxy for propagule pressure.Our assumption that more household movements resulted in more gypsy moth egg mass introductions appears to be supported by regional model outputs.In the intermediate geographic model, predictor strength for address forwarding data more than quadrupled between the 2015 and the 2018 model iterations, and was the highest ranked predictor of gypsy moth occurrence in the intermediate region in the most recent model iteration (Table 2).This increase in predictor strength was concurrent with the increase in available USPS data to inform the predictor.As our detection dataset is historical and the result of several years of detection data, we found increased predictor performance when the predictor also accumulated more data over time.While we did use address forwarding data in the 2014 model, it

Pathways Predictor Performance
Address forwarding data (cumulative number of people moving from within the quarantine area to a destination census tract) was used as a proxy for propagule pressure.Our assumption that more household movements resulted in more gypsy moth egg mass introductions appears to be supported by regional model outputs.In the intermediate geographic model, predictor strength for address forwarding data more than quadrupled between the 2015 and the 2018 model iterations, and was the highest ranked predictor of gypsy moth occurrence in the intermediate region in the most recent model iteration (Table 2).This increase in predictor strength was concurrent with the increase in available USPS data to inform the predictor.As our detection dataset is historical and the result of several years of detection data, we found increased predictor performance when the predictor also accumulated more data over time.While we did use address forwarding data in the 2014 model, it was summarized to the zip code level.Due to the historic nature of that model, we no longer had access to the variable importance of address forwarding for the 2014 model, so it is omitted from the comparative analysis in Table 2.

Discussion
The inclusion of origin-destination pathway predictors in invasive species distribution models brings significant advantages to targeting the early detection of invasive species.Inclusion of pathways were useful in predicting long-range, human-mediated dispersal of mussels [25,111] and in intermediate-range, human-mediated dispersal with campground reservations for gypsy moth [69].The increasing variable importance of address forwarding, a proximate predictor for long-distance dispersal of gypsy moth egg masses, with a concurrent increase in model performance and precision over time, was similar to the Leung et al.'s [25] recreational boat movement approximating human-mediated long distance dispersal of mussels.While a change in methodology between 2014 and 2015 accounts for some performance increase, it does not explain the continued increasing trend in subsequent model iterations with stable methodology.The increase in model performance may be due to the accumulation of the address forwarding data as an approximation of propagule pressure, rather than the addition of new detection data.While the risk models were available for the states to target their surveillance, it was optional.There was not an explicit feedback loop between survey design and the risk model, which may have increased the informatic value of new data being collected.Also, given that the gypsy moth program is historically rich in data and collects more than 100,000 new data samples each year, novel detections (new detections in low risk areas) to inform the risk model are rare.
Some research has suggested that predictions of invasion dynamics should be hierarchical, with data gathered from multiple spatial scales [112].Specific to invasive species distribution models, however, inclusion of a dispersal kernel, which focuses on short range dispersal, to limit over-prediction of risk has been encouraged [3,26].However, for targeting early detection of invasive species with a human-assisted spread pathway, this suggestion may be too conservative.Dispersal kernels focus detection effort along the spread front, where prevalence is high and fewer samples are required to detect the species.This method fails for early detection in uninfested areas far from the spread front, as illustrated by the gypsy moth 2015 outbreak in the Pacific Northwest.Our regionalized approach addresses the afore-mentioned biological phenomenon of stratified dispersal for gypsy moth.Our approach recognizes the different mechanisms of dispersal by incorporating a dispersal kernel for larval ballooning and other short-range mechanisms, while also addressing human-assisted pathways to explore long-distance dispersal events important for early detection activities overseen by APHIS.
The Slow the Spread program targets a 100-km "transition zone" to suppress gypsy moth spread, which was determined to be the optimal distance for reducing the spread rate to a target rate of 9 km per year [74].Our short range model included an additional 100 km beyond the transition zone, and it suggested that both areas had the same spread mechanisms.This result concurs with prior analysis of new colony formation occurring as far as 250 km from the spread front [63].While a spread kernel (partially driven by ballooning and wind-driven transport) was the most important predictor, anthropogenic predictors were also important in determining the detection likelihood within that region.The single largest difference between the short and intermediate range regional models was the change in primary predictor from a spread kernel (distance from prior year detections) to the address forwards.This changeover in pathway importance is demonstrated in the prediction surface as the transition of detection likelihood from the spread front to urban area hot spots.
We expected that the long-range model would show an increasing importance of address forwarding as a predictor of gypsy moth detection, as it does in the intermediate model.However, predictors, such as household income, population density, and traffic volume, exhibited consistently higher predictor performance than address forwarding.Areas of previous infestation in the long range model area generally occur in high-density urban areas, where a higher level of household income may be required for the higher cost of living.Urban areas also have smaller census tracts than rural regions (such as those that dominate the intermediate range model area), and gypsy moth detections frequently appear in very close proximity to high move-in areas without actually occurring in one.This phenomenon adds noise to the model and likely explains the poor predictor performance of address forwards in the long range model.
The need for an iterative approach to both sampling and distribution modeling for invasive species has been acknowledged for many years [47,113].Tests of the iterative sampling and modeling framework revealed improvements in secondary models based on new information collected as a result of initial models [53].Here, we present an operationalized iterative modeling framework to support adaptive management of invasive species.As with previous work, our example illustrates increased model performance over time with the addition of new information (both detections and predictors with accumulated information such as address forwarding).
Our model serves as a rapid investigative technique to test hypotheses regarding gypsy moth spread pathways and to inform targeted surveillance.The iterative process allowed us to investigate model prediction failures and test improvements for future model iterations.For example, stakeholders and program management have provided feedback that there is too much risk area in Texas, which is supported by the lack of detections in the region.These two lines of evidence indicate the model is overpredicting in this region, leading us to hypothesize that biological limitations, such as supraoptimal temperatures [114], high winter temperatures precluding a required diapause development stage [115], or dessication [116], may be limiting life stage development in this area.We also detected several false negatives in the intermediate range regional model, which may be a result of missing pathway predictors, such as firewood movement [117][118][119] and recreational activity [16,120] in non-urban landscapes.The lack of origin-destination data sources to estimate these pathways likely results in underestimation of risk in this region.These examples highlight the importance of incorporating expert knowledge [121] and proximate predictors [122] into risk models, ensuring that products are appropriately targeted to the management need.
These operationalized iterative models of an invasive forest pest support management activities at a national scale.Our regionalized approach to model development supports previously identified policy and management objectives to invasive species management at different stages of invasion in different geographic regions [40].Efforts to prevent the spread of invasive species by targeting pathways are less costly than even early detection efforts [123].Our models can help target public outreach campaigns to prevent human-assisted movement of gypsy moth through our identification of the importance of these pathways.For example, APHIS partnered with the American Moving and Storage Association in the "Remove Before You Move" outreach campaign [124] to educate the public on how to check their household articles for egg masses before moving [125].Our models also support the next step in the invasion process by targeting early detection efforts to areas at high risk for population establishment.We interact with the Slow the Spread program targeted at the spread front, allowing for continuous surveillance effort across management areas.Thus, our framework facilitates efforts across stages of the invasion process and the stages' associated management options.

Conclusions
Our results suggest that more effort in the collection and application of human-related, origin-destination datasets that can serve as proximate predictors for invasive species movement is warranted.The application of pathways data in invasive species distribution models should be carefully inspected for geographic variation, and possibly regionalized to better target the variability of pathways of invasion for early detection.Implementation of an iterative modeling approach provides opportunity to improve model predictions over time, understand mechanisms of spread, and enhance targeted management actions.We demonstrate that species distribution models can be effective in an operational context for early detection of invasive species if they include pathways of spread and accommodate variation in space and time.

Figure 1 .
Figure 1.The APHIS-PPQ gypsy moth survey program area, defined by the region outside the generally infested area (federal quarantine area) and the spread front (managed by the Slow the Spread program).

Figure 1 .
Figure 1.The APHIS-PPQ gypsy moth survey program area, defined by the region outside the generally infested area (federal quarantine area) and the spread front (managed by the Slow the Spread program).

Forests 2019, 10 , 108 5 of 21 a
rolling average of the five most recent survey years (to capture more recent spread front dynamics).

Figure 2 .
Figure 2. Annual risk models depicting the likelihood of gypsy moth detection.Area in grey is the federal quarantine area and the active spread front.Areas in black are climatically unsuitable for establishment.

Figure 2 .
Figure 2. Annual risk models depicting the likelihood of gypsy moth detection.Area in grey is the federal quarantine area and the active spread front.Areas in black are climatically unsuitable for establishment.

Figure 3 .
Figure 3. Validation of the 2015 gypsy moth risk model with survey detection data.The 2015 survey year coincided with population outbreaks of introduced gypsy moth in Washington and Oregon.

Figure 3 .
Figure 3. Validation of the 2015 gypsy moth risk model with survey detection data.The 2015 survey year coincided with population outbreaks of introduced gypsy moth in Washington and Oregon.

Figure 4 .
Figure 4. Receiver operating curve validating annual gypsy moth risk models with next year survey data, labelled by year with area under the curve (AUC).

Figure 5 .
Figure 5. Boxplot series of the distribution of predicted risk values for positive detection locations by year and geographic region: (A) Short-range geographic model (0-200 km), (B) intermediate-range geographic model (201-500 km), and (C) long-range geographic model (>500 km).

Figure 4 .
Figure 4. Receiver operating curve validating annual gypsy moth risk models with next year survey data, labelled by year with area under the curve (AUC).

Figure 4 .
Figure 4. Receiver operating curve validating annual gypsy moth risk models with next year survey data, labelled by year with area under the curve (AUC).

Figure 5 .
Figure 5. Boxplot series of the distribution of predicted risk values for positive detection locations by year and geographic region: (A) Short-range geographic model (0-200 km), (B) intermediate-range geographic model (201-500 km), and (C) long-range geographic model (>500 km).

Figure 5 .
Figure 5. Boxplot series of the distribution of predicted risk values for positive detection locations by year and geographic region: (A) Short-range geographic model (0-200 km), (B) intermediate-range geographic model (201-500 km), and (C) long-range geographic model (>500 km).

Table 1 .
Summary descriptions of the potential predictors for the 2015 model (some of which are updated in subsequent years), indicated by type of pathway: Biologic 1 or anthropogenic 2 dispersal.
Abbreviations: STS: Slow the Spread; APHIS: Animal and Plant Health Inspection Service; APHIS PPQ: Animal and Plant Health Inspection Service, Plant Protection and Quarantine; FHAAST: Forest Health Assessment and Applied Sciences Team; TIGER: Topologically Integrated Geographic Encoding and Referencing; NTAD: National Transportation Atlas Database; POI: points of interest.

Table 2 .
Average relative importance for the address forwarding predictor explaining gypsy moth occurrence increase as more data was accumulated to develop the predictor layer.