Estimating Invasion Dynamics with Geopolitical Unit-Level Records: The Optimal Method Depends on Irregularity and Stochasticity of Spread

Biological invasions are an ongoing threat for sustainability of ecosystems, and estimating the spread of invasive species is critical for making management decisions. Geopolitical unit-level data (GULD) are often used to estimate invasions due to their wide availability, and researchers had evaluated the abilities of multiple methods to estimate invasion with GULD. However, earlier studies were case based and only addressed limited information on the spread, thus making it inadequate to determine which method to choose to estimate invasions with GULD under various spread scenarios. Here, we conducted a simulation study to (1) evaluate performances of eight methods on estimating expansion patterns, spread rates, and spread dynamics of invasive species with GULD; (2) assess the impact of size and homogeneity of size of geopolitical unit on the estimations by studied methods; (3) evaluate the similarities of all studied methods. Additionally, we presented a concave hull boundary displacement method (Ctd_BD) and an area-based regression method (SqrtNA_R) to estimate spread with GULD. Three regions with varying sizes of counties in the United States (U.S.) were selected to conduct the simulations, and three spread scenarios and three expansion patterns were simulated. AIC, and R2 and root mean square error (RMSE) were used to evaluate the accuracy of methods on estimating expansion pattern, and overall spread rate and spread dynamics, respectively. Correlation coefficient and RMSE were used to assess the similarity of eight methods. We found Ctd_BD and area-based regression methods consistently estimated the right expansion patterns. Boundary displacement and area-based regression methods estimated highly correlated spread rates and dynamics. Distance-based regression methods provided a high accuracy on estimating overall spread rate without long-distance jump dispersal but performed poorly on estimating the spread dynamics. We recommend boundary displacement method, especially Ctd_BD, for estimating spread with GULD, whereas for spread without clear infestation boundaries, distance-based regression can be used to estimate overall spread rate and area-based regression can be used to estimate spread dynamics.


Introduction
As a major component of global change, biological invasions continue to pose significant threats on ecosystems sustainability, decrease biodiversity, and cause substantial economic losses [1,2].
Sustainability 2020, 12, 8526; doi:10.3390/su12208526 www.mdpi.com/journal/sustainability Management of invasive species is the key to minimize their economic and environmental impacts.
Estimating the spread of invasive species can provide valuable information on when and where a further invasion may occur, thus guide the early management of the invasive species to increase control efficacy and promote environmental sustainability [1,3,4]. The spread of an invasive species may show an increasing, decreasing or stable rate, which can be affected by the internal dispersal ability of the species, environmental factors, and human activities [3,5,6]. Estimating spread dynamics of the invasive species can shed light on the trend of spread rate over the invasion period, and such information can effectively facilitate decision on early management and analysis on factors impacting the spread [5,6]. Estimating spread of invasive species has been conducted at local to global spatial scales [5][6][7][8]. Data collected through field sampling are frequently used in studies conducted at local scales [5,9], whereas research on regional or larger scales utilize records collected from online databases, published research, surveys, or field sampling [8,10]. The collected data for large-scale research are often aggregated to geopolitical unit-level records (i.e., city, county, parish, state, etc.) to unify their spatial scales [7,11,12]. Additionally, quarantines initiated by governments or institutions are usually conducted and reported at the geopolitical unit-level [13][14][15]. Consequently, geopolitical unit-level data (GULD), i.e., presence or absence at a geopolitical unit level, is usually the most abundant information for invasive species. During the past decades, several researchers worldwide used multiple methods to estimate spread of invasive species at various spatial scales with GULD [7,11].
Several researchers have compared the accuracy of these methods and validated the use of GULD to estimate invasion rates [14,15]. However, earlier research focused on specific species, thus the generality of their results on performances of these methods to estimate spread with different dispersal traits remain unknown [15]. Additionally, these studies only compared the overall spread rate, i.e., a spread rate for the whole invasion period, and lacked at evaluations of methods on estimating spread dynamics, i.e., the spread rate for different invasion periods. Spread of invasive species, especially at large scales, is commonly complex due to spatial heterogeneity and stochastic events, such as uncommon weather conditions or long-distance jump dispersal [8,16]. Estimating the spread with GULD further increases the complexity, as the spatial resolution is coarse and there can be large variations in the sizes of geopolitical units. Thus, compared to estimating one single overall spread rate, providing spread dynamics across different time scales are more informative for understanding the invasion process [16]. Additionally, earlier research did not examine the ability of different methods on estimating expansion patterns, i.e., the change pattern of spread rate through the invasion period. Estimating these expansion patterns of invasive species can provide insights on the temporal dynamics of the invasion rate and facilitate the prediction of future spread.
Despite the widely use of GULD to estimate spread of invasive species, studies evaluating the impact of geopolitical unit size on the estimated spread rates remain absent. Dramatic variations in geopolitical unit size are common especially for regional and larger spatial scales, which might lead to inaccurate assessments of the estimated invasion rate. Additionally, an evaluation of the similarity patterns of the commonly used methods on estimating overall spread rate and spread dynamics is rare [15,17,18]. A variety of measurements (such as number of infested county, total infested area, and spread distance) had been used by different estimation methods to estimate the invasions [15,17,18], thus the estimated spread rates may vary depending on the measurement used. However, the spatial or temporal change pattern of spread rate revealed by different methods could be essentially similar [3]. For example, spread rates estimated by using number of infested county and total infested area would vary largely; however, they might both indicate that the spread rate had a consistently increasing pattern over the invasion periods. This similarity of change pattern of spread rate by different methods can lead to similar analysis result when analyzing critical factors that impact the invasions. This research aims to address these research gaps.
Gilbert and Liebhold [19] conducted a simulation research to evaluate the accuracy of two regression methods and a boundary displacement method on estimating the overall invasion rate in response to different spatial sampling configurations and sampling density. Tisseuil et al. [18] conducted a simulation study to evaluate the accuracy of a regression method and a boundary displacement method on estimating the overall invasion rate with field sampling data under different sampling strategies, densities, and spatial interpolation approaches. Both research provided valuable guidance on estimating the overall spread rate of invasive species with field sampling data [18,19]. However, simulation of spread of invasive species with GULD is rare [11][12][13][14][15][17][18][19]. To fill the research gap of a systematic evaluation of different methods on estimating invasions with GULD, we used a simulated data approach to evaluate the ability of multiple methods on estimating invasion with GULD.
Specifically, we used simulated data to evaluate (1) the accuracy of eight methods to estimate expansion pattern with GULD, (2) the accuracy of eight methods on estimating overall rate and spread dynamic of invasive species with GULD, (3) the impact of the geopolitical unit size and its variation on each method, and (4) the similarity of different methods on estimating spread. Earlier simulation research provided valuable insight on simulating multiple spatial configurations of invasion with consideration of different levels of complexity and stochasticity [18,19]. Similar with simulation by Tisseuil et al. [18], we simulated a simple symmetric spread scenario and a more complicated spread scenario with spatial anisotropy. Although long-distance jump dispersal is random and rare in the spread of invasive species, research showed long-distance jump dispersal can be more influential than local dispersal for some species [6]. Thus, we also simulated a third spread scenario with long distance jump dispersal. Additionally, we formulated an alternative boundary displacement method and an area-based regression method for estimating spread with GULD.

Common Methods to Estimate Spread with Geopolitical Unit
The most commonly used methods to estimate spread are regression and boundary displacement methods, which can be used with all kinds of invasion records [15,19]. Note that this research aimed to evaluate commonly used methods of estimating spread with GULD; however, alternative approaches are also available for estimating spread with GULD and more detailed invasion record [17,20]. For example, Goldstein et al. [17] used a Gaussian process model to estimate a gradient surface of the invasion, and their method can estimate the rate and direction of the spread as well as detecting potential jump dispersal sites. Pio et al. [20] used a trend surface analysis and spatial error simultaneous autoregressive model to estimate spread rate of an infectious disease.

Regression Methods
The general idea of regression methods is to regress the measurement of spread (e.g., the cumulative area or the distance to the invasion origin) against time, beginning when the infestation is first observed. Rather than estimating the spread rate for every temporal unit, e.g., each year, the regression methods output a spread rate for the entire studied period.
Spread Distance-With GULD, one way to calculate spread distance is to derive the minimum distance between the spread origin and the polygon of each geopolitical unit [14,15]. Distances between the spread origin and centroids of infested geopolitical units also have been used [3,7].
Square Root Area-This method assumes an invasive species spread by approximately concentric circles, for which the total spread distance (D) from invasion origin can be estimated as D = √ A/ √ π, where A represents the cumulative area of the infested regions [21][22][23]. For GULD, the square root of the cumulative area of all infested geopolitical units is used as the measurement [15].
Number of Infested Geopolitical Unit-Directly regressing the cumulative number of infested geopolitical units, n, on invasion times has been used in previous studies [6,8,13]. However, we argue that the square root of the cumulative number of infested units, √ n, should be used instead [24]. If the total number of infested geopolitical units is n and the mean county size is A, then the total infested area could be estimated as A = nA. As mentioned above, the spread distance from invasion origin can be estimated as D = √ A/ √ π for concentric spread, thus by replacing A with nA we get D = nA/ √ π. Therefore, D is linearly associated with √ n. The spread rates estimated from √ n should be linearly correlated with the ones by spread distance regression and square root area regression. Additionally, the measurement nA can be an alternative to the √ n method and can be used as an area-based regression method.

Boundary Displacement Method and Minimum Spread Distance Method
Boundary Displacement-This method estimates spread rates as the distance between two consecutive infestation boundaries, and can directly estimate the spread dynamics during different invasion periods. The outer boundary of geopolitical units infested within the same period can be used as the infestation boundary [5,25]. This method may require slight changes to the infestation boundaries to avoid folds, islands, and gaps on the boundary [5]. In addition to use county boundary to delineate infestation boundary, we also designed a concave hull method to derive the infestation boundaries by using polylines that connect centroids of all the outermost newly infested geopolitical units as the boundary. This method avoids the need to manually modify infestation boundaries, and the infestation boundaries can be derived automatically using programs such as R (see Supplementary Material 1). Additionally, Tobin et al. [15] used a spatial grid approach to delineate boundary with GULD. Although their method is not included in our analysis, we encourage readers to try their method, as the spatial grid approach also avoids the need of manually changing boundaries.
Minimum Spread Distance-This method takes the minimum distance between a newly infested geopolitical unit and all units infested in earlier periods as the distance that a species has to spread to invade the new unit [11,26]. The mean of all minimum distances of geopolitical units infested in the same period is taken as the spread rate in that period. Similar to boundary displacement methods, MSD also directly estimates temporal spread dynamics.

Spatial Area of Simulated Spread
The flowchart of the simulation research is detailed in Figure 1. All the simulations and results analysis were conducted with R program [27], and 'rgdal' was used to facilitate the spatial simulation and 'Metrics' was used to calculate the root mean square error (RMSE) [28,29]. To represent the real-world variety of geopolitical units we used the counties in the U.S. to conduct the simulation research ( Figure 2a). The county size in the eastern U.S. (e.g., Tennessee) is consistently smaller than a majority of regions in the western U.S. (e.g., Nevada), whereas the county size in the Midwest and Southwest of the U.S. has more variations than the eastern U.S. This pattern of geopolitical unit size being more homogenous within one region than other regions is representative of many countries (e.g., Canada, China, and Mexico). To evaluate the performance of eight methods in response to different geopolitical unit sizes and homogeneity in sizes, we conducted the same simulations independently in three regions-Regions 1, 2, and 3 ( Figure 2a). The county size and its coefficient of variation (CV) in three regions are listed in Table 1. In each region, we set a spread origin, from which the spread was simulated. Sustainability 2020, 12, x FOR PEER REVIEW 5 of 16      LGF is abbreviated for Logistic growth function; S1, S2, and S3 are short for symmetric spread scenario, asymmetric spread scenario, and long-distance jump dispersal scenario, respectively; R1, R2, and R3 are short for Regions 1, 2, and 3, respectively.

Three Expansion Types
To evaluate abilities of eight methods for estimating expansion patterns of invasive species, we simulated three expansion types as summarized by Shigesada et al. [23]: (1) linear expansion, (2) biphasic expansion resulting from two linear-spread phases, and (3) logistic curve expansion. These three expansion patterns were commonly observed in research focused on invading organisms [12,27]. We simulated these three expansion types for each spread scenario in all regions.

Three Spread Scenarios
We simulated three spread scenarios to evaluate abilities of different methods on estimating spread with different types of stochasticities and irregularities: (1) a symmetric spread (S1) (Figure 2a,b), (2) an asymmetric spread (S2) (Figure 2c,d), and (3) a spread scenario with long-distance jump dispersal (S3) (Figure 2e,f). The details on simulating three spread scenarios were described as follows.
Simulation of Symmetric Spread-For S1, the linear expansion pattern had a constant spread rate for all periods. We set this rate to 20 km/year, as it approximates the mean spread rate of invasive species based on multiple research [6,11]. For biphasic expansion the simulated rate was set to 20 km/year for the first 12 invasion years and 30 km/year for the following 12 years; thus, the mean spread rate is 25 km/year. The logistic curve expansion followed a logistic growth function y = 826/ 1 + e −0.43 * (x−10) , where y and x represent the total spread distance and spread time, respectively. The parameters of the logistic growth were carefully set to ensure that (1) the overall spread rate of logistic curve expansion is not smaller than linear expansion, and (2) the overall spread rate is not so large that there is enough terrestrial land for conducting the spread simulations in each region.
Simulation of Asymmetric Spread-Simulations of the three expansion types for S2 were similar with S1, except that the rates varied among different directions. The change pattern of spread rate along each direction kept consistent with the corresponded expansion pattern (Figure 2c,d). The simulated rate in all directions varied between 10-24 km/year for linear expansion, and 12-31 km/year for biphasic and logistic curve expansions.
Simulation of Long-Distance Jump Dispersal-We added two jump dispersal events in S3 with one occurring in the nineth year and another occurring in the 18th year (Figure 2e,f). To make the S3 in three regions comparable, the distances among the two jump points and the spread origin were set the same for all regions (Figure 2e,f). The jump point served as a new spread origin from which further spreads occurred in all directions, and we set this further spread rate to 20 km/year for all jump points and expansion types for clarity and simplicity.

Simulation over Landscape and Conversion to Geopolitical Unit-Level Spread
To simulate the spread over different invasion periods, we first determined the cumulative invaded regions at different periods based on the spread rates. We then derived the newly invaded regions for a given period as the difference of the cumulative invaded region between current and previous invasion period. As shown in Figure 2a,c,e, the original simulated spreads had perfect geometric shapes over the landscape as no stochasticity (except long-distance jump dispersal for S3) was added before converting to GULD. To convert the simulated spread (shown in Figure 2a,c,e) to GULD (shown in Figure 2b,d,f), we selected counties that were infested in the same periods (i.e., every three years). A county was defined as first infested only if more than 10% of its area was covered by the simulated spread zone. To reflect real-world stochasticity of species invasion, we randomly removed 5% of the counties first infested in each period as non-infested counties (Figure 2b,d,f).

Estimating Overall Spread Rate and Spread Dynamics
Five regression methods, two boundary displacement methods and an MSD were included in this research to estimate overall spread rate and spread dynamics ( Table 2). For boundary displacement methods, mean distance between two consecutive boundaries were measured as the mean length of all transects radiating from the spread origin at 2 • [3,25]. Ctd_BD, Cty_BD, and MSD directly estimated spread dynamics, and the overall rate was calculated as the mean spread rate across all invasion periods. For regression method spread dynamic for a given period were estimated as the difference of measurements between current and previous period. The overall rate for regression methods were estimated as the slope of a linear model for linear expansion and mean of two slopes of a segmented linear model with break point at the 12th year for Biphasic. For logistic curve expansion, instead of using derivatives of the logistic growth function, we estimated the spread rate as follows for practical purpose: Overall rate = cumulative spread distance/time since initial detection (1)

Ability of All Methods to Estimate Expansion Types
The cumulative values of spread measure for each period were used to fit regression models against spread time to determine the expansion type. The spread measures for regression methods were already cumulative values, thus we only derived the cumulative values of spread measure for each period for Ctd_BD, Cty_BD, and MSD methods. We then fitted three regression models, i.e., linear, biphasic (i.e., segmental linear with one break-point at year 12), and non-linear with logistic curve function, to the cumulative values of each estimation method by each expansion type, spread scenario, and region. For each simulation (i.e., one combination of spread scenario and expansion type in one region), the Akaike information criterion (AIC) of three fitted regression models were derived and the model with lowest AIC was assigned as the estimated expansion pattern for this simulation [30]. The estimated expansion type was then compared to the simulated expansion type to determine the ability of each method to estimate expansion pattern. For linear expansion, we defined the method that can accurately estimate the expansion pattern if the linear or biphasic linear fit has the lowest AIC, as biphasic linear can be an overfitting of linear model but have lower AIC.

Accuracy and Similarity of All Methods
The simulated rate was defined as the distance between the simulated boundaries, and was used to evaluate the accuracy of eight methods on estimating spread rates. We derived R 2 and the RMSE between the estimated and simulated overall rate and dynamics by each method for each spread scenario, expansion type, and spread region. R 2 , varying from 0 to 1, is scale independent with 1, indicating perfect estimation. RMSE measures the absolute deviation of estimation from the simulation, thus cannot be used on SqrtN_R method as the measure of this method is the square root of number of infested counties. We calculated the Pearson correlation coefficient (r) and the RMSE of estimated spread rates and dynamics among all methods to assess their similarity pattern. Hierarchical clustering (HC) with complete linkage using mean distance was conducted to group all methods based on their similarity of derived spread rates.

Impact of County Size on Spread Estimation
To determine whether the county size and the variation of county size affect the values of estimated rates, we tested the significance of correlations between the estimated rates with the mean and coefficient of variation (CV) of county size. To assess the impact of county size and the variation of county size on the accuracies of estimated rates, we tested the significance of correlations between the mean and CV of county size and R 2 of the estimation from the simulation for each region and spread scenario.

Accuracy on Estimating Overall Spread Rate
The estimated and simulated overall spread rates for all spread scenarios and expansion types were included in Supplementary Material 3. Based on R 2 , all methods had the best accuracy to estimate spread rate in S1 and the worst ability in S3 (Table 3). MSD and SqrtN_R estimated spread poorly when all three regions were analyzed (R 2 < 0.1). Ctd_BD, CtdD_R, and MinD_R were the best three models to estimate overall rate without long-distance jump dispersal (Table 3, R 2 for S1 and S2), whereas Cty_BD, SqrtA_R, and Ctd_BD were the best three models with occurring of long-distance jump dispersal ( Table 3, R 2 for S3). Based on both R 2 and RMSE on spread rate for all scenarios and regions, Ctd_BD had the best estimation, followed by Cty_BD, and SqrtA_R.

Accuracy on Estimating Spread Dynamics
Compared to the overall rate, the ability to estimate spread dynamics decreased for all methods, i.e., lower R 2 and higher RMSE values for spread dynamics estimation (Supplementary Material 4) than the overall rates (Table 4). All methods had better performance for S1 and Region 1 and lower performance for S3 and for Region 3, except SqrtN_R, SqrtNA_R, and MSD. MSD had low performances for all scenarios and regions. CtdD_R and MinD_R only had good estimation for S1 (Table 4). Ctd_BD, SqrtNA_R, and SqrtA_R were the best three methods for all scenarios and regions based on both R 2 and RMSE (Table 4), among which Ctd_BD showed consistently good estimates for all regions and scenarios (R 2 > 0.75).
Overall, Ctd_BD, SqrtNA_R, SqrtA_R, and Cty_BD were ranked as the best four methods for estimating both overall spread rate and spread dynamics, among which Ctd_BD constantly had the best estimation. The accuracy of two boundary displacement methods on estimating overall spread rate was not significantly different from each other (p = 0.69), whereas Ctd_BD had significantly higher accuracy on estimating spread dynamics than Cty_BD (p = 0.01). Between the two area-based regression methods, SqrtA_R had significantly better estimation on overall spread rate than SqrtNA_R (p = 0.001), whereas SqrtNA_R had marginally better estimation on spread dynamics than SqrtA_R (p = 0.07). Distance-based regression methods, CtdD_R and MinD_R, could provide good estimations for overall spread rate without long-distance jump dispersal, and had constantly lower performances on estimating spread dynamics compared to boundary displacement methods and area-based regression methods. There was also no significant difference of accuracy on estimated overall rate (p = 0.49) and spread dynamics (p = 0.48) between these two distance-based regression methods.

Impact of County Size and Its Variation on Estimation of Spread Rate
Overall, the county size and the variation in county size showed more significant impact on the estimation of spread dynamics than the overall spread rate (Table 5). However, significantly positive and negative correlations existed between the county size and spread rates estimated by MSD and SqrtN_R, respectively, for both spread dynamics and overall spread rate (Table 5). Additionally, larger county sizes led to higher estimated spread dynamics for SqrtA_R, Ctd_BD, and Cty_BD (Table 5). For the accuracy of estimated rates, significantly negative correlation of R 2 with mean and CV of county size was only observed on the SqrtA_R method for overall rate but were observed on SqrtA_R, Ctd_BD, and Cty_BD for spread dynamics (Table 5). These negative correlations suggest that accuracy is negatively impacted by the county size and its variation. The accuracy of two boundary displacement methods on estimating spread dynamics were negatively impacted by the county size and its variation, whereas for SqrtN_R and SqrtNA_R the variation of mean county size among different periods is more influential than the mean size of county on the estimation accuracy. Table 5. Correlation coefficient between mean of county size and estimated overall rate and spread dynamics, and between mean/coefficient of variation of county size and R 2 for each region, expansion type, and spread scenario.

Similarity of All Methods
The CtdD_R and MinD_R constantly estimated highly correlated overall spread rate and spread dynamics (r > 0.95, Figure 3a,c), and both estimations had low RMSE between two methods (Figure 3b,d). The spread rates estimated by MSD and SqrtN_R had low similarity with other methods when spread in all regions was analyzed, i.e., low correlation and high RMSE (Figure 3a-d). For asymmetric spread but with low variation in county size (i.e., S2 for Region 1), all methods still estimated highly correlated overall rates and spread dynamics (Supplementary Material 5- Figure S1a,b, Supplementary Material 5- Figure S2a, r > 0.75). With the increase of anisotropy, stochasticity, and variation of county size, strong correlations were observed among SqrtA_R, SqrtNA_R, Cty_BD, and Ctd_BD for both overall rate and spread dynamics (r > 0.80, Figure 3a,c). Additionally, the RMSE among these four methods also remained at a low level for estimation on overall spread rate (Figure 3b, RMSE < 1.5) and spread dynamics (Figure 3d, RMSE < 5). SqrtA_R, Ctd_BD, Cty_BD and SqrtNA_R were constantly classified into one group for both overall spread rate and spread dynamics due to their similarities on estimating spread rates (Figure 3a,c; Supplementary Material 5- Figures S1 and S2).

Discussion
Estimating spread is critical for predicting further invasion and guiding early management of invasive species and facilitates analysis of important factors affecting the spread to enhance agricultural and environmental sustainabilities [1,3,4]. Although researchers have been using GULD to estimate the spread of invasive species, a systematic evaluation of the performances of GULD to estimate invasion under different spread scenarios with multiple types of stochasticity and

Discussion
Estimating spread is critical for predicting further invasion and guiding early management of invasive species and facilitates analysis of important factors affecting the spread to enhance agricultural and environmental sustainabilities [1,3,4]. Although researchers have been using GULD to estimate the spread of invasive species, a systematic evaluation of the performances of GULD to estimate invasion under different spread scenarios with multiple types of stochasticity and irregularity is lacking. This research filled this gap by using simulation data, and findings of this research can guide selection of optimal methods to estimate expansion pattern, overall spread rate, and spread dynamics of invasive species with GULD.

Ability of Common Methods to Estimate Expansion Pattern and Spread Dynamic
Regression methods have long been used to determine expansion patterns of invasive species [3,25,31]. Here we found the cumulative value of boundary displacement methods can correctly estimate expansion patterns. In the simulation study, the boundary displacement method, Ctd_BD, and two regression methods, SqrtN_R and SqrtNA_R, always estimated the correct expansion patterns regardless of variations in county size or anisotropy and stochasticity in spread.
Estimation of spread dynamic is more challenging than the estimation of overall spread rate. Boundary displacement and MSD have been commonly used to estimate spread dynamic of invasive species [5,11,32]. Nevertheless, our research suggests that regression methods can also estimate spread dynamics by using the difference of measurements between two consecutive periods. In fact, two regression methods, SqrtA_R and SqrtNA_R, together with two boundary displacement methods, Ctd_BD and Cty_BD, were the top four methods to estimate both overall spread rate and spread dynamic.

Estimating Spread with Anisotropy and Stochasticity with GULD
Compared to long-distance jump dispersal, asymmetric spread caused by spatial heterogeneity does not seriously challenge the ability of all focal methods to estimate overall spread rate. Meanwhile, boundary displacement methods and area-based regression methods can still have good estimation of spread dynamics with long-distance jump dispersal. However, when the spread is highly asymmetric and the study area is at regional or larger scales, estimating spread rates by taking the whole infested region as one area does not reveal the spatial dynamics of spread caused by heterogeneities [3,33]. Thus, this may lead to failure of recognizing critical factors influencing the invasions [3].
Our research suggests that dividing a highly asymmetrical spread into several relatively symmetric spreads can improve the estimation accuracy for each sub-region, thus we recommend the use of neighborhood measurement when the spread is highly asymmetric. The neighborhood measurement was applied in multiple studies for better estimating localized spread dynamics [34,35]. Estimating spread within different homogenous sub-regions can further contribute to better understanding of the spread dynamics and facilitates analysis of spatial factors impacting the spreads [3,33]. Andow et al. [33] first proposed to divide the large infested area into multiple neighborhoods to increase homogeneity within each neighborhood to well reveal localized spread dynamics for asymmetrical spread. Liang et al. [3] proposed a quantitative method, i.e., spatial constrained clustering, to classify a large heterogeneous region into environmentally homogeneous sub-regions.
Long-distance jump dispersal is often caused by rare random events and human-related activities [6,36]. However, despite its rarity and stochasticity, it greatly facilitates the spread of invasive species and can be more influential than local dispersal for some species [6,19,21,36]. Long-distance jump dispersal causes new spread origins and obscure the actual expansion pattern that is determined by life traits and dispersal ability of the target species itself [31,36]. To better estimate the ability of an invasive species to spread, researchers could set multiple spread origins, including the ones caused by long-distance jump dispersal [6,37], and then estimate spread rate from different origins. Goldstein et al. [17] developed a method which can detect possible long-distance jump dispersal. Methods that can analyze local spread rate determined by dispersal ability of invasive species and long-distance spread rate impacted by human-related activities separately can also be used to better facilitate estimation of further spread of invasive species. For example, Meentemeyer et al. [38] designed an approach to estimate the local dispersal ability of invasive species impacted by the biological trait and long-distance dispersal impacted by human activity separately to simulate the spread dynamics of invasive species.

Similarity of Methods to Estimate Overall Spread Rate and Spread Dynamics
When the spread is symmetrical and the county size is relatively uniform, estimates of spread rates and dynamics among all eight methods are similar. With increase of anisotropy and stochasticity in spread, only boundary displacement methods and area-based regression methods continued to estimate similar overall spread rate and spread dynamics. Unsurprisingly, the two distance-based regression methods, CtdD_R and MinD_R, always estimated highly correlated overall rates and spread dynamics. Meanwhile, no significant difference was observed on estimation accuracy of overall rate and spread dynamics between these two distance-based regression methods, thus these two methods can be used interchangeably. The two boundary displacement methods also always estimated very highly correlated overall spread rate and spread dynamic, nevertheless, the Ctd_BD showed significantly higher accuracy on estimating spread dynamics (p = 0.01). Additionally, Ctd_BD had better estimation on expansion pattern than Cty_BD, therefore Ctd_BD can be preferred over Cty_BD. The two-area based regression methods also consistently estimated highly correlated overall spread rate and spread dynamics. SqrtA_R showed significantly higher accuracy on estimating overall spread rate than SqrtNA_R (p = 0.001) but marginally lower accuracy on estimating spread dynamics (p = 0.07).

Selection of Method to Estimate Overall Spread Rate and Spread Dynamics with GULD
Earlier simulation research focused on comparing accuracy of multiple methods on estimating overall spread rate with different approaches of field sampling strategy [18,19], and they found regression method consistently have good estimations. Our simulation results on estimating overall spread rate are consistent with earlier findings as the spread scenarios in earlier research resembled the S1 and S2 spread scenarios. Distance-based regression methods, i.e., CtdD_R and MinD_R, showed the top level of accuracy (based on R 2 ) on estimating overall spread rate for S1 and S2. However, distance-based regression methods have lower ability of estimating spread dynamics when compared to area-based regression methods and boundary displacement methods. Tobin et al. [14,15] found that the overall spread rate estimated from boundary displacement method was most similar to the rate estimated from field sampling data than the distance-and area-based regression methods. Our research confirmed their findings as the overall spread rate estimated by Ctd_BD had the lowest RMSE than all other methods.
Researchers have used MSD to estimate spread dynamic [11,26] and SqrtN_R to estimate overall rates and expansion types of invasive species [8,13]. However, when evaluating the accuracy across different regions, SqrtN_R and MSD constantly showed low abilities of estimating both overall rate and spread dynamics due to their sensitivity to the size of geopolitical units. Meanwhile, MSD constantly showed the lowest ability to estimate both overall rate and spread dynamics in all scenarios, and SqrtN_R had a low ability of estimating spread dynamics among all methods considered. Therefore, MSD is not recommended to estimate spread of invasive species, whereas SqrtN_R is only suggested to estimate overall spread rate when the mean sizes of geopolitical units across different periods or regions are relatively uniform. SqrtNA_R is a better alternative to the SqrtN_R as it rectifies the SqrtN_R by the mean area of geopolitical unit.
Generally, the abilities of all methods to estimate invasion rates decreased with the increase of irregularities and stochasticities, i.e. anisotropy and stochasticity in spread and size and variation in size of geopolitical unit. Additionally, we found these irregularities and stochasticities have more negative impact on the estimation accuracy of spread dynamics than overall spread rate. Selection of an optimal method depends on the question of interest and stochasticity in spread. Distance-based regression methods, i.e., CtdD_R and MinD_R, can provide a better estimation than area-based regression method and equally well estimation with CtD_BD on overall spread rate without long-distance jump dispersal. When the goal is to estimate the spread dynamics, boundary displacement methods and area-based regression method can be preferred. More specifically, SqrtNA_R showed better estimation on spread dynamics than SqrtA_R, whereas Ctd_BD constantly had the best estimation on both overall spread rate and spread dynamics. Additionally, Ctd_BD, SqrtN_R, and SqrtNA_R correctly estimated all the expansion patterns. Thus, Ctd_BD can be a top choice for estimating spread, whereas for spread without clear infestation boundaries, a distance-based regression method can be used to estimate overall spread rate, and an area-based regression method can be used to estimate spread dynamics.

Conclusions
Using simulated spread data, we found that GULD can be used for estimating spread of invasive species even with variations in geopolitical unit size. Both regression and boundary displacement methods are capable to estimate the expansion pattern, overall spread rate, and spread dynamics of invasive species. Selection of an optimal method depends on the question of interest and stochasticity in spread. We found the anisotropy and stochasticity in spread and the size and variation in size of geopolitical unit have more negative impact on the estimation accuracy of spread dynamics than overall spread rate, thus estimating spread dynamics is more challenging than estimating the overall rate with GULD. Boundary displacement methods and area-based regression methods estimated spread dynamics for all scenarios most reliably, among which Ctd_BD had the best estimation. Overall, we recommend the boundary displacement method, Ctd_BD, for estimating spread of invasive species, as it can reliably estimate expansion pattern, overall spread rate, and spread dynamics. However, for spread without clear infestation boundaries, area-based regression methods can be good alternatives to estimate spread dynamics and expansion pattern, whereas distance-based regression methods can be used to estimate overall spread rate without long-distance jump dispersal. We suggest readers to carefully design method to estimate spread rate with occurring of rare long-distance jump dispersal. Finally, in addition to the methods included in our analysis, we also encourage readers to explore other methods with GULD, such as the methods in Goldstein et al. [17] and Pio et al. [20].