# Dynamic Species Distribution Modeling Reveals the Pivotal Role of Human-Mediated Long-Distance Dispersal in Plant Invasion


## Abstract


## Simple Summary


## 1. Introduction

## 2. Materials and Methods

#### 2.1. Ecological Process Model

#### 2.1.1. General Structure

$$s_{i,t}=\sum_{i'=1}^{C} disp\left(i,i'\right)\sum_{k'=1}^{K} f_{\gamma}\left(k'\right)\, n_{i',t,k'}$$

- $disp\left(i,i'\right)=1/D_{i'}$ if $i=i'$ (within-cell dispersal);
- $d_{s}>0$ determines the proportion of seeds from $i'$ that reach $i$ if the latter is adjacent (short-distance dispersal), noted $i'\in N\left(i\right)$;
- $d_{l}a_{i'}$, where $a_{i'}$ is the proportion of urban habitat area in cell $i'$, determines the proportion of seeds from $i'$ transported via long-distance dispersal to $i$, i.e., when $i$ is not adjacent to $i'$, noted $i'\ne i,\ i'\notin N\left(i\right)$. In other words, long-distance dispersal diffuses a portion of seeds homogeneously and instantaneously across the domain;
- the proportion of seeds from $i$ participating in local recruitment is set as the reference in this parametrization.
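The seed rain term above can be sketched with NumPy. This is a minimal sketch, not the authors' implementation: it assumes that 1 (same cell), $d_s$ (adjacent cell), and $d_l a_{i'}$ (any other cell) are relative weights and that $D_{i'}$ is the constant that makes each source cell's seed distribution sum to one. The names `dispersal_matrix` and `seed_rain` are hypothetical.

```python
import numpy as np


def dispersal_matrix(adjacency, urban_frac, d_s, d_l):
    """Return disp[i, j], the proportion of seeds produced in source cell j that land in cell i.

    Assumed (hypothetical) parametrization: relative weight 1 for the source cell itself,
    d_s for adjacent cells, d_l * a_j for every other cell, and D_j chosen so that
    each source column sums to one.
    """
    adjacency = np.asarray(adjacency, dtype=bool)  # adjacency[i, j]: cells i and j are neighbors
    urban = np.asarray(urban_frac, dtype=float)    # a_j: urban proportion of source cell j
    n_cells = adjacency.shape[0]
    same = np.eye(n_cells, dtype=bool)
    far = ~adjacency & ~same                       # neither the source cell nor a neighbor

    weights = np.zeros((n_cells, n_cells))
    weights[same] = 1.0
    weights[adjacency] = d_s
    # Broadcast a_j along rows so that column j uses the urban share of source cell j.
    long_dist = d_l * np.broadcast_to(urban[np.newaxis, :], (n_cells, n_cells))
    weights[far] = long_dist[far]

    norm = weights.sum(axis=0)                     # D_j for each source cell j
    return weights / norm[np.newaxis, :]


def seed_rain(disp, n, fecundity):
    """s_i = sum_j disp[i, j] * sum_k f(k) * n[j, k], for abundances n of shape (cells, ages)."""
    seeds_produced = np.asarray(n, dtype=float) @ np.asarray(fecundity, dtype=float)
    return disp @ seeds_produced


adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])  # three cells in a row
disp = dispersal_matrix(adj, urban_frac=[1.0, 0.0, 0.0], d_s=0.5, d_l=0.2)
s = seed_rain(disp, n=[[0, 2], [0, 0], [0, 0]], fecundity=[0.0, 5.0])
```

Under this normalization the kernel only redistributes seeds, so the total seed rain equals the total seed production; a non-urban source cell ($a_{i'}=0$) sends no seeds beyond its neighbors.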

#### 2.1.2. Age-Structured Fecundity

#### 2.1.3. Initial Populations

#### 2.2. Sampling Process Model

**i** during year t is modeled with a Poisson distribution:

#### 2.3. Data

#### 2.3.1. Temporal and Spatial Extent

#### 2.3.2. Occurrence Data

#### 2.3.3. Environmental Variables

**Land cover variables.** We focused on percent-cover variables summarizing the land cover of each cell, i.e., the percentages of the cell's land covered by forests, crops, and settlements, respectively. The latter is used for modeling ${p}_{\beta}$ and the (long-distance) dispersal kernel $disp\left(.,.\right)$. The land cover has changed substantially in recent decades, affecting the spatial invasion dynamics. To account for these changes in our model, we reconstructed them by linearly interpolating our percent-cover variables between the four years for which global land cover datasets were available, namely 1992, 2001, 2010, and 2019, and by linearly extrapolating them outside these years. For 1992, we used GLCC-IGBP [43] (accessed on 15 October 2021) with the IGBP land cover classification. For the other years, we used the MODIS/Terra+Aqua Land Cover Type Yearly L3 Global 500 m SIN Grid (accessed on 15 October 2021), which follows the same land cover classification.
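The temporal reconstruction of each percent-cover variable can be sketched as piecewise-linear interpolation between the four dataset years, extended linearly beyond them. This is a minimal sketch under two assumptions not stated in the text: extrapolation continues the slope of the nearest segment, and no clipping to [0, 100] is applied. The name `reconstruct_cover` and the example values are hypothetical.

```python
import numpy as np


def reconstruct_cover(known_years, known_values, years):
    """Piecewise-linear interpolation of a percent-cover series, extended linearly at both ends."""
    x = np.asarray(known_years, dtype=float)
    y = np.asarray(known_values, dtype=float)
    t = np.asarray(years, dtype=float)

    out = np.interp(t, x, y)  # np.interp clamps outside [x[0], x[-1]], so both tails are fixed below

    before = t < x[0]
    slope_first = (y[1] - y[0]) / (x[1] - x[0])
    out[before] = y[0] + slope_first * (t[before] - x[0])

    after = t > x[-1]
    slope_last = (y[-1] - y[-2]) / (x[-1] - x[-2])
    out[after] = y[-1] + slope_last * (t[after] - x[-1])
    return out


dataset_years = [1992, 2001, 2010, 2019]
settlement_pct = [20.0, 29.0, 29.0, 38.0]  # illustrative values, not data from the study
cover = reconstruct_cover(dataset_years, settlement_pct, np.arange(1980, 2022))
```

With real data, extrapolating twelve years back from 1992 can yield values outside [0, 100]; whether to clip them is a modeling choice that this sketch leaves open.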

**Terrestrial area within cells.** We computed each cell’s terrestrial area as the sum of the terrestrial land cover areas in that cell from MODIS 2019 (our most recent land cover raster) and treated it as constant over the study period.

**Bioclimatic variables.** We collated historical monthly weather data from CRU-TS 4.03 [44], downscaled with WorldClim 2.1 [45] and accessed through https://worldclim.org/data/monthlywth.html (accessed on 20 August 2022). This dataset provides worldwide coverage, at a 2.5-arc-minute resolution, of monthly maximum/minimum temperature and precipitation. From it, we derived for each year from 1980 to 2018 (i) the annual mean diurnal range (mean of monthly max–min temperature); (ii) the maximum temperature of the hottest month (max. of monthly mean of daily max. temperature); (iii) the minimum temperature of the coldest month (min. of monthly mean of daily min. temperature); (iv) the annual precipitation; (v) the precipitation of the wettest month; and (vi) the precipitation of the driest month. These six bioclimatic variables have been used to model the invasive species’ distribution response to climate change [46]. We extrapolated the values in each 2.5-arc-minute cell for the years 2019 to 2021 from the predictions of a simple linear regression fitted over the years 2000 to 2018. We upscaled each raster to our 5-arc-minute grid by averaging the 2.5-arc-minute cells whose centers fell inside each 5-arc-minute cell. Finally, to restrict the number of parameters, we centered and scaled each of the six variables and synthesized them by taking the values along the first two axes of a singular value decomposition (SVD) of the bioclimatic variable values across all combinations of cells and years. This pipeline is provided in our GitHub script repository: https://github.com/ChrisBotella/plectranthus_barbatus (accessed on 20 August 2022).
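The derivation of the six variables and their SVD synthesis can be sketched as follows. This is a sketch under stated assumptions, not the authors' pipeline (which is in their repository): monthly inputs are arrays whose last axis has length 12, item (iv) is taken as the sum of monthly precipitation, scaling uses the sample standard deviation, and the function names and random inputs are hypothetical.

```python
import numpy as np


def bioclim_six(tmax, tmin, prec):
    """Six yearly bioclimatic variables from monthly data with shape (..., 12)."""
    tmax, tmin, prec = (np.asarray(a, dtype=float) for a in (tmax, tmin, prec))
    return np.stack(
        [
            (tmax - tmin).mean(axis=-1),  # (i) annual mean diurnal range
            tmax.max(axis=-1),            # (ii) max temperature of the hottest month
            tmin.min(axis=-1),            # (iii) min temperature of the coldest month
            prec.sum(axis=-1),            # (iv) annual precipitation
            prec.max(axis=-1),            # (v) precipitation of the wettest month
            prec.min(axis=-1),            # (vi) precipitation of the driest month
        ],
        axis=-1,
    )


def svd_axes(X, n_axes=2):
    """Center and scale the columns of X (rows: cell-year pairs), then project the rows
    onto the first n_axes right singular vectors."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ vt[:n_axes].T


rng = np.random.default_rng(0)
n_rows = 200  # illustrative number of (cell, year) pairs
tmin = rng.normal(10, 3, size=(n_rows, 12))
tmax = tmin + rng.uniform(5, 15, size=(n_rows, 12))
prec = rng.gamma(2.0, 30.0, size=(n_rows, 12))
X = bioclim_six(tmax, tmin, prec)
scores = svd_axes(X)  # columns play the role of svd1 and svd2
```

Because the columns are standardized first, this SVD is equivalent to a principal component analysis on the correlation matrix of the six variables, and the two score columns are uncorrelated.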

#### 2.4. Model Fitting, Convergence Assessment and Posterior Samples

**coda** (see Appendix A.3). We used 1678 posterior samples in the main analysis, obtained by thinning the samples of the last MCMC session after discarding a burn-in of 15,000 iterations. We set a thinning interval of 450 to reduce the autocorrelation between MCMC samples caused by a very low acceptance rate.
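Burn-in removal and thinning amount to slicing each chain before pooling. A minimal sketch, assuming the raw draws are stored as a NumPy array of shape (chains, iterations, parameters); `burn_and_thin` is a hypothetical helper, not a **coda** function.

```python
import numpy as np


def burn_and_thin(chains, burnin, thin):
    """Drop the first `burnin` iterations of each chain, keep every `thin`-th draw, and pool chains."""
    chains = np.asarray(chains)
    kept = chains[:, burnin::thin, :]          # (chains, retained iterations, parameters)
    return kept.reshape(-1, chains.shape[2])   # pooled posterior samples, one row per draw


toy = np.arange(3 * 10 * 2).reshape(3, 10, 2)  # 3 chains, 10 iterations, 2 parameters
pooled = burn_and_thin(toy, burnin=4, thin=3)
```

With a thinning interval of 450, consecutive retained draws of a chain are 450 iterations apart, which is what reduces their autocorrelation.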

#### 2.5. Output Representations

**Model interpretation.** Given the lack of identifiability of certain parameters, we had to be careful when interpreting the model output. For example, it is nearly impossible to disentangle population size from the reporting interest ${p}_{d}^{detec}$ with presence-only data (see a discussion of this problem in Hastie & Fithian, 2013). The identifiability of these parameters therefore relied mostly on the constraints imposed by their prior distributions (see Appendix A.4 for comments on parameter correlations and interpretation), and such parameters had large posterior confidence intervals. Consequently, our interpretations did not rely on the absolute estimated population size but on comparisons of its value or order of magnitude between spatial cells and/or from one year to another (relative population size).

**Population size percentile spatio-temporal maps.** We represented the relative population size across cells and years to understand the species’ invasion dynamics. For this purpose, we computed, for each parameter sample, the population size of each (cell, year) pair and discretized it by assigning each pair to one of the percentile intervals [0, 40th percentile], (40th percentile, 70th percentile], (70th percentile, 90th percentile], and (90th percentile, maximum]. Based on this discretization scheme, we propose a simple way to deal with ambiguous (cell, year) pairs for which there is no consensus across parameter samples: each pair for which fewer than two-thirds of the samples agree on its percentile interval is tagged as “uncertain”. We were thus able to draw comparable maps of relative population size for selected years in Figure 3 and for all years in our interactive web Appendix A.6 (R Shiny application): https://chrisbotella.shinyapps.io/plectranthus_barbatus_sa_maps/ (accessed on 20 August 2022).
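The discretization and consensus rule can be sketched as below. This sketch rests on assumptions the text leaves open: percentiles are computed, for each posterior sample, over all (cell, year) pairs of that sample, and the consensus class is the most frequent one. The name `percentile_classes` and the code `-1` for “uncertain” are hypothetical.

```python
import numpy as np

UNCERTAIN = -1


def percentile_classes(pop, cuts=(40, 70, 90)):
    """Consensus percentile class per (cell, year) pair.

    pop has shape (n_samples, n_cells, n_years). Classes 0..3 correspond to
    [0, 40th], (40th, 70th], (70th, 90th], (90th, max]; UNCERTAIN marks pairs
    for which fewer than two-thirds of the samples agree.
    """
    pop = np.asarray(pop, dtype=float)
    n_samples = pop.shape[0]
    flat = pop.reshape(n_samples, -1)
    thresholds = np.percentile(flat, cuts, axis=1).T  # (n_samples, len(cuts))

    classes = np.empty(flat.shape, dtype=int)
    for s in range(n_samples):
        # side="left" yields right-closed intervals: t[k-1] < v <= t[k] -> class k
        classes[s] = np.searchsorted(thresholds[s], flat[s], side="left")

    n_classes = len(cuts) + 1
    counts = np.stack([(classes == k).sum(axis=0) for k in range(n_classes)])
    best = counts.argmax(axis=0)
    agree = 3 * counts.max(axis=0) >= 2 * n_samples  # at least two-thirds of the samples agree
    return np.where(agree, best, UNCERTAIN).reshape(pop.shape[1:])
```

Comparing integer counts (`3 * count >= 2 * n_samples`) avoids floating-point edge cases in the two-thirds rule.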

**Invasion syndrome spatio-temporal maps.** We analyzed the growth and spread status of populations across space and time. We simplified the growth and spread status of a population by defining five discrete categories, called invasion syndromes: (i) certain population growth but uncertain dispersal (the population does not spread enough seeds for new plants to effectively grow from it in other cells); (ii) certain population growth and dispersal (e.g., an invasive population); (iii) certain dispersal but uncertainty about whether the population is growing or declining (e.g., a large invasive population that is self-regulated); (iv) certain population decline and dispersal (e.g., a large population declining due to environmental change); or (v) certain population decline but uncertain dispersal (e.g., a collapsing population). The growth and spread statuses of a population are determined exactly and independently for each posterior sample. The growth (resp. decline/dispersal) status of a population is considered certain if at least two-thirds of the posterior samples agree that the population is growing (resp. declining/generating new plants in other cells) from one year to the next. If the combination of growth and dispersal statuses does not belong to any of the five invasion syndromes, we tag the population as uncertain. We map the invasion syndromes across cells for six key years in Figure 4 and for all years in our interactive web Appendix A.6 (R Shiny application): https://chrisbotella.shinyapps.io/plectranthus_barbatus_sa_maps/ (accessed on 20 August 2022). We also reconstructed, per year, the total population, the seed production, and the Shannon entropy of population sizes across cells (Figure 5), as well as the number of new trees disseminated by long-distance dispersal (Figure 6).
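The classification rule follows directly from the definitions above. A minimal sketch, assuming per-sample boolean indicators of growth, decline, and dispersal are already available for one (cell, year) pair; the function name and the returned labels are hypothetical.

```python
import numpy as np


def invasion_syndrome(growing, declining, dispersing):
    """Classify one population from per-sample boolean indicators (one entry per posterior sample)."""
    growing, declining, dispersing = (np.asarray(a, dtype=bool) for a in (growing, declining, dispersing))
    n = growing.size

    def certain(flags):
        return 3 * int(flags.sum()) >= 2 * n  # at least two-thirds of the samples agree

    grow, decline, spread = certain(growing), certain(declining), certain(dispersing)
    if grow and not spread:
        return "(i) growth, uncertain dispersal"
    if grow and spread:
        return "(ii) growth and dispersal"
    if spread and not grow and not decline:
        return "(iii) dispersal, uncertain growth"
    if decline and spread:
        return "(iv) decline and dispersal"
    if decline and not spread:
        return "(v) decline, uncertain dispersal"
    return "uncertain"  # neither growth nor decline certain, and dispersal not certain
```

Since a population cannot both grow and decline within one sample, growth and decline can never both be certain, so the five syndromes plus “uncertain” cover every case.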

**Simulated trajectories under restricted dispersal modes.** We compared the relative importance of short-distance dispersal and human-mediated long-distance dispersal in the past invasion dynamics through an ablation simulation experiment. For each posterior sample, we simulated the past invasion dynamics again after removing either (i) the long-distance human-mediated dispersal (setting ${d}_{l}=0$) or (ii) the short-distance dispersal (setting ${d}_{s}=0$) from the model, with all other parameters unchanged. For each posterior sample and each year, we computed the difference between the ablated and the full model in population size and in the Shannon entropy of population sizes across cells, and divided it by the full model value. Figure 7 shows the average of this “relative difference” across all posterior samples (solid curve) and its 90% confidence interval (ribbon), for the population size (top) and the Shannon entropy (bottom). Note that the first scenario, where we removed long-distance dispersal, is closely related to a scenario where the plant would be systematically eradicated from every cell with a non-null urban area. The latter would have an even more drastic negative impact because the plants in urban cells would not contribute to local and short-distance dispersal either.
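The two summaries behind Figure 7 can be sketched as follows, assuming simulated trajectories are stored as arrays indexed by posterior sample, year, and cell. The helper names and the illustrative trajectories are hypothetical, the entropy uses the natural logarithm (the text does not specify the base), and the 90% interval is taken as the 5th–95th percentiles across samples.

```python
import numpy as np


def shannon_entropy(pop_by_cell):
    """Shannon entropy (natural log) of the distribution of population sizes across cells."""
    n = np.asarray(pop_by_cell, dtype=float)
    p = n / n.sum(axis=-1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(p > 0, p * np.log(p), 0.0)  # convention: 0 * log(0) = 0
    return -terms.sum(axis=-1)


def relative_difference_summary(ablated, full):
    """(ablated - full) / full per sample and year; mean, 5th and 95th percentiles over samples."""
    full = np.asarray(full, dtype=float)
    rel = (np.asarray(ablated, dtype=float) - full) / full
    return rel.mean(axis=0), np.percentile(rel, 5, axis=0), np.percentile(rel, 95, axis=0)


# Illustrative trajectories: 2 posterior samples, 3 years, 4 cells.
full = np.array([[[1, 1, 1, 1], [2, 2, 2, 2], [4, 4, 4, 4]],
                 [[1, 1, 1, 1], [3, 3, 3, 3], [6, 6, 6, 6]]], dtype=float)
no_long = full * np.array([0.1, 0.25, 0.5])[np.newaxis, :, np.newaxis]
mean_pop, lo_pop, hi_pop = relative_difference_summary(no_long.sum(axis=2), full.sum(axis=2))
mean_h, lo_h, hi_h = relative_difference_summary(shannon_entropy(no_long), shannon_entropy(full))
```

A relative difference of -0.9 means the ablated model has a total population ten times smaller than the full model, which is how the “10 times smaller” statements in the Results read off Figure 7.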

**Fecundity versus plant age.** Exploring the estimated fecundity, i.e., the number of seeds produced as a function of plant age, as defined in Equation (3), enables us to test our hypothesis on the role of the delay before reproductive maturity in determining the invasion dynamics. Figure 8 shows the posterior distribution of the fecundity, divided by its asymptote ($M$), as a function of plant age.

## 3. Results

**A marked information gain on most parameters despite imperfect MCMC convergence.** Despite the sampling heterogeneity of our presence-only data, the convergence of the MCMC chains was reasonable for most parameters, judging from visual inspection of their trace plots and from univariate convergence criteria (PSRF < 2), but the multivariate criterion (MPSRF = 2.8) suggested that the algorithm did not fully converge to the posterior distribution (see Appendix A.3). This can be partly explained by a strong negative correlation between the detection rates and the maximal carrying capacity (Figure A7 of Appendix A.4), which suggests that the strong observed information gain on the detection rates (Figure A4 of Appendix A.4) came mainly from the informative prior distribution of the carrying capacity (see Appendix A.2). There was also an information gain on mortality, with a posterior mean of 0.5, even though this parameter did not converge well and had a wide 95% confidence interval (width of 0.36). Most other parameters, including the initial population parameters, visually showed a strong information gain (Figures A4–A6 of Appendix A.4), except the maximal fecundity $M$, whose posterior sample distribution was very similar to its prior distribution.

**Model validation.** Predictive performance in validation depended on the time elapsed since data deprivation. In the following, we use the taxonomy of SDM performance based on AUC [47]. The validation performance for the short-term future (2000–2015), i.e., up to 15 years after data deprivation, was poor (mean AUC = 0.64) but significantly better than random and comparable to the training performance (mean AUC = 0.58) (see Figure A10 of Appendix A.7). In other words, the predicted population in the detection cells was higher than in the non-detection cells in 64% of random pairs of detection/non-detection validation cells. However, the predictions failed (mean AUC = 0.44) in validation after 16 years of data deprivation (2016–2021), performing worse than a random guess, while the training performance was fair (mean AUC = 0.73) over the same period.
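The pairwise reading of the AUC given above can be computed directly. A minimal sketch: ties count as one half, the usual Mann–Whitney convention, and `pairwise_auc` is a hypothetical name; the all-pairs comparison is quadratic in memory, which is acceptable for validation sets of this size.

```python
import numpy as np


def pairwise_auc(pred_detect, pred_nondetect):
    """Probability that a random detection has a higher predicted population than a random
    non-detection, counting ties as 1/2 (equal to the area under the ROC curve)."""
    pos = np.asarray(pred_detect, dtype=float)[:, np.newaxis]
    neg = np.asarray(pred_nondetect, dtype=float)[np.newaxis, :]
    return float((pos > neg).mean() + 0.5 * (pos == neg).mean())
```

Applied to the predictions of each posterior sample, this gives a distribution of AUC values whose mean corresponds to the summaries reported above; 0.5 is the random-guess level.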

**Signals of introduction hotspots and residence time.** The estimated initial population sizes in 1980 varied significantly across the 24 introduction cells: despite some uncertainty, the posterior estimates of the initial population size were significantly higher in some cells than in others (Figure A5 of Appendix A.4). For instance, Figure 1—top shows that the highest population sizes (mean estimates) were north of Stellenbosch and near Paarl, Cape Town, and George. Nevertheless, significant populations were also located in other, less urbanized areas. The oldest initial population was estimated to be in a coastal cell around Wilderness, east of George (Figure 1—bottom), where the mean age of the plants was nearly 40 years. This suggests that (cultivated) plants existed in the area even before 1940, whereas the first BODATSA observation in the whole study area was recorded in 1963.

**An early massive spread wave driven by human-mediated long-distance dispersal.** Long-distance spread of seeds from urban areas occurred consistently every year of the modeled period from 1980 onward, with a higher impact on the invasion dynamics in the first 15 years. While population sizes were low everywhere in 1980, by 1996 large populations had colonized the vicinity of two introduction areas, Stellenbosch and George, but also areas much further from the introduction sites, such as the vicinity of Swellendam in the middle of the study area (Figure 3). This was due to long-distance dispersal from the urban areas of introduction, which was already occurring in 1980 (top-left map of Figure 4). To test whether long-distance dispersal was necessary, we compared these dynamics with the simulated population dynamics without this dispersal mode. This counterfactual evidence is provided in Figure 7 (red curve), showing that without long-distance dispersal, the total population would have been 10 times smaller in 1990, and still 4 times smaller by the end of the study period, despite a slow catch-up due to short-distance dispersal. Additionally, the species would have been far from reaching its current equilibrium. The other alternative scenario (blue curve in Figure 7), where the model is deprived of short-distance dispersal while all other parameters are kept constant, showed that the absence of short-distance dispersal hardly affected the invasion dynamics, except for a slight delay in overall population growth from 1990 to 1995.

**A fast establishment phase driven by local reproduction.** The total population grew about a millionfold between 1980 and 1996 (Figure 5—top) and then reached a steady state due to the self-regulation in our model (carrying capacity). The steep and sudden increase in the Shannon entropy (Figure 5—middle) shows that population sizes rapidly became balanced between cells over the period 1987–1994. This is explained by synchronous growth in environmentally suitable cells driven by local dispersal. It is crucial to highlight that long-distance dispersal was not intense enough to drive population growth by itself. It only allowed the establishment of small pioneer populations in many remote areas, while self-sustained local dispersal drove their fast growth in a second phase. Indeed, long-distance dispersal produced far too few new plants annually (${10}^{2}$ to ${10}^{6}$, Figure 6) to compensate for the overall annual mortality (50 ± 20% of the total population, namely ${10}^{7}$ to ${10}^{9}$ annual deaths, according to Figure 5—top).

**The time before reproductive maturity induced marked growth steps.** As visible in Figure 8, which represents the scaled fecundity curve, the age of reproductive maturity was estimated at three years with near certainty. Despite a slightly earlier optimum in the prior distribution of this parameter, and under the model assumptions, the result suggests that individuals were effectively reproductive in their third year. Fecundity increases quickly with age in older plants (Figure 8), although fecundity saturation is certainly not reached during the life span of most individuals, as 99% of plants die before the age of 4 to 13 years (given the uncertainty about mortality). In fact, the absolute fecundity exceeds 100 seeds with a probability of 0.95 only from 5 years of age, as illustrated by the wide confidence interval on fecundity between ages 4 and 20 years (Figure 8), which is due to the uncertainty of the allometric scaling factor $\theta$ (estimate: 8.8 ± 2). We thus conclude that only a very small proportion of germinated plants end up contributing significantly to population growth in the fitted model. This latency phase before individual plants become significantly fecund also explains the two marked steps in the growth of global seed production over time (Figure 5—bottom), and thus the lag phase. Note that we scaled the fecundity by its maximal value M in Figure 8 because the model did not gain any information on this parameter from the data: its posterior distribution was the same as its prior.
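The “4 to 13 years” range follows from a short calculation. Assuming a constant annual mortality rate $\rho$, so that the surviving fraction of a cohort after $a$ years is $(1-\rho)^a$ (the same survival model used to define the lower bound of the prior on $\rho$ in Appendix A.2), 99% of the cohort has died once $(1-\rho)^a \le 0.01$, i.e., $a \ge \ln(0.01)/\ln(1-\rho)$. A sketch evaluating this at the edges and center of the reported mortality range (50 ± 20%); the function name is hypothetical:

```python
import math


def age_at_which_fraction_dead(rho, fraction_dead=0.99):
    """Age a at which (1 - rho)**a equals 1 - fraction_dead, for a constant annual mortality rho."""
    return math.log(1.0 - fraction_dead) / math.log(1.0 - rho)


for rho in (0.3, 0.5, 0.7):
    print(f"rho = {rho}: 99% dead by age {age_at_which_fraction_dead(rho):.1f}")
```

This gives roughly 12.9, 6.6, and 3.8 years, matching the 4-to-13-year range quoted above.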

**A possible lag phase of fifty years.** Our results showed that the species spread rapidly across the study domain, but the last population growth phase, which multiplied the population size by nearly 20, only occurred in the early 1990s (Figure 5—top). Given the inferred age structure of the initial populations, which suggests an introduction before 1940, the model predicted a lag phase of 50 years or more preceding this last population growth phase.

## 4. Discussion

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## Appendix A

#### Appendix A.1. Illustration of Fecundity Function

**Figure A1.** An example of the fecundity of a perennial plant as a function of the plant’s age ($\widehat{k}=7$, i.e., the age of reproductive maturity, shown by the red vertical line; $M=100{,}000$; $\theta=8$).

#### Appendix A.2. Parameter Prior Distributions

- ${p}_{.}^{detec},{d}_{l},{d}_{s}\sim U\left(0,1\right)$
- ${\beta}_{.}\sim N\left(0,900\right)$
- $\theta \sim U\left(0,100\right)$, which in practice does not constrain the potential functional shapes of the fecundity, as an increase of this factor makes the shape converge quickly towards a step function.
- $\left(\widehat{k}-1\right)\sim B\left(6,0.3\right)$
- $log\left(M\right)\sim N\left(log\left(2\times {10}^{5}\right),0.25\right)$: the expectation of $M$ is then approximately 200,000 seeds/plant/year, and its standard deviation approximately 50,000, which is coherent with prior knowledge on woody plant fecundity while acknowledging a large uncertainty.
- $log\left(\phi \right)\sim N\left(15.95,0.5\right)$: the expectation of $\phi$ is 9.5 million plants/cell (of 10 × 10 km). This number is very close to the one used for Acacia longifolia in [16]. The standard deviation of $log\left(\phi\right)$ is set so that the standard deviation of $\phi$ is approximately 8 million plants/cell, acknowledging a large uncertainty around this number.
- $\rho \sim U\left(1-{\left(1/\left(9.5\times {10}^{6}\right)\right)}^{1/50},1\right)$: the lower bound is defined such that all plants of an approximately saturated cell population will certainly die before reaching the maximal age, set to 50 in our application.
- $popIni_{.}\sim U\left(0,1000\right)$
- $ageRatio_{.}\sim U\left(0,1\right)$
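Several statements in this list can be checked numerically. A minimal sketch, assuming that $B(6,0.3)$ denotes a binomial distribution and that the second argument of $N(\cdot,\cdot)$ for $log(M)$ is a standard deviation; under these assumptions it reproduces the stated mean (about 200,000) and standard deviation (about 50,000) of $M$, and the value of the lower bound of $\rho$. All variable names are illustrative.

```python
import math

# Prior of k_hat, assuming (k_hat - 1) ~ Binomial(6, 0.3); support k_hat = 1..7.
k_hat_pmf = {k: math.comb(6, k - 1) * 0.3 ** (k - 1) * 0.7 ** (7 - k) for k in range(1, 8)}
k_hat_mean = sum(k * p for k, p in k_hat_pmf.items())
k_hat_mode = max(k_hat_pmf, key=k_hat_pmf.get)

# Log-normal moments of M, assuming log(M) ~ Normal(mean=log(2e5), sd=0.25).
mu, sigma = math.log(2e5), 0.25
m_mean = math.exp(mu + sigma ** 2 / 2)
m_sd = m_mean * math.sqrt(math.exp(sigma ** 2) - 1)

# Lower bound of rho: (1 - rho)**50 * 9.5e6 = 1, i.e., a saturated cell keeps
# less than one expected survivor at the maximal age of 50 years.
rho_min = 1 - (1 / 9.5e6) ** (1 / 50)

print(f"k_hat: mean {k_hat_mean:.2f}, mode {k_hat_mode}")
print(f"M: mean {m_mean:,.0f}, sd {m_sd:,.0f}")
print(f"rho lower bound: {rho_min:.3f}")
```

Under these assumptions, the mean and standard deviation of $M$ come out at about 206,000 and 52,000, and the lower bound of $\rho$ at about 0.27.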

#### Appendix A.3. MCMC Procedure and Convergence Tests

**MCMC procedure.** We provide here more details about our fitting procedure and the reasons that led us to these choices. The surface of our posterior likelihood was complex and discontinuous, especially because of the nonlinear components of the transition equations (e.g., the self-regulation coefficient), population rounding, and the lack of identifiability of certain parameters. In such cases, the reference acceptance rate of 0.23, which optimizes the exploration of the parameter space by Metropolis-type MCMC algorithms under classic regularity assumptions on the posterior likelihood, no longer holds. As noted by Rosenthal [63], when the posterior likelihood has a complex surface with multiple modes, the optimal acceptance rate is lower, potentially much lower. We observed this phenomenon in numerous experiments not reported here: aiming at a high acceptance rate systematically forced us to decrease the variance of the samplers enormously, resulting in a much too local exploration by the chains, particularly when starting the chains in high-likelihood regions. Hence, we set a relatively high variance for all parameter samplers to ensure a wide exploration of the parameter space, increasing our chances of detecting multiple modes and identifiability issues and of avoiding getting stuck in sub-optimal regions. This naturally came at the cost of a high rejection rate (visible in Figure A2 of this appendix), leading to highly autocorrelated samples. The downside is that we needed to refine the initial parameters and run many iterations to obtain a reasonable level of convergence. We therefore ran a first MCMC session (10 parallel chains of 100,000 iterations) in which the initialization parameters ${\Theta}_{0}$ were drawn randomly and independently for each chain from their prior distributions. Then, we retained the parameter sample that maximized the posterior likelihood and used it to initialize the 10 chains of a new MCMC session. We repeated the procedure for a third and last session. In contrast to the first and second MCMC sessions, the posterior likelihood barely increased over the third session (visible in Figure A2 of this appendix), indicating that the initialization parameter was likely close to the global maximizer of the posterior likelihood.
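The session scheme described above (independent chains initialized from the prior, then restarts from the best sample found) can be sketched with a plain Gaussian random-walk Metropolis sampler. This is an illustrative sketch on a toy two-parameter target, not the authors' sampler: the proposal scale `step`, the toy target, the initialization distribution, and all function names are assumptions.

```python
import numpy as np


def rw_metropolis(log_post, x0, n_iter, step, rng):
    """Gaussian random-walk Metropolis: returns draws, their log-posterior values, acceptance rate."""
    x = np.array(x0, dtype=float)
    lp = log_post(x)
    draws = np.empty((n_iter, x.size))
    lps = np.empty(n_iter)
    n_accepted = 0
    for t in range(n_iter):
        proposal = x + step * rng.standard_normal(x.size)
        lp_prop = log_post(proposal)
        if np.log(rng.uniform()) < lp_prop - lp:  # Metropolis rule for a symmetric proposal
            x, lp = proposal, lp_prop
            n_accepted += 1
        draws[t] = x
        lps[t] = lp
    return draws, lps, n_accepted / n_iter


def run_sessions(log_post, draw_init, n_sessions, n_chains, n_iter, step, seed=0):
    """Session 1 starts each chain from draw_init(rng); each later session restarts every chain
    from the highest-log-posterior draw pooled over all chains of the previous session."""
    rng = np.random.default_rng(seed)
    inits = [draw_init(rng) for _ in range(n_chains)]
    history = []  # (best log-posterior, mean acceptance rate) per session
    for _ in range(n_sessions):
        runs = [rw_metropolis(log_post, x0, n_iter, step, rng) for x0 in inits]
        draws = np.concatenate([r[0] for r in runs])
        lps = np.concatenate([r[1] for r in runs])
        best = draws[np.argmax(lps)].copy()
        history.append((lps.max(), np.mean([r[2] for r in runs])))
        inits = [best.copy() for _ in range(n_chains)]
    return best, history


def toy_log_post(x):
    """Toy target: narrow Gaussian centered at (3, 3) with standard deviation 0.1."""
    return -0.5 * np.sum((x - 3.0) ** 2) / 0.01


def draw_init(rng):
    """Toy 'prior' draw for session-1 initial values."""
    return rng.uniform(-10, 10, size=2)


best, history = run_sessions(toy_log_post, draw_init, n_sessions=3, n_chains=4, n_iter=2000, step=0.5)
```

On a target this narrow, `step=0.5` yields a low acceptance rate, mirroring the trade-off described above: wide proposals explore broadly, but most of them are rejected.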

**MCMC Convergence tests.** Visual inspection of the main parameter trace plots of this last session, including all chains (Figure A3 of this appendix), shows that, for most parameters, despite the low acceptance rate, the chain trajectories largely overlap and show no trend. However, this is not the case for parameters $\rho ,\theta ,{d}_{l},{d}_{s}$, for which the different chains tended to diverge towards different ranges of values. This might be due to too small a variance of the proposal distribution of the samplers of these parameters. We checked the overall convergence across all chains and all parameters using the multivariate Gelman and Rubin criterion (MPSRF, [64]), implemented in the R package **coda**. A value close to one indicates a good level of convergence; in our case, the MPSRF estimate was 2.8, suggesting that convergence was not fully achieved, although we may still interpret our posterior samples as a reasonable approximation of the posterior distribution. The univariate version of the Gelman and Rubin criterion was also computed for the 16 main model parameters, and the values are provided in Table A1 of this appendix. They show that some parameters converged better than others, with a criterion of at most 1.5 for most parameters (10 of the 16) and a maximal value of 2.1 for parameter $\rho$. The very low acceptance rate was dealt with by selecting a very large thinning interval of 450 for all 9 chains. We also removed a burn-in of 15,000 iterations, yielding a total of 1678 posterior samples used in the main analysis.
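For reference, one common form of the univariate Gelman–Rubin point estimate can be computed as below. This is a minimal sketch of the textbook formula only: **coda**'s `gelman.diag` additionally applies a degrees-of-freedom correction and reports an upper confidence limit, so its values will differ slightly. The function name and toy chains are hypothetical.

```python
import numpy as np


def psrf(chains):
    """Potential scale reduction factor (point estimate) for one parameter; chains has shape (m, n)."""
    chains = np.asarray(chains, dtype=float)
    _, n = chains.shape
    within = chains.var(axis=1, ddof=1).mean()        # W: mean within-chain variance
    between_over_n = chains.mean(axis=1).var(ddof=1)  # B / n: variance of the chain means
    pooled = (n - 1) / n * within + between_over_n    # pooled estimate of the posterior variance
    return float(np.sqrt(pooled / within))


rng = np.random.default_rng(1)
mixed = rng.standard_normal((4, 5000))                         # chains sampling the same distribution
stuck = mixed + np.array([0.0, 3.0, 6.0, 9.0])[:, np.newaxis]  # chains stuck in different regions
```

Well-mixed chains give a value close to 1, whereas chains exploring different ranges of values, as observed for $\rho ,\theta ,{d}_{l},{d}_{s}$, inflate the between-chain term and push the criterion well above 1.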

**Figure A2.** Posterior log-likelihood over the iterations of the 10 independent Markov chains of the third and last MCMC session. All chains were initialized with the same parameters obtained from the two previous MCMC sessions.

**Figure A3.** Trace plots of the 18 main model parameters for the 100,000 iterations of the third and last MCMC session, for 10 independent chains.

**Table A1.** Point estimate and upper confidence limit of the univariate Gelman and Rubin convergence criteria (PSRFs; Brooks & Gelman, 1998) for each of the main model parameters. These criteria were computed on the 9 chains of the last MCMC session. The multivariate upper bound (MPSRF), accounting jointly for all parameters, was 2.80. Values closer to 1 indicate better convergence.

Parameter | PSRF Point Estimate | PSRF Upper Confidence Limit |
---|---|---|
$\theta$ | 1.99 | 3.37 |
$\widehat{k}$ | 1.54 | 4.63 |
${d}_{s}$ | 1.75 | 2.82 |
${d}_{l}$ | 1.50 | 2.24 |
$\rho$ | 2.1 | 3.50 |
$log\left(\phi \right)$ | 1.43 | 1.87 |
$log\left(M\right)$ | 1.08 | 1.17 |
${\beta}_{1}$ | 1.36 | 1.82 |
${\beta}_{2}$ | 1.18 | 1.39 |
${\beta}_{3}$ | 1.07 | 1.15 |
${\beta}_{6}$ | 1.61 | 2.41 |
${\beta}_{7}$ | 1.40 | 1.84 |
${\beta}_{8}$ | 1.48 | 2.15 |
${p}_{1}^{detec}$ | 1.42 | 1.86 |
${p}_{2}^{detec}$ | 1.53 | 2.07 |
${p}_{3}^{detec}$ | 1.37 | 1.76 |

#### Appendix A.4. Parameter Estimability

**matur**); the fecundity allometric scaling factor (**theta**, Figure A4 in this appendix); the dispersal rates (**d_l** and **d_s**, Figure A4 in this appendix); the environmental suitability parameters (**Beta_X**, Figure A4 in this appendix); the detection rates (**pdetec_X**, Figure A4 in this appendix); the initial population sizes (Figure A5); and the mean age of the initial populations (Figure A6). Conversely, very little information was gained on the maximal carrying capacity (**phi**) and none on the maximal fecundity (**M**).

**Figure A4.** Posterior sample density (red histogram) and prior density (blue curve) for all 16 parameters of the model, excluding the initial population parameters (see below) and ${\beta}_{4}$ and ${\beta}_{5}$, which are fixed to 0. The deviation of the posterior sample density from the prior density visualizes the information gained on the parameters from the data.

**Figure A5.** Posterior sample density (red histogram) and prior density (blue curve) for the initial population size ($popIni$) in 1980 for each introduction cell. The mean posterior sample value of a cell is used as the cell value in Figure 1—top of the main manuscript.

**Figure A6.** Posterior sample density (red histogram) and prior density (blue curve) for the mean age of the initial population ($ageRatio$) of each introduction cell. The mean posterior sample value of a cell is used as the cell value in Figure 1—bottom of the main manuscript.

**Identifiability issues.** However, examining marginal posterior distributions, as above, is not enough to answer the important question of whether all parameters can be statistically disentangled, a property of the model design and data called identifiability. A lack of identifiability may produce an apparently large marginal variance that in fact hides strong correlations between parameter estimates. This would indicate that, even though some information was captured on the parameters, there were trade-offs between them, yielding equally likely values. For this reason, we also analyzed the correlations between pairs of sampled parameters (Figure A7 of this appendix). A trade-off between the carrying capacity (**phi**, strongly related to the final total population size) and the parameters related to species detectability (**pdetec**) is structurally necessary to keep the number of model-predicted records per cell in line with the data when the underlying total population increases. Indeed, Figure A7 shows this expected negative correlation between **pdetec** and **phi**. This negative dependence was mentioned in the manuscript and already discussed in [39]. Estimating the absolute total population size precisely is nearly impossible with our presence-only data, unless strong hypotheses or prior knowledge are used. We did use prior knowledge in our model to constrain the prior distributions of **phi** and **M**, which at least partly explains the marked information gain on the detection rates (see **pdetec_1**, **2**, and **3** in Figure A4 of this appendix). Figure A7 also shows much more subtle trade-offs that seem to have appeared between the fecundity allometric factor (**theta**), the **mortality (rho)**, the spread rates (**d_l** and **d_s**), and some environmental suitability parameters (**Beta_7** and **Beta_8**), but these correlations are not strong enough to be interpreted as identifiability problems, and better data would most likely reduce the variation in these parameter estimates.
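Why the trade-off between **pdetec** and **phi** is structural can be illustrated with a deliberately simplified toy model in which the expected number of records is proportional to their product. This is not the paper's sampling model (Section 2.2); it only assumes Poisson record counts with mean `p_detec * phi * effort`, and the function name and values are hypothetical.

```python
import math


def poisson_loglik(counts, p_detec, phi, effort):
    """Log-likelihood of record counts under a toy model: counts ~ Poisson(p_detec * phi * effort)."""
    total = 0.0
    for y, e in zip(counts, effort):
        lam = p_detec * phi * e
        total += y * math.log(lam) - lam - math.lgamma(y + 1)
    return total


counts = [0, 2, 5, 1]
effort = [0.1, 0.4, 1.0, 0.2]
ll_a = poisson_loglik(counts, p_detec=1e-6, phi=5e6, effort=effort)
ll_b = poisson_loglik(counts, p_detec=4e-6, phi=1.25e6, effort=effort)  # p x4, phi /4: same product
```

Because only the product enters this likelihood, the data alone cannot separate the two parameters: posterior samples spread along the ridge $p^{detec}\phi=\text{const}$, which appears as a negative correlation between them, and informative priors are what pin down their individual values.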

**Figure A7.** Scatterplots and Pearson correlations between pairs of model parameters across samples of the 9 chains of the last MCMC session.

#### Appendix A.5. Environmental Suitability Estimates and Interpretations

**Ecological process model**). Both effects are applied through the same coefficient ${p}_{\beta}\left({x}_{i,t}\right)$, a function of the vector ${x}_{i,t}$, which concatenates the value 1 (${x}_{i,t}^{1}$), associated with the intercept, and the environmental covariables in cell i at year t (${x}_{i,t}^{2},\dots ,{x}_{i,t}^{8}$). The coefficient ${p}_{\beta}\left({x}_{i,t}\right)$ is expressed as follows:

**Figure A8.** Box plots of the posterior distributions of the environmental suitability parameters. For each component of $\beta$, we summarize the posterior sample values in a box plot. Each colored point corresponds to an MCMC chain and iteration, with colors consistent with the other figures. I(svd1) and I(svd2) are the components associated with the two synthetic bioclimatic axes (see the **Bioclimatic variables** paragraph of Section 2.3.3 of the main manuscript), while I(svd1^2) and I(svd2^2) are associated with their quadratic transformations.

**svd1** and **svd2** are the first two axes of a singular value decomposition applied to the values of six bioclimatic variables. These variables turned out to have a non-significant and negligible effect. For instance, changing the value of svd1 from its maximal to its minimal value in the study area would not change the germination probability or the carrying capacity. In other words, we found no evidence that bioclimatic aspects constrain the distribution of Plectranthus barbatus in this area.

#### Appendix A.6. Complete Sequence of Growth and Population Maps

#### Appendix A.7. Model Validation

**Validation data subset.** We selected 50 cells to validate the predictions over the validation period 2000–2021. These cells were selected randomly under certain constraints: Plectranthus barbatus was detected in half of them (25 detection cells) over the validation period and not detected, despite evidence of sampling effort (TG records), in the others (25 non-detection cells). In addition, among the 25 detection (resp. non-detection) cells, there was at least one detection (resp. non-detection) during each 3-year interval. The 50 validation cells are shown in green in Figure A9 of this appendix. All data collected in these cells during 2000–2021 (Plectranthus records and TG records) were hidden from the model during fitting, so that the fitted model could not use their information and we could test predictive performance over these last two decades in the 50 validation cells. In practice, the numbers of focal species records and of TG records were simply set to zero in $y$ and ${N}^{TG}$, respectively, inside the validation cells for the years 2000 to 2021. In summary, compared with the main model fitted on all the data, our validation model was deprived of part of the data and used 121 presence records (64%), 88 detections (69%), and 17,628 non-detections (97%).
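Hiding the validation data amounts to zeroing the corresponding entries of both count arrays. A minimal sketch, assuming `y` and `n_tg` are NumPy arrays indexed by (cell, year) and that `val_cells` holds the indices of the 50 validation cells; all names are hypothetical.

```python
import numpy as np


def mask_validation(y, n_tg, years, val_cells, start_year=2000):
    """Return copies of the focal species (y) and target-group (n_tg) record counts with the
    validation cells set to zero from start_year onward, so that the fit cannot use those data."""
    y_fit, n_tg_fit = np.array(y, copy=True), np.array(n_tg, copy=True)
    hidden_years = np.asarray(years) >= start_year
    rows = np.asarray(val_cells)[:, np.newaxis]  # broadcast cell indices against hidden years
    y_fit[rows, hidden_years] = 0
    n_tg_fit[rows, hidden_years] = 0
    return y_fit, n_tg_fit
```

Working on copies keeps the original arrays intact, so the same hidden entries can later serve as validation data.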

**Figure A9.** Study area in the Southern Cape, South Africa, and its rasterization into 817 square cells of approximately 10 km × 10 km: 50 validation cells (light green) were drawn and, from the year 2000 onward, their validation data were hidden from model fitting. We then used these validation data to test the model predictions over the period 2000 to 2021. Cells containing at least one TG record whose data were fully used in model fitting (i.e., training) are shown in red, while cells without any record are shown in gray.

**Training and validation predictive performances.** We split the validation period into two consecutive intervals with balanced amounts of data: 2000–2015 and 2016–2021. This allows model predictions to be evaluated per interval, measuring predictive power at nearer and more distant horizons. For each interval, we applied a procedure that limits the evaluation bias caused by class imbalance (non-detections far outnumber detections) and that uses a measure of sampling effort (the TG records) to reduce the risk of false absences among the non-detections. More precisely, we subsampled the non-detections (pairs of a cell and a year) with the most TG records, so that their number did not exceed twice the number of detections. This subsampling both reduced the class imbalance and retained the non-detections most likely to be true absences, minimizing bias in the evaluation metric. Then, for each time interval, separately for the training and validation cells, and for each posterior sample, we computed the area under the ROC curve (AUC, [65]) of the predicted population sizes over the sampled cell–year pairs. These results are summarized in Figure A10 of this Appendix. We emphasize that this AUC differs from the presence-background AUCs that have been criticized for SDM evaluation [65]. Here, a detection is never used as a negative sample; hence, our metric ranges from 0 (detections always have a lower predicted population than non-detections) to 1 (detections always have a higher one), with 0.5 corresponding to a random guess.
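The two evaluation steps above, keeping the non-detections with the most TG records and computing an AUC per posterior sample, can be sketched as below. This is a hedged illustration: the function names are hypothetical, the input layout (one row per posterior sample, one column per cell–year pair) is an assumption, and the AUC is computed in its rank (Mann–Whitney) form, which is equivalent to the area under the ROC curve with ties counted as 1/2.

```python
import numpy as np


def subsample_nondetections(tg_counts, n_detections, ratio=2):
    """Indices of the non-detection (cell, year) pairs with the most TG records,
    keeping at most ratio * n_detections of them."""
    tg_counts = np.asarray(tg_counts, dtype=float)
    k = min(tg_counts.size, ratio * n_detections)
    order = np.argsort(-tg_counts, kind="stable")  # descending TG count
    return order[:k]


def auc(pred_det, pred_non):
    """Rank-based ROC AUC: probability that a detection has a higher predicted
    population than a non-detection, counting ties as 1/2."""
    pos = np.asarray(pred_det, dtype=float)[:, None]
    neg = np.asarray(pred_non, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))


def auc_per_sample(pred_det, pred_non):
    """One AUC per posterior sample; inputs have shape (n_samples, n_pairs)."""
    return np.array([auc(d, n) for d, n in zip(pred_det, pred_non)])


idx = subsample_nondetections([5, 0, 3, 8, 1], n_detections=1)  # keeps pairs 3 and 0
```

Summarizing `auc_per_sample` by its mean and quantiles would then give one point estimate and interval per time interval and cell set, of the kind shown in Figure A10.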

**Figure A10.** Evaluation of model predictions on training and validation data per time period. For each of the two periods (2000–2015 and 2016–2021), we computed the AUC over all validation (resp. training) cells and years of the period.

## References

- Pyšek, P.; Křivánek, M.; Jarošík, V. Planting intensity, residence time, and species traits determine invasion success of alien woody species. Ecology **2009**, 90, 2734–2744.
- Haubrock, P.J.; Cuthbert, R.N.; Tricarico, E.; Diagne, C.; Courchamp, F.; Gozlan, R.E. The recorded economic costs of alien invasive species in Italy. NeoBiota **2021**, 67, 247.
- Renault, D.; Manfrini, E.; Leroy, B.; Diagne, C.; Ballesteros-Mejia, L.; Angulo, E.; Courchamp, F. Biological invasions in France: Alarming costs and even more alarming knowledge gaps. NeoBiota **2021**, 67, 191.
- Cuthbert, R.N.; Bartlett, A.C.; Turbelin, A.J.; Haubrock, P.J.; Diagne, C.; Pattison, Z.; Catford, J.A. Economic costs of biological invasions in the United Kingdom. NeoBiota **2021**, 67, 299–328.
- Haubrock, P.J.; Turbelin, A.J.; Cuthbert, R.N.; Novoa, A.; Taylor, N.G.; Angulo, E.; Courchamp, F. Economic costs of invasive alien species across Europe. NeoBiota **2021**, 67, 153–190.
- Seebens, H.; Blackburn, T.M.; Dyer, E.E.; Genovesi, P.; Hulme, P.E.; Jeschke, J.M.; Essl, F. No saturation in the accumulation of alien species worldwide. Nat. Commun. **2017**, 8, 1–9.
- Rouget, M.; Robertson, M.P.; Wilson, J.R.; Hui, C.; Essl, F.; Renteria, J.L.; Richardson, D.M. Invasion debt–quantifying future biological invasions. Divers. Distrib. **2016**, 22, 445–456.
- Kowarik, I. Time lags in biological invasions with regard to the success and failure of alien species. Plant Invasions Gen. Asp. Spec. Probl. **1995**, 15–38.
- Wilson, J.R.; Panetta, F.D.; Lindgren, C. Detecting and Responding to Alien Plant Incursions; Cambridge University Press: Cambridge, UK, 2016.
- Encarnação, J.; Teodósio, M.A.; Morais, P. Citizen science and biological invasions: A review. Front. Environ. Sci. **2021**, 8, 303.
- Hui, C.; Richardson, D.M. Invasion Dynamics; Oxford University Press: Oxford, UK, 2017.
- Elith, J. Predicting distributions of invasive species. Invasive Species Risk Assess. Manag. **2017**, 10, 93–129.
- Thomas, C.D. Climate, climate change and range boundaries. Divers. Distrib. **2010**, 16, 488–495.
- Häkkinen, H.; Hodgson, D.; Early, R. Plant naturalizations are constrained by temperature but released by precipitation. Glob. Ecol. Biogeogr. **2021**, 31, 504–521.
- Roura-Pascual, N.; Bas, J.M.; Thuiller, W.; Hui, C.; Krug, R.M.; Brotons, L. From introduction to equilibrium: Reconstructing the invasive pathways of the Argentine ant in a Mediterranean region. Glob. Change Biol. **2009**, 15, 2101–2115.
- Donaldson, J.E.; Hui, C.; Richardson, D.M.; Robertson, M.P.; Webber, B.L.; Wilson, J.R. Invasion trajectory of alien trees: The role of introduction pathway and planting history. Glob. Change Biol. **2014**, 20, 1527–1537.
- Melbourne, B.A.; Hastings, A. Highly variable spread rates in replicated biological invasions: Fundamental limits to predictability. Science **2009**, 325, 1536–1539.
- Drenovsky, R.E.; Grewell, B.J.; D’Antonio, C.M.; Funk, J.L.; James, J.J.; Molinari, N.; Richards, C.L. A functional trait perspective on plant invasion. Ann. Bot. **2012**, 110, 141–153.
- Daehler, C.C. Variation in self-fertility and the reproductive advantage of self-fertility for an invading plant (Spartina alterniflora). Evol. Ecol. **1998**, 12, 553–568.
- Pyšek, P. Is there a taxonomic pattern to plant invasions? Oikos **1998**, 82, 282–294.
- Schurr, F.M.; Pagel, J.; Cabral, J.S.; Groeneveld, J.; Bykova, O.; O’Hara, R.B.; Zimmermann, N.E. How to understand species’ niches and range dynamics: A demographic research agenda for biogeography. J. Biogeogr. **2012**, 39, 2146–2162.
- Louvrier, J.; Papaïx, J.; Duchamp, C.; Gimenez, O. A mechanistic–statistical species distribution model to explain and forecast wolf (Canis lupus) colonization in South-Eastern France. Spat. Stat. **2020**, 36, 100428.
- Roques, L.; Desbiez, C.; Berthier, K.; Soubeyrand, S.; Walker, E.; Klein, E.K.; Papaïx, J. Emerging strains of watermelon mosaic virus in Southeastern France: Model-based estimation of the dates and places of introduction. Sci. Rep. **2021**, 11, 1–11.
- Rejmánek, M.; Richardson, D.M. What attributes make some plant species more invasive? Ecology **1996**, 77, 1655–1661.
- Higgins, S.I.; Richardson, D.M. Predicting plant migration rates in a changing world: The role of long-distance dispersal. Am. Nat. **1999**, 153, 464–475.
- Caswell, H.; Lensink, R.; Neubert, M.G. Demography and dispersal: Life table response experiments for invasion speed. Ecology **2003**, 84, 1968–1978.
- Pemberton, R.W.; Liu, H. Marketing time predicts naturalization of horticultural plants. Ecology **2009**, 90, 69–80.
- Castro-Díez, P.; Godoy, O.; Saldaña, A.; Richardson, D.M. Predicting invasiveness of Australian acacias on the basis of their native climatic affinities, life history traits and human use. Divers. Distrib. **2011**, 17, 934–945.
- Caswell, H. Matrix Population Models; Sinauer: Sunderland, MA, USA, 2000; Volume 1.
- Stott, I.; Townley, S.; Hodgson, D.J. A framework for studying transient dynamics of population projection matrix models. Ecol. Lett. **2011**, 14, 959–970.
- Qiu, T.; Aravena, M.C.; Andrus, R.; Ascoli, D.; Bergeron, Y.; Berretti, R.; Clark, J.S. Is there tree senescence? The fecundity evidence. Proc. Natl. Acad. Sci. USA **2021**, 118, e2106130118.
- Wilson, J.R.U.; Richardson, D.M.; Rouget, M.; Procheş, Ş.; Amis, M.A.; Henderson, L.; Thuiller, W. Residence time and potential range: Crucial considerations in modelling plant invasions. Divers. Distrib. **2007**, 13, 11–22.
- Caley, P.; Groves, R.H.; Barker, R. Estimating the invasion success of introduced plants. Divers. Distrib. **2008**, 14, 196–203.
- Williamson, M.; Dehnen-Schmutz, K.; Kühn, I.; Hill, M.; Klotz, S.; Milbau, A.; Pyšek, P. The distribution of range sizes of native and alien plants in four European countries and the effects of residence time. Divers. Distrib. **2009**, 15, 158–166.
- Cook, A.; Marion, G.; Butler, A.; Gibson, G. Bayesian inference for the spatio-temporal invasion of alien species. Bull. Math. Biol. **2007**, 69, 2005–2025.
- Clark, J.S.; Scher, C.L.; Swift, M. The emergent interactions that govern biodiversity change. Proc. Natl. Acad. Sci. USA **2020**, 117, 17074–17083.
- West, M.; Harrison, P.J. Bayesian Forecasting and Dynamic Models, 2nd ed.; Springer: New York, NY, USA, 1997.
- Miller, D.A.; Pacifici, K.; Sanderlin, J.S.; Reich, B.J. The recent past and promising future for data integration methods to estimate species’ distributions. Methods Ecol. Evol. **2019**, 10, 22–37.
- Hastie, T.; Fithian, W. Inference from presence-only data; the ongoing controversy. Ecography **2013**, 36, 864–867.
- Alasbahi, R.H.; Melzig, M.F. Plectranthus barbatus: A review of phytochemistry, ethnobotanical uses and pharmacology-Part 1. Planta Med. **2010**, 76, 653–661.
- Phillips, L.A.; Greer, C.W.; Farrell, R.E.; Germida, J.J. Field-scale assessment of weathered hydrocarbon degradation by mixed and single plant treatments. Appl. Soil Ecol. **2009**, 42, 9–17.
- Botella, C.; Joly, A.; Monestiez, P.; Bonnet, P.; Munoz, F. Bias in presence-only niche models related to sampling effort and species niches: Lessons for background point selection. PLoS ONE **2020**, 15, e0232078.
- Loveland, T.R.; Reed, B.C.; Brown, J.F.; Ohlen, D.O.; Zhu, Z.; Yang, L.; Merchant, J.W. Development of a global land cover characteristics database and IGBP DISCover from 1 km AVHRR data. Int. J. Remote Sens. **2000**, 21, 1303–1330.
- Harris, I.; Jones, P.D.; Osborn, T.J.; Lister, D.H. Updated high-resolution grids of monthly climatic observations – The CRU TS3.10 Dataset. Int. J. Climatol. **2014**, 34, 623–642.
- Fick, S.E.; Hijmans, R.J. WorldClim 2: New 1km spatial resolution climate surfaces for global land areas. Int. J. Climatol. **2017**, 37, 4302–4315.
- Bellard, C.; Thuiller, W.; Leroy, B.; Genovesi, P.; Bakkenes, M.; Courchamp, F. Will climate change promote future invasions? Glob. Change Biol. **2013**, 19, 3740–3748.
- Araújo, M.B.; Pearson, R.G.; Thuiller, W.; Erhard, M. Validation of species–climate impact models under climate change. Glob. Change Biol. **2005**, 11, 1504–1513.
- Chalmandrier, L.; Hartig, F.; Laughlin, D.C.; Lischke, H.; Pichler, M.; Stouffer, D.B.; Pellissier, L. Linking functional traits and demography to model species-rich communities. Nat. Commun. **2021**, 12, 1–9.
- McLean, P.; Gallien, L.; Wilson, J.R.U.; Gaertner, M.; Richardson, D.M. Small urban centres as launching sites for plant invasions in natural areas: Insights from South Africa. Biol. Invasions **2017**, 19, 3541–3555.
- Potgieter, L.J.; Douwes, E.; Gaertner, M.; Measey, G.J.; Paap, T.; Richardson, D.M. Biological invasions in South Africa’s urban ecosystems: Patterns, processes, impacts and management. In Biological Invasions in South Africa; Van Wilgen, B.W., Measey, J., Richardson, D.M., Wilson, J.R.U., Zengeya, T.A., Eds.; Springer: Berlin/Heidelberg, Germany, 2020; pp. 275–309.
- Pages, M.; Fischer, A.; van der Wal, R.; Lambin, X. Empowered communities or “cheap labour”? Engaging volunteers in the rationalised management of invasive alien species in Great Britain. J. Environ. Manag. **2019**, 229, 102–111.
- Catterall, S.; Cook, A.R.; Marion, G.; Butler, A.; Hulme, P.E. Accounting for uncertainty in colonisation times: A novel approach to modelling the spatio-temporal dynamics of alien invasions using distribution data. Ecography **2012**, 35, 901–911.
- Groom, Q.; Adriaens, T.; Bertolino, S.; Poelen, J.H.; Reeder, D.; Richardson, D.M.; Simmons, N. Holistic understanding of contemporary ecosystems requires integration of data on domesticated, captive, and cultivated organisms. Biodivers. Data J. **2021**, 9, e65371. Available online: https://bdj.pensoft.net/article/65371/ (accessed on 29 August 2022).
- Li, E.; Parker, S.S.; Pauly, G.B.; Randall, J.M.; Brown, B.V.; Cohen, B.S. An urban biodiversity assessment framework that combines an urban habitat classification scheme and citizen science data. Front. Ecol. Evol. **2019**, 7, 277.
- Aikio, S.; Duncan, R.P.; Hulme, P.E. Lag-phases in alien plant invasions: Separating the facts from the artefacts. Oikos **2010**, 119, 370–378.
- Nelson, G.; Ellis, S. The history and impact of digitization and digital data mobilization on biodiversity research. Philos. Trans. R. Soc. B **2019**, 374, 20170391.
- Randin, C.F.; Dirnböck, T.; Dullinger, S.; Zimmermann, N.E.; Zappa, M.; Guisan, A. Are niche-based species distribution models transferable in space? J. Biogeogr. **2006**, 33, 1689–1703.
- Elliott-Graves, A. The problem of prediction in invasion biology. Biol. Philos. **2016**, 31, 373–393.
- Cole, D.J. Bayesian Identifiability. In Parameter Redundancy and Identifiability; Chapman and Hall: London, UK, 2020; pp. 101–153.
- MacKenzie, D.I.; Nichols, J.D.; Lachman, G.B.; Droege, S.; Royle, J.A.; Langtimm, C.A. Estimating site occupancy rates when detection probabilities are less than one. Ecology **2002**, 83, 2248–2255.
- Rosenthal, J.S. Optimal proposal distributions and adaptive MCMC. Handb. Markov Chain. Monte Carlo **2011**, 4, 93–112.
- Hartig, F. BayesianTools: General-Purpose MCMC and SMC Samplers and Tools for Bayesian Statistics, R package version 0.1-7; CRAN, 2017. Available online: https://cran.r-project.org/web/packages/BayesianTools/index.html (accessed on 29 August 2022).
- Rosenthal, M.; Glew, R. Medical Biochemistry, 1st ed.; Wiley: Hoboken, NJ, USA, 2011. Available online: https://www.perlego.com/book/1008615/medical-biochemistry-pdf (accessed on 29 August 2022).
- Brooks, S.P.; Gelman, A. General methods for monitoring convergence of iterative simulations. J. Comput. Graph. Stat. **1998**, 7, 434–455.
- Jiménez-Valverde, A. Insights into the area under the receiver operating characteristic curve (AUC) as a discrimination measure in species distribution modelling. Glob. Ecol. Biogeogr. **2012**, 21, 498–507.

**Figure 1.** Maps of estimated mean population sizes and ages in the initial year 1980.

**Top:** Mean posterior estimate of population size across the introduction cells.

**Bottom:** Mean posterior estimate of the mean population age across the introduction cells. The associated posterior distributions are provided in Figures A5 and A6 of Appendix A.4.

**Figure 2.** Summary of the model-fitting procedure for a single Markov chain Monte Carlo (MCMC) chain with the Metropolis–Hastings sampling algorithm. The parameters ${\Theta}_{0}$ comprised 48 parameters for the initial populations, 14 for the ecological process, and 3 for the sampling process.
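As a generic illustration of the sampler named in Figure 2, the sketch below implements random-walk Metropolis–Hastings with an isotropic Gaussian proposal and runs it on a toy standard-normal log-posterior. The function name, the fixed step size, and the toy target are assumptions for illustration only; in the model, `theta` would hold the 65 parameters of ${\Theta}_{0}$ (48 + 14 + 3), and the authors' actual sampler may differ, for instance by adapting its proposal during burn-in.

```python
import numpy as np


def metropolis_hastings(log_post, theta0, n_iter, step, seed=0):
    """Random-walk Metropolis-Hastings with an isotropic Gaussian proposal."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta)
    chain = np.empty((n_iter, theta.size))
    accepted = 0
    for i in range(n_iter):
        proposal = theta + step * rng.standard_normal(theta.size)
        lp_prop = log_post(proposal)
        # Symmetric proposal: accept with probability min(1, exp(lp_prop - lp)).
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = proposal, lp_prop
            accepted += 1
        chain[i] = theta  # a rejected move repeats the current state
    return chain, accepted / n_iter


# Toy target: standard normal in 3 dimensions (hypothetical stand-in posterior).
def log_post(th):
    return -0.5 * np.sum(th ** 2)


chain, acc_rate = metropolis_hastings(log_post, np.zeros(3), n_iter=20000, step=1.0)
draws = chain[5000:]  # discard burn-in
```

Because the Gaussian proposal is symmetric, the proposal densities cancel in the acceptance ratio, which is why only the difference of log-posteriors appears in the acceptance test.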

**Figure 3.** A reconstruction of Plectranthus barbatus invasion in the Southern Cape, South Africa, Part 1. Maps of population size percentile range for selected years between 1980 and 2021.

**Figure 4.** A reconstruction of Plectranthus barbatus invasion in the Southern Cape, South Africa, Part 2. Maps of population growth syndrome for selected years between 1980 and 2021.

**Figure 5.** Global invasion metrics across posterior parameter samples (mean, black line; 95% confidence interval, grey ribbon) over the study period (1980–2021).

**Top:** Log10 of total population size per year.

**Middle:** Number of 100 km² spatial cells colonized per year.

**Bottom:** Log10 of total seed production per year.

**Figure 6.** Contribution of long-distance (LD) dispersal to population recruitment over time. The posterior mean (solid curve) of the log10-number of new plants growing from seeds carried by long-distance dispersal is shown with its 95% confidence interval (grey ribbon) for each year.

**Figure 7.** Relative reduction in population size per year under ablation of long-distance dispersal (red) or short-distance dispersal (blue). For each ablated dispersal mode, we show the mean and 90% confidence interval (across posterior samples) of the difference in population between the ablated model and the full model, divided by the population of the full model.
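The quantity plotted in Figure 7 can be written as $(N_{\text{ablated}} - N_{\text{full}})/N_{\text{full}}$ per posterior sample and year, then summarized across samples. The sketch below assumes population totals stored as (samples × years) arrays simulated with the same posterior parameters, with and without one dispersal mode; the function name and the toy numbers are hypothetical, and the full-model population is assumed strictly positive.

```python
import numpy as np


def relative_reduction(pop_full, pop_ablated, level=0.90):
    """Per-year posterior mean and central interval of (ablated - full) / full.

    Inputs have shape (n_samples, n_years): row s holds total population sizes
    simulated with posterior sample s, with and without one dispersal mode.
    Assumes the full-model population is strictly positive in every year.
    """
    full = np.asarray(pop_full, dtype=float)
    rel = (np.asarray(pop_ablated, dtype=float) - full) / full
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(rel, [alpha, 1.0 - alpha], axis=0)
    return rel.mean(axis=0), lower, upper


# Two hypothetical posterior samples over two years.
mean, lower, upper = relative_reduction([[100, 200], [100, 400]],
                                        [[50, 100], [80, 100]])
```

Negative values indicate that removing the dispersal mode lowers the population; a value of -0.5 means the ablated model reaches half the population of the full model in that year.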

**Figure 8.** Posterior age-structured fecundity scaled by the maximal fecundity (M): mean posterior estimate (solid black line) and 95% confidence interval (gray ribbon). We first computed the curve from the three fecundity parameters ($\widehat{k}, \theta, M$) for each posterior sample and then calculated the mean and the 2.5% and 97.5% quantiles per age.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Botella, C.; Bonnet, P.; Hui, C.; Joly, A.; Richardson, D.M.
Dynamic Species Distribution Modeling Reveals the Pivotal Role of Human-Mediated Long-Distance Dispersal in Plant Invasion. *Biology* **2022**, *11*, 1293.
https://doi.org/10.3390/biology11091293

**AMA Style**

Botella C, Bonnet P, Hui C, Joly A, Richardson DM.
Dynamic Species Distribution Modeling Reveals the Pivotal Role of Human-Mediated Long-Distance Dispersal in Plant Invasion. *Biology*. 2022; 11(9):1293.
https://doi.org/10.3390/biology11091293

**Chicago/Turabian Style**

Botella, Christophe, Pierre Bonnet, Cang Hui, Alexis Joly, and David M. Richardson.
2022. "Dynamic Species Distribution Modeling Reveals the Pivotal Role of Human-Mediated Long-Distance Dispersal in Plant Invasion" *Biology* 11, no. 9: 1293.
https://doi.org/10.3390/biology11091293