Multi-Scenario Species Distribution Modeling

Correlative species distribution models (SDMs) are increasingly used to predict suitable insect habitats. However, SDMs are also widely criticized for prediction discrepancies among different models of the same species and for the lack of effective communication about prediction uncertainty. In this paper, we undertook a factorial study to investigate the effects of various modeling components (species training datasets, predictor variables, dimension-reduction methods, and model types) on the accuracy of SDM predictions, with the aim of identifying sources of discrepancy and uncertainty. Of the modeling components tested, model type was the major factor causing variation in species-distribution predictions. Different combinations of modeling components could also significantly increase or decrease the performance of a model, which indicates the importance of keeping modeling components constant when comparing SDM results. With all modeling components held constant, machine-learning models seemed to outperform other model types. On average, the Hierarchical Non-Linear Principal Components Analysis dimension-reduction method improved model performance more than the other methods tested. We also found that widely used confusion-matrix-based model-performance indices, such as the area under the receiver operating characteristic curve (AUC), sensitivity, and Kappa, do not necessarily help select the best model from a set of models if the variation in performance is not large. To conclude, discrepancies among model results do not necessarily suggest a lack of robustness in correlative modeling, as they can also arise from inappropriate selection of modeling components. In addition, more research on model performance evaluation is required to develop robust and sensitive model evaluation methods.
Undertaking multi-scenario species-distribution modeling, where possible, is likely to mitigate errors arising from inappropriate selection of modeling components and to provide end users with better information on the uncertainty of the resulting model predictions.

.4 The canonical factor loadings for the 2nd-5th dimensions (subsequent graphs to the 1st-dimension graph given in Figure 4 of the manuscript).
Structure correlations (canonical factor loadings) for the second to fifth canonical dimensions. Arrows show the vector direction of variables that correspond to the canonical component on the y-axis. The corresponding variables for the x-axis (combinations of modelling components) were not labelled so as not to overcrowd the graph. The red line indicates the linear regression line; the blue ellipse (data ellipse) shows 68% of the data points (approx. 1 SD) and their centroid (filled black dot) in relation to the linear regression line; the green line shows the locally weighted scatterplot smoothing (LOWESS) fit. The grey shaded rows cover variables that lie in the top half (1-9) of the total 18 ranks.
To identify the variables that contributed to the PCA components, the eigenvectors corresponding to the principal component scores that explained up to 90% of the variance in the dataset were used (specifically, variables with absolute loadings >= 0.32) (Dormann et al., 2013, and references therein). In the case of NLPCA, the non-linear nature of feature extraction makes it impossible to obtain a single corresponding variable coefficient for the scores; however, the final weight matrix was used as a proxy for estimating the major contributing variables toward the high-variance principal component scores.
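The PCA variable-screening rule described above can be sketched as follows. This is a minimal illustration, not the procedure used in the study: it assumes PCA on standardised predictors, reads "up to 90% of the variance" as the smallest set of leading components whose cumulative explained variance reaches 90%, and applies the 0.32 cutoff to the eigenvector coefficients. The function name `contributing_variables` and its parameters are hypothetical.

```python
import numpy as np

def contributing_variables(X, names, var_target=0.90, loading_cutoff=0.32):
    """Hypothetical helper: list, per retained component, the variables
    whose absolute eigenvector coefficient is >= loading_cutoff."""
    X = np.asarray(X, dtype=float)
    # Standardise each predictor so PCA works on the correlation structure.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = np.cov(Z, rowvar=False)
    # eigh returns eigenvalues in ascending order; reverse to descending.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]
    # Assumption: keep the leading components until cumulative variance
    # first reaches var_target.
    cum_var = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(cum_var, var_target)) + 1, len(names))
    selected = {}
    for j in range(k):
        mask = np.abs(eigvecs[:, j]) >= loading_cutoff
        selected[j + 1] = [n for n, keep in zip(names, mask) if keep]
    return selected
```

Calling `contributing_variables(X, names)` with a samples-by-predictors array returns a dictionary keyed by component number (1-based). No comparable per-variable coefficient exists for NLPCA, which is why the text falls back on the final weight matrix as a proxy.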

Note S5: Distribution predictions for the five species in this study
Species-level results were briefly discussed in the results section. Here, observations related to each species and the associated prediction from its optimum model (the model that had the highest Kappa as well as the lowest cross-validation error) are discussed.
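As a rough illustration of the selection criterion just mentioned, the sketch below computes Cohen's Kappa from a 2x2 confusion matrix and picks a model from a list of candidates. How the two criteria were combined is not stated here, so ranking by Kappa first and using cross-validation error only to break ties is an assumption; the `Candidate` class, `pick_optimum`, and any labels passed to them are hypothetical.

```python
from dataclasses import dataclass

def cohen_kappa(tp, fp, fn, tn):
    """Cohen's Kappa for a presence/absence confusion matrix."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    if expected == 1.0:
        return 0.0  # degenerate case: all counts fall in a single cell
    return (observed - expected) / (1.0 - expected)

@dataclass
class Candidate:
    label: str       # hypothetical model identifier
    kappa: float
    cv_error: float

def pick_optimum(candidates):
    # Assumption: highest Kappa wins; lowest cross-validation error
    # is used only to break ties in Kappa.
    return min(candidates, key=lambda c: (-c.kappa, c.cv_error))
```

A stricter reading would require a single model to be best on both criteria at once; under that reading, the function would need to report when no such model exists.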
albopictus was BIOCLIM19 with the RF variable selection method and SVM model. Areas identified as having a high climatic suitability for A. albopictus could be further assessed using high-resolution data along with trade and cargo network information for the target area, because imports of used tyres and plant material have been identified as the most important introduction pathways for this species (Scholte & Schaffner, 2007; Scholte et al., 2008).

Anoplolepis gracilipes (Smith, 1857) (Hymenoptera: Formicidae)
For A. gracilipes, areas of high probability of predicted presence obtained from the selected P1DR2QDA the Auckland area but was later eradicated (Wetterer, 2005). In light of the distribution prediction for A. gracilipes in this chapter, it is probable that the success of the eradication could have been enhanced by the unsuitability of the climate in New Zealand.
if an occurrence record from the same locality was included in the model training is particularly worrying because modellers will probably not investigate further. Therefore, further study is needed to investigate 2 such scenarios. Modelling the climatically suitable areas along with maize plantation cover is recommended to prioritize suitable areas at risk of D. v. virgifera invasion (Aragón et al., 2010).

Vespula vulgaris (Linnaeus, 1758) (Hymenoptera: Vespidae)
The SVM model selected for V. vulgaris prediction was based on BIOCLIM19 data and a random forest variable selection method. The prediction covered the native Holarctic range of V. vulgaris and its introduced range in New Zealand (including Stewart Island) and in Tasmania, Australia (Thomas et al., 1990; Matthews et al., 2000). An external validation carried out for New Zealand using V. vulgaris presence data obtained from the Landcare Research website showed that 91% of the occurrence sites were correctly predicted by the selected model (Appendix S10). Another area identified as highly suitable was Southern Argentina; V. vulgaris was reported from this location by Masciocchi et al. (2010), but no follow-up report on its establishment could be found. However, since the German wasp (Vespula germanica), which co-occurs with V. vulgaris in New Zealand, is present in Argentina (D'Adamo et al., 2002; Lopez-Osorio et al., 2014), it is entirely possible that the climate in the predicted areas of Argentina is also suitable for V. vulgaris. If this is the case, displacement of V. germanica from Argentina is also a possibility, according to the trend reported in New Zealand (Harris, 1991). A suitable area of notable size is also predicted in Canada and the U.S.A.
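The external validation reported above for New Zealand amounts to the share of independent presence sites that the model predicts as suitable. A minimal sketch of that calculation follows, assuming each site has a predicted suitability score and that a site counts as correctly predicted when its score meets a cutoff. The function name and the 0.5 default threshold are hypothetical and are not taken from the study.

```python
def presence_hit_rate(suitability_at_sites, threshold=0.5):
    """Fraction of known presence sites predicted as suitable.

    suitability_at_sites: predicted suitability (0-1) extracted at each
    independent occurrence record. threshold is an assumed cutoff.
    """
    scores = list(suitability_at_sites)
    if not scores:
        raise ValueError("at least one presence site is required")
    hits = sum(1 for s in scores if s >= threshold)
    return hits / len(scores)
```

Because only presence records are involved, this measures sensitivity alone and carries no information about false positives.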