Analysis of the Adaptative Strategy of Cirsium vulgare (Savi) Ten. in the Colonization of New Territories

: The current situation of global environmental degradation as a result of anthropogenic activities makes it necessary to open new research lines focused on the causes and effects of the main alterations caused in the ecosystems. One of the most relevant is how the niche dynamics of invasive species change between different geographical areas, since its understanding is key to the early detection and control of future invasions. In this regard, we analyzed the distribution pattern of Cirsium vulgare (Savi) Ten., a plant of the Asteraceae family originally from the Eurasian region that currently invades wide areas of the world. We estimated its niche shifts between continents using a combination of principal components analysis (PCA) and Ecological Niche Modelling (ENM) on an extensive set of data on global presences of its native and invaded ranges from Global Biodiversity Information Facility (GBIF). A set of bioclimatic variables and the Human Footprint (HFP) with a resolution of 10 km were selected for this purpose. Our results showed that the species has a marked global trend to expand toward warmer climates with less seasonality, although in some regions its invasiveness appears to be less than in others. The models had a good statistical performance and high coherence in relation to the known distribution of the species and allowed us to establish the relative weight of the contribution of each variable used, with the annual temperature and seasonality being the determining factors in the establishment of the species. Likewise, the use of non-climatic variable HFP has provided relevant information to explain the colonizing behavior of the species. The combination of this methodology with an adequate selection of predictor variables represents a very useful tool when focusing efforts and resources for the management of invasive species.


Introduction
Globalization, with its intense commercial activity of nations around the world, implies a massive flow of transportation of goods, people, and others from one territory to another. All the processes carried out in these movements involve strong environmental consequences such as the emission of pollutants, global warming, changes in land use, destruction of habitat, and landscape fragmentation [1][2][3]. No less important is the spread of some species with high colonizing capacity that give rise to biological invasions, which is considered to be one of the phenomena responsible for the planet's loss of biodiversity [4,5].
Although a large number of species can be transported voluntarily or involuntarily from their natural habitats to other territories [6], the majority do not survive because they do not adapt to new ecological conditions. Some of them become naturalized, coexisting in harmony with the native species, but a few are able to overcome adaptive barriers and become invasive, displacing native species and causing serious damage to the balance of the ecosystem [7][8][9]. Despite having more information available on biological invasions, make it possible to anticipate their invasion, and these alternatives must be supported by science-based research such as the current study.
We tested the application of the methodologies described above using georeferenced presence data from the Global Biodiversity Information Facility (GBIF) platform in conjunction with an extensive literature review of the species' distribution. Raster's layers were downloaded from Chelsa repositories and resampled to 10 × 10 km. Parametrization was done according to previous works at regional different levels dealing with biological invasions [15,22,23,37].
The objective of this work is to test the application of specific ecoinformatics methodologies to explain and predict the adaptability to new environments of certain invasive species that are currently distributed on a global scale.

The Target Species: Cirsium vulgare (Savi) Ten. (Asteraceae)
Spear thistle (Cirsium vulgare (Savi) Ten.) is an herbaceous biennial or perennial plant, occasionally annual, thorny, erect, up to 2 m high. Its basal leaves form a rosette, and the plant produces 150 to 250 flowers with purple corollas. Its fruit is an achene [38]. Its native distribution is located in Eurasia and it is naturalized widely in many countries of the world [33,38]. C. vulgare has become an invasive plant that is difficult to eradicate mainly due to the high production and resistance of its seeds, its viable life form, and its sequential germination pattern [39]. The impact of the species on the native flora is quite notable, since it can form dense shrub populations that occupy all of the existing space, displacing other species. In particular, C. vulgare displaces smaller species by monopolizing the light capture and more efficiently extracting underground resources through its dense roots [38,40]. It is believed that this species appeared in North America in the colonial period and spread throughout the continent [35]. In Oceania (Tasmania), there are records from the 1830s, and it is possible that it was transported there from South Africa [41]. In temperate zones of South America, there is a permanent risk of invasion by this specie due to the accidental transport of contaminated agricultural products such as seeds or fodder or its deliberate introduction as an ornamental plant [41]. C. vulgare usually grows in sunny areas in well-irrigated soils with high nutrient concentrations [31,42].

Occurrence Datasets
In the current study, 387,510 occurrence data from The Global Biodiversity Information Facility (www.gbif.org (accessed on: 3 April 2020)) were downloaded, processed, and filtered to eliminate incorrect, duplicate, or badly georeferenced citations, as well as to homogenize their resolutions. Thus, 10 × 10 km grids were generated in the geographical space to avoid bias [10]. Then, an exhaustive bibliographic review of the species was carried out, and the geographical ranges were established. We obtained 24,632 grids in the native range in Eurasia. For the invaded ranges, we obtained 2967 grids in North America, 176 in South America, 284 in Africa, and 7356 in Oceania ( Figure 1).
The compiled database is the result of an extensive search of the occurrence of the species in the study area. Nevertheless, we acknowledge that this database may not represent the full range of environmental conditions in which the species can be found (e.g., other introduced areas) as in other studies elsewhere [10].

Figure 1.
The map shows all the zones for the native and invaded ranges which were defined based on the filtered presences (black spots) and the bibliographic analysis of C. vulgare. The native range, represented with the light red color, covers almost all Eurasia. The invaded range, represented by the color yellow, includes: NOA-North America; SA-South America, excluding some countries (Venezuela, Uruguay, Paraguay, Surinam, Guyana, and French Guyana); AF-Africa, in areas belonging some countries (South Africa, Ethiopia, Rwanda, and Kenya); and OCE-Oceania, the whole continent. Zones that do not fall into either range are represented in grey.

Environmental Variables
As potential predictors to characterize the species' ecological niche, we used a set of variables related to climate and human influence (Table 1). Climatic variables (seasonal and annual patterns of temperature and precipitations) were available from climatology databases at high resolutions for the Earth's land surface areas (CHELSA; http://chelsaclimate.org/ (accessed on: 05/04/2020)), which provides improved climatic estimates in landscapes with complex topography at 30 arc-seconds spatial resolutions (~1 km). Solar radiation was available from the WorldClim database (https://www.worldclim.org (accessed on: 05/04/2020)) at 30 arc-seconds spatial resolutions. Because of its effect on invasive species distribution, we included a variable related to human footprint on the landscape (https://wcshumanfootprint.org (accessed on: 05/04/2020)). This variable measures the cumulative impact of direct pressures on nature from human activities. It includes 8 inputs: The extent of built environments, crop land, pastureland, human population density, nighttime lights, railways, roads, and navigable waterways. Table 1. Selected explanatory variables, the first column of the table displays the abbreviation that was used in this study to represent the selected variables; in the second, the full name of each variable; and the third the type of variable, that is, whether it is climatic or non-climatic. Finally, in order to avoid the cross-correlation within the selected environmental variables, a multicollinearity test was conducted using Pearson's correlation coefficient [10]

Environmental Variables
As potential predictors to characterize the species' ecological niche, we used a set of variables related to climate and human influence (Table 1). Climatic variables (seasonal and annual patterns of temperature and precipitations) were available from climatology databases at high resolutions for the Earth's land surface areas (CHELSA; http://chelsaclimate.org/ (accessed on: 5 April 2020)), which provides improved climatic estimates in landscapes with complex topography at 30 arc-seconds spatial resolutions (~1 km). Solar radiation was available from the WorldClim database (https://www.worldclim.org (accessed on: 5 April 2020)) at 30 arc-seconds spatial resolutions. Because of its effect on invasive species distribution, we included a variable related to human footprint on the landscape (https://wcshumanfootprint.org (accessed on: 5 April 2020)). This variable measures the cumulative impact of direct pressures on nature from human activities. It includes 8 inputs: The extent of built environments, crop land, pastureland, human population density, nighttime lights, railways, roads, and navigable waterways. Table 1. Selected explanatory variables, the first column of the table displays the abbreviation that was used in this study to represent the selected variables; in the second, the full name of each variable; and the third the type of variable, that is, whether it is climatic or non-climatic.

Code
Name Type Finally, in order to avoid the cross-correlation within the selected environmental variables, a multicollinearity test was conducted using Pearson's correlation coefficient [10] in R software [43]. Variables with cross-correlation coefficient values of r > ±0.75 were excluded. The final explanatory variables selected were: Annual mean temperature (Bio 1), temperature seasonality (Bio 4), annual precipitation (Bio 12), precipitation seasonality (Bio 15), standard deviance of radiation (Rad_sd), and Human Footprint (HFP). Variables were resampled to 10 km to optimize processing time using an interpolation bilinear resampling technique. All spatial information processing was handled using the Spatial Analyst Tool from ArcGIS 10.5 [44].

Niche Shift Measurement
The niche shift of C. vulgare was measured on the basis of the environmental species envelope represented by explanatory variables, which contain climatic and non-climatic factors ( Table 1). The process consists of calibrating a PCA with the areas effectively occupied by the plant in its native and invaded ranges and the environmental conditions within the whole study area [21,22,32].
The process was carried out below the R-program software with the library "ecospat" [45]. First, we extracted the environmental conditions of the native range and the areas of the invaded range. Then, the PCA was calibrated, and the first 2 axes were taken into account for the analysis. Second, in order to avoid spatial bias, we divided the environmental space into 100 × 100 cells and transformed the data into densities [10]. Third, based on Schoener's D metric (0 = totally different to 1 = complete overlap), we measured the proportion of native niche that does not overlap with the invaded niche, i.e., "unfilled," and the proportion of invaded niche that does not overlap with the native niche or "expansion" [32]. In addition, in order to avoid bias, we used the 90th percentile of all environmental space and we compared it with the whole environmental extent. Fourth, the distribution of density and median environmental space in both ranges was calculated to determine the overall trend of ecological niche shift [21,22,32]. Fifth, we applied a niche equivalence test, that is, the correspondence between an observed and expected D, by randomly reassigning the occurrences of the native and invaded ranges. In this test, the null hypothesis (the niches are not identical) is rejected if p < 0.05. Also, we applied the niche similarity test, which compares the observed and expected D by randomly reassigning the occurrences in a single range. The value of p > 0.05 implies that the niches are not more similar than expected by chance [22].

Reciprocal Ecological Niche Models: Calibration and Evaluation
To explore niche conservatism across ranges of C. vulgare, we generated Ecological Niche Models that were compared to each other in a reciprocal way [10,46] using the MaxEnt program [21]. The MaxEnt model is a maximum entropy-based machine learning program that estimates the habitats suitability for a species based on the environmental constraints [21]. To generate reciprocal models, we first made distribution models of potential suitable habitats with the same occurrence points and environmental variables that were used in our PCA. Then, we projected native models into introduced ranges and visually compared them with models calibrated with data occurrence in the introduced range. We then repeated this step but projected the introduced models into the native range and compared them in the same way.
In fitting these models, we set up 15 replicates using 80% of the data for calibration and the other 20% for evaluation, the selection of feature classes (autofeature), a regularization multiplier value of 1, a maximum of 500 iterations, and 10,000 background points. To select the background points, we generated a Kernel Density map to draw background points at random in MaxEnt. This limits the background points to areas that we assume were surveyed for the species, which provides MaxEnt with a background file with the same bias as the presence locations [47]. We measured variable importance by comparing the jackknife of training gain values when models were made with individual variables. To avoid projections into environments outside which the models were trained upon, we used the 'fade-by-clamping' option in MaxEnt, which removes heavily clamped pixels from the final predictions [21]. Predictive performance of each model was assessed using 15-fold cross-validation and the area under a receiver operating characteristic curve (AUC), which measures a model's ability to discriminate presence from background records (0.5 = random, 1 = perfect).

Environmental Niche Analysis
For the target regions of this study, North America (NOA), South America (SA), Africa (AF), and Oceania (OCE) (Figure 1), a clear niche shift was observed for C. vulgare between the native and the invaded range. The overlap of "D" niches ( Table 2) between niches was much greater in NOA and OCE than in SA and AF. Metrics of niche displacement for NOA revealed that there is a large proportion of common environments occupied between ranges and that the species has expanded ("expansion") into warmer environments (Bio 1) and with less temperature variation (Bio 4). On the other hand, there is a very low proportion of environments with optimal conditions still not occupied ("unfilling," Figure 2a,e). The metrics also indicate that the presences are quite associated with anthropogenic activities (HFP), something similar occurring in OCE but in this case less expansion was observed (Figure 2d,h). In the case of SA (Figure 2b,f) and AF (Figure 2c,g), the climatic trend of the expansion was similar to that of NOA and OCE, but it was also observed that the species tends toward environments with greater seasonality of rainfall (Bio 15). The metric did not show significant variations when considering the whole extension of the environments or only marginal environments, that is, the 90th percentile, except for AF, where it was observed that the unfilled environments had a higher proportion considering the totality of the available environments. Similarly, it was found that, in all the analyses, the environments with less radiation had a generalized trend (Figure 2).  . The solid green color shows the areas that satisfy the requirements of the species but have not been occupied by it, that is, the unfilling (only invaded range). The solid blue color belongs to the stability, which is not more than the proportion of shared niche between the native and invaded ranges. The continuous OCE-Oceania. Also, continuous and discontinuous red lines indicate 100% and 90% of the available background environments for C. vulgare, respectively. The solid red areas indicate the expansion, that is, areas that are actually occupied by the species (only for native range). The solid green color shows the areas that satisfy the requirements of the species but have not been occupied by it, that is, the unfilling (only invaded range). The solid blue color belongs to the stability, which is not more than the proportion of shared niche between the native and invaded ranges. The continuous red arrow shows the environmental distance between the median of the distribution density for each range, and the discontinuous red arrow shows the environmental distance between the median of the environmental space in each range. The contribution of climate variables in the first two axes of the PCA and the correlation of variables for each of the zones are shown, respectively, in (e-h).
According to the equivalence test carried out between both ranges, the niches occupied in SA were similar to those occupied in its native range, while OCE was the region with the most differences with respect to the niche of origin. In addition, the similarity test showed that except for the NOA region, the results were repeated. The results indicated that, in a certain proportion of the regions, niches were not more similar than expected by chance, i.e., in NOA, the niches were more similar to the niche of the native region than would be expected by chance (Figure 3). red arrow shows the environmental distance between the median of the distribution density for each range, and the discontinuous red arrow shows the environmental distance between the median of the environmental space in each range. The contribution of climate variables in the first two axes of the PCA and the correlation of variables for each of the zones are shown, respectively, in (e-h).

Figure 3. Equivalence and Similarity tests to compare the niches between the native and invaded
ranges. The first row shows the equivalence values, i.e., the frequencies observed for the niche overlap index (D) in relation to the expected D for p = 0.05 (a-d). The niche similarity is shown in the second row (e-h). The first column compares native range with North America (a,e), the second column compares native range with South America (b,f), the third column compares native range with Africa (c,g), and the fourth column compares native range with Oceania (d,h).

Reciprocal Ecological Niche Models
The models generated for all regions showed a proper fitting of the models compared to random model, with good AUC values (Table 3), and a relatively low rate of omission, indicating that the presence itself of the species was correlated with the most suitable environments for it. The most important variable for C. vulgare in its native range was HFP, preceded by solar radiation (Rad_sd) ( Table 4).
Our results for the native region show a high adjustment of the suitability zones with the known presence of the species, which was expected given the greater historical extent of the colonization of this species (Figure 4). The prediction of the native model projected toward NOA (Figure 5a) was able to predict quite accurately the areas where C. vulgare is currently recorded. Projection calibrated in the invaded range ( Figure 5b) showed that there are zones in the center of the subcontinent, with several areas of high habitat suitability extending westward. Variable analysis determined that HFP and Rad_sd were the most representative variables for this model (Table 4). In the case of SA (Figure 6a), the native model fairly predicted areas where the species occurs (to the south of Brazil, Chile and Argentina) and also predicted suitable habitats in Uruguay, but it did not hit areas with presence records in northern Argentina, southern Bolivia and Peru, and the mountainous areas of Ecuador and Colombia. The model of the invaded SA (Figure 6b) range predicted highly viable habitats along Chile and in the Andes Mountains in Ecuador and Colombia. Also, annual mean temperature (Bio 1) and seasonal temperature (Bio 4) ( Table   Figure 3. Equivalence and Similarity tests to compare the niches between the native and invaded ranges. The first row shows the equivalence values, i.e., the frequencies observed for the niche overlap index (D) in relation to the expected D for p = 0.05 (a-d). The niche similarity is shown in the second row (e-h). The first column compares native range with North America (a,e), the second column compares native range with South America (b,f), the third column compares native range with Africa (c,g), and the fourth column compares native range with Oceania (d,h).

Reciprocal Ecological Niche Models
The models generated for all regions showed a proper fitting of the models compared to random model, with good AUC values (Table 3), and a relatively low rate of omission, indicating that the presence itself of the species was correlated with the most suitable environments for it. The most important variable for C. vulgare in its native range was HFP, preceded by solar radiation (Rad_sd) ( Table 4). Our results for the native region show a high adjustment of the suitability zones with the known presence of the species, which was expected given the greater historical extent of the colonization of this species (Figure 4). The prediction of the native model projected toward NOA (Figure 5a) was able to predict quite accurately the areas where C. vulgare is currently recorded. Projection calibrated in the invaded range (Figure 5b) showed that there are zones in the center of the subcontinent, with several areas of high habitat suitability extending westward. Variable analysis determined that HFP and Rad_sd were the most representative variables for this model (Table 4). In the case of SA (Figure 6a), the native model fairly predicted areas where the species occurs (to the south of Brazil, Chile and Argentina) and also predicted suitable habitats in Uruguay, but it did not hit areas with presence records in northern Argentina, southern Bolivia and Peru, and the mountainous areas of Ecuador and Colombia. The model of the invaded SA (Figure 6b) range predicted highly viable habitats along Chile and in the Andes Mountains in Ecuador and Colombia. Also, annual mean temperature (Bio 1) and seasonal temperature (Bio 4) ( Table 4) (Table 4). For OCE, the prediction of the native model (Figure 8a) matches quite well with the records of the species, while the model of the invaded range for this region predicted areas of suitable habitat for C. vulgare across Papua New Guinea in northern Oceania. The model calibrated on the invaded range indicates that the species occupies a large part of Australia and that there are suitable environments for further expansion. Another suitable area is New Zealand, where the species is widely distributed (Figure 8b). The most important variables were Bio 4 and Rad_sd (Table 4).

Discussion
The study and understanding of biological invasions are critical elements since they can provide crucial information of the impact of invasive species and its areas of potential invasion in order to develop effective strategies and prevention measures. Studies at the regional level [23,48] constructed with adequately selected predictor variables have allowed a better characterization of the ecological niche shifts at different scales and at dif-

Discussion
The study and understanding of biological invasions are critical elements since they can provide crucial information of the impact of invasive species and its areas of potential invasion in order to develop effective strategies and prevention measures. Studies at the regional level [23,48] constructed with adequately selected predictor variables have allowed a better characterization of the ecological niche shifts at different scales and at different stages of invasion [29,32]. However, defining the reasons why that happens is complex due to the different processes involved in the invasion of a species and, of course, it is essential the application of adequate methodologies to confront elements of the dynamics of ecological niches. A displacement of the niche centroid may imply that the species has found new suitable conditions within the invaded range in a non-analogous climate (climate absent in its native distribution), which would imply a shift in the realized niche but within the tolerance range of the species' fundamental niche through a preadaptation mechanism, i.e., at some point, the species could have developed in the climate in question [49,50]. Another possibility is that the species underwent changes at the genetic level beyond its fundamental niche and adapted to new environmental conditions. It has been determined that C. vulgare has adapted and proliferated to the conditions of the dune ecosystems in Chile, presenting therophytic characteristics [51]. These new capacities may have been developed in the whole invaded range, although it is possible that this may be a preexisting characteristic of the species, coinciding with what happens with other therophytic species that have been described in some areas of the Mediterranean where C. vulgare has its native range. On the other hand, the establishment of species in new environments is not such a simple matter apart from phenotypic plasticity and genetic adaptation, as biological interactions play a crucial role in the processes of colonization of new territories [50]. A recent study showed that the species of the genus Cirsium (L.) Mill., since they generally share their environmental preferences and are often found in the same geographical area, are capable of sharing genetic material by means of cross-pollination. This would generate a greater genetic diversity and would imply possible adaptive capacities to new environments [52]. Some weed control studies [36,53,54] for C. vulgare have shown that this species has many natural competitors, mainly herbivorous insects that feed on the plant, reducing its capacity to produce seeds. These natural enemies have been used to control the proliferation of the plant mainly in North America, South Africa, and Oceania. However, the production capacity and resistance of the seeds of this species make it a very difficult adversary. C. vulgare is a weed with a fairly efficient competition capacity and with high resistance to chemical control systems, especially in South Australia where these cases are common with other species not necessarily of thistles [55].
When we analyzed the niche overlap metrics between regions, we observed that the use of marginal environments produced a variation in the values obtained, something previously verified in other invasive species [56]. In our case, the same case was fulfilled in almost all the regions. For example, in NOA, the values were practically the same, although the values in the expansion and unfilling were quite low (Table 2, Figure 2). This may be due to the similarity between niches that exist between this region and its native region (Figure 4). In the case of OCE, the expansion values showed a slight variation when using marginal environments, but the region also presented low values in the expansion. There was a great variation in the unfilling ratio when using these environments, which would mean possible future invasion areas. Besides, the SA region did not present such significant variability in the use of marginal environments, but the expansion values were much more representative compared to NOA and OCE. In addition, there is an important proportion of the native niche that the species has conserved but there is also a proportion of unfilling that has not been occupied. This could be because the species has not yet been introduced into these areas. Alternatively, it could be attributed to the existence of some geographical barrier, as it is a continent with large geographical features, or to biological interactions with other organisms-predatory pressure or competition for resources-that have not allowed it to settle [12,37]. In the case of AF, the metrics showed there was a rather low proportion of overlap of niches in relation to the native range. In fact, it was the lowest compared to the other three regions analyzed, but, interestingly, the expansion metrics were quite high and rather constant when including marginal environments. On the other hand, the unfilling metric varied drastically since was moderate without marginal environments, and the metric reached an important value when including these environments. It is interesting that the direction of displacement of the species in all regions was generally toward warmer environments with less variation in temperature.
According to AUC data, the performance of our models is statistically acceptable [10,37]. The analysis of variables importance indicates the very important role of annual and seasonal temperature in the distribution of the study species (Table 4). The predictions of the models (Figures 5-8) show some differences with regard to the current distribution of the species (Figure 1). In this sense, it is important to note that, for this sort of species, it is necessary to consider variables beyond the bioclimatic ones [10,29]. In our case, the inclusion of the Human Footprint (HFP) provided us with insight into how anthropogenic activities are specially linked to the distribution of C. vulgare. This has also been observed for example in studies carried out in Africa [57] and in New Zealand [36], where the invasive species studied mainly proliferate in wastelands, roadsides, and areas of pasture cultivation. It is clear that there is a correlation between human intervention and the proliferation and success of invasive species, and some studies have highlighted the vulnerability of invaded ecosystems, emphasizing landscape structure as a determining factor in the success of the invasion [48,58]. In our case, when the weight of the Human Footprint (HFP) was analyzed (Figure 2h), we observed a relationship between the proliferation of our species in areas where humans modified the natural landscape, one of the most outstanding in the fields of agricultural production [59].
This contribution aims to show that the combination of tools and methods for predicting and inferring possible future invasions with an adequate selection of climatic and non-climatic variables, taking into account a global vision of the distribution of species at a regional level, can be a starting point for understanding the general trends of biological invasions. Likewise, through the comparison of the different types of ecological needs of the target species, it may be possible to achieve basic information for developing effective control strategies fitted to each territory and to invest resources in a more profitable way to protect native biodiversity.