Mapping the Potential Global Distribution of Red Imported Fire Ant ( Solenopsis invicta Buren) Based on a Machine Learning Method

: As one of the most notorious invasive species, the red imported ﬁre ant ( Solenopsis invicta Buren) has many adverse impacts on biodiversity, environment, agriculture, and human health. Mapping the potential global distribution of S. invicta becomes increasingly important for the prevention and control of its invasion. By combining the most comprehensive occurrence records with an advanced machine learning method and a variety of geographical, climatic, and human factors, we have produced the potential global distribution maps of S. invicta at a spatial resolution of 5 × 5 km 2 . The results revealed that the potential distribution areas of S. invicta were primarily concentrated in southeastern North America, large parts of South America, East and Southeast Asia, and Central Africa. The deforested areas in Central Africa and the Indo-China Peninsula were particularly at risk from S. invicta invasion. In addition, this study found that human factors such as nighttime light and urban accessibility made considerable contributions to the boosted regression tree (BRT) model. The results provided valuable insights into the formulation of quarantine policies and prevention measures.


Introduction
Biological invasion is a global problem, causing serious environmental, economic, and social damages [1,2]. Included in "100 of the World's Worst Invasive Alien Species" [3], the red imported fire ant (Solenopsis invicta Buren) is recognized as one of the most widespread and damaging invasive pests known to impact ecosystem processes, agricultural production, infrastructure, and human health [4][5][6][7][8][9]. S. invicta is native to South America and was accidentally introduced into the United States in the 1930s [10,11]. In subsequent years, this species rapidly spread throughout California and other regions of the world, including the Caribbean islands, Australia, New Zealand, Japan, China, and South Korea [12][13][14][15][16]. Given the strong adaptive and reproductive capacity, S. invicta has great potential to colonize numerous other regions, inflicting enormous damage to the local economy and ecosystems [17,18]. Prevention of biological invasion is much less expensive than post-entry control.
Hence, it is essential to study the potential global distribution of S. invicta to provide a scientific basis for the formulation of prevention and control measures.
Species distribution models (SDMs) have been widely used in predicting the potential geographic distribution of S. invicta (Table 1). For example, Killion et al. applied a colony-growth model to examine the potential range of S. invicta in the United States by incorporating temperature-driven development rates of S. invicta life-stages and simulating the number of workers in a colony [19]. Korzukhin et al. formulated a dynamic model of colony growth and alate production on the assumption that soil temperature was the key ecological factor determining colony growth and reproduction to predict the current extreme distributions and future range of S. invicta in the United States [20]. Morrison et al. superimposed precipitation data upon the temperature-based predictions to estimate colony alate production and predicted the future global geographic range limits of S. invicta at the station scale [21]. Sutherst and Maywald used the CLIMEX model to estimate the response of S. invicta to temperature and moisture from its range in the United States and estimated the potential global areas at risk for S. invicta invasion [14]. Ward modelled the potential geographic distribution of six invasive ant species in New Zealand by three different methods (BIOCLIM, DOMAIN, and Maxent) with 19 bioclimatic variables [22]. Ulrichs et al. predicted S. invicta distributions with climate (relative humidity, temperature, and precipitation) and habitat data (landcover type) [23]. More recently, Wang et al. quantified colony growth based on daily air temperature and precipitation data, to simulate the potential range of S. invicta in China under current and future climate conditions [24]. Sung et al. modelled the potential distribution of S. invicta under current climate conditions using six different species distribution models combined with 19 bioclimatic variables, and selected the random forest (RF) model to obtain its potential global distribution under climate change [13]. Numerous studies have indicated that biological invasion is often closely associated with human impacts [25][26][27][28]. Human activities such as trade, tourism, and transport are indispensable indicators that determine the potential range of invasive species [29]. S. invicta is more likely to colonize disturbed environments of human-associated habitats [22,[30][31][32], and the cross-regional spread is often aided by human transportation [33,34]. However, existing studies focus on the impact of climatic factors on the distribution of S. invicta, whereas few of them take human factors into account during modelling. Besides, the occurrence localities of S. invicta used in existing studies are relatively limited, and few studies predict the global potential distribution of S. invicta on a high spatial resolution. Therefore, the main purposes of this study are to (1) Fully collect the location records of S. invicta.
(2) Comprehensively analyze the climatic, geographical, and human factors that may influence the distribution of S. invicta and quantitatively simulate the relationship between each type of factors and S. invicta presence with a machine learning method-the boosted regression tree (BRT). (3) Predict the global potential distribution of S. invicta at a spatial resolution of 5 × 5 km 2 .

Environmental Variables
The geographical distribution of S. invicta is influenced by many environmental factors. In the present study, these factors were divided into three categories: climatic factors, geographical factors, and human factors.
Climatic factors such as temperature and precipitation are the main constraints that limit the distribution and growth of S. invicta. For example, continental areas where the annual precipitation is less than 510 mm, cannot provide a suitable habitat for S. invicta [21,35,36], and its overwintering population is confined by the minimum temperature of the winter season [37,38]. Previous studies have also indicated that the foraging activity and colony growth of S. invicta have remarkable relations with temperature and humidity [30,[39][40][41]. Therefore, we adopted accumulated annual precipitation, maximum temperature, minimum temperature, and relative humidity to reflect the climatic conditions.
Geographical factors also play a substantial role in determining the distribution of S. invicta. S. invicta is well adapted to opportunistic exploitation of disturbed habitats and some extreme environments [42]. For instance, in arid continental areas, S. invicta will be able to successfully establish if there are permanent sources of water (i.e., lakes, rivers, or dams) or regularly irrigated areas (i.e., fields or lawns) [11,21]. Additionally, some studies have indicated that S. invicta prefers open, sunny areas while it is less abundant in densely wooded areas [30,43]. Thus, the data of distance to a river, distance to lake, and distance to ocean [44] were adopted to reflect the water availability, the normalized difference vegetation index (NDVI) data was applied to reflect the density of vegetation cover, and the elevation dataset was used to reflect the temperature variance with altitude.
Human activities and urbanization are crucial factors that affect the distribution and abundance of ant species [45][46][47]. S. invicta may look for sanctuary in human habitations or infrastructure (such as climate-controlled buildings or greenhouses) in areas where the cold climate makes it difficult to overwinter [14]. The natural dispersal distances are usually limited to hundreds of meters or a few kilometers [30], while long-distance dispersal occurs primarily through human activities or major disturbances [22,32]. Commercial transportation of agricultural or landscaping materials (e.g., containers, equipment, or potted plants soil) has resulted in the cross-regional spread of S. invicta colonies [30,34]. Land cover type, population density, and nighttime light were obtained to represent the extent of human activities. The urban accessibility dataset, for estimating the travel time to major cities, was used to reflect the connectivity of different locations and the concentration of economic activity.
The environmental variables used for this study were received in a gridded format with various resolutions. To ensure the spatial consistency of these variables, we converted the spatial resolutions of all variable data to 0.05 degrees (approximately 5 km). Detailed information on the related spatial variables used in this study is shown in Table 2.

Presence Data
In this study, the occurrence localities of S. invicta were derived from various sources, including websites, literature, field surveys, and government reports. S. invicta occurrence records with detailed geographical coordinate information were downloaded from the website of the Global Biodiversity Information Facility (GBIF) [55], supplemented by AntWeb [56] and Centre for Agriculture and Bioscience International (CABI) [57]. Published studies and maps of S. invicta occurrences were also applied to extract the location information of S. invicta. Also, in this study, we used data from JK Wetterer's records to document the worldwide range of S. invicta until 2013, including both published and unpublished [58]. The occurrence records from the literature published after 2013 were included as well, as shown in Table S1 (see supplementary materials).
In addition, according to the "List of administrative areas for the distribution of agricultural plant quarantine pests in China 2019", 387 counties in 12 provinces of China have been invaded by S. invicta [59]. Since the occurrence was recorded at the county scale, the county centroids were used as the occurrence locations. To obtain more localities of S. invicta occurrence in China, we went to Fujian Province to conduct field surveys and collected the locations of nests of S. invicta.
After deduplication, a total of 1610 occurrence records of S. invicta were collected, as shown in Figure 1. The occurrence records of S. invicta from websites and literature were mainly located in North and South America, whereas records from field surveys and government reports were mostly distributed in China.

Pseudo-Absence Data
Pseudo-absences (PAs), also known as background data, are widely used in species distribution modelling when only presence data is available. As the selection of PAs could severely impact the model performance, different strategies have been proposed to improve the selection of appropriate PAs. One of the methods is to randomly select PAs from all points outside a pre-defined region based on a simple preliminary model or based on a minimum distance to the presence. Barbet-Massin et al. showed that this method performed better in machine learning methods (i.e., BRT and RF), and recommended using the same number of PAs as available presences [60]. In this study, we derived PAs with environmental exclusion based on prior knowledge from studies of Killion et al. [19] and Korzukhin et al. [20]. Their studies applied colony-growth models which incorporated temperature and precipitation (in Korzukhin et al. [20]) to identify the potential range of S. invicta. The estimated annual precipitation (510 mm) and annual minimum temperature (−17.8°C) were applied as reasonable thresholds that limit the range expansion for S. invicta. By setting environmental limits, PAs are selected outside the suitable area of the species to keep a certain distance from the presence points.
Besides, water and iced bodies were also excluded from PAs selection by overlaying with the land cover map. To reduce the effect of sample selection bias on the model prediction, we implemented the selection of PAs 300 times randomly. After each random selection process, we constructed 3220 samples (1610 presences and 1610 PAs) and divided the samples into two subsets, the training samples and validation samples accounted for 75% (2415) and 25% (805) of the total samples respectively.

Boosted Regression Tree Model
In this study, a machine learning method, the boosted regression tree (BRT) model, was adopted to predict the potential global distribution of S. invicta. BRT is a combination of statistical and machine learning techniques. It has been widely used in species distribution modelling [61][62][63][64]. BRT combines the advantages of both regression trees and boosting algorithms. The characteristics from the tree-based methods give the BRT abilities to deal with different types of predictor variables (numeric, categorical, binary, etc.), accommodating missing data, and being insensitive to outliers [65,66]. In addition, the model can fit complex nonlinear relationships as well as identifying and modelling the interactions between different predictors automatically. Based on the idea that it is easier to find and compute an average from many rough rules than to find a single highly accurate prediction rule, the boosting technique combines many simple tree models to improve the performance and predictive accuracy of single tree models [65]. Boosting implements a forward and stepwise procedure to merge results of several competing models, where tree models are fitted interactively to a subset of the training data, using appropriate methods (stochastic gradient descent here) gradually to increase emphasis on observations that were modelled poorly by the existing collection of trees.
Based on the above-mentioned reasons, the BRT model was applied in the present study. The R version 3.3.3 statistical programming environment [67] was used in combination with the extension packages (i.e., dismo [68], caret [69], and gbm [70]) to build the BRT model and evaluate the prediction accuracy. The optimal settings of the BRT models were determined by the cross-validation results from 300 times repeated computations. The learning rate determines the contribution of each tree to the growing model, the tree complexity controls whether interactions are fitted and the bag fraction determines the proportion of the used training data. The main tuning parameters were set as follows (tree.complexity = 4, learning.rate = 0.005, bag.fraction = 0.75, step.size = 10, cv.folds = 10, max.trees = ), and the other parameters were kept as gbm defaults. Following these steps, we fitted an ensemble of 300 BRT models to increase the robustness of the model prediction and quantify the model uncertainty.

Accuracy Evaluation
A ten-fold cross-validation method was applied to each model to avoid overfitting, and the area under the curve (AUC) statistic and true skill statistic (TSS) were used to evaluate the predictive performances of the BRT models. The validation statistics suggested that the ensemble BRT model performed well. The AUC values for the training dataset and validation dataset were 0.981(±0.004) and 0.981(±0.009) respectively, and the TSS were 0.879 for the training dataset and 0.853 for the validation dataset, which indicated a high predictive accuracy. In addition, the model uncertainty was also quantified in the spatial predictions based on the standard deviation values across the model ensemble. The uncertainty analysis revealed that the prediction uncertainty in most areas was low (lower than 0.26), and high uncertainty areas were mostly distributed around high-risk areas for S. invicta infestation (Figure 2). 1 Figure 2. Uncertainty distribution of the BRT model.

Potential Risk of S. Invicta Invasion
On the foundation of the observed relationship between S. invicta occurrence records and each environment predictor, we used the fitted ensemble BRT models to predict the global infestation risk of S. invicta. Figure 3 shows the infestation risk level, also considered as the suitability of S. invicta, on a continuous scale from 0 to 1, which was generated by calculating the mean prediction across all models for each grid cell. The predicted infestation risk map revealed that the high-risk areas are primarily concentrated in medium and low latitude regions, including southeastern North America, large parts of South America, East and Southeast Asia, and Central Africa, which were consistent with the current distribution range of the species. The predicted medium-risk areas were distributed around the high-risk areas.
In North America, the predicted high-risk areas were mainly distributed in the southeast and west coast of the United States, the east coast of Mexico, northern Guatemala, and northern Belize ( Figure S1). In South America, the potential areas suitable for S. invicta were located in most parts, including Brazil, Colombia, Venezuela, Guyana, Suriname, French Guiana, eastern Peru, northeastern Bolivia, western Paraguay, northeastern Argentina, and Uruguay ( Figure S2). The suitable areas in Europe were scattered around the Mediterranean, in coastal regions of Portugal, Spain, France, Italy, Albania, and Greece ( Figure S3). In Africa, the suitable areas were mainly concentrated in Central Africa, including southern Cameroon, Equatorial Guinea, Gabon, western Angola, Congo, midwestern DR Congo as well as some coastal cities in Ivory Coast, Ghana, Benin, Nigeria, and South Africa ( Figure S4). In Asia, the predicted high-risk areas were mainly distributed in East Asia, Southeast Asia, and coastal South Asia, specifically including southern China, Indo-China Peninsula (i.e., Burma, Laos, Thailand, Cambodia, and Vietnam), Malaysia, northern Philippines, and Bangladesh. Coastal areas in India, Indonesia, South Korea, and Japan were also medium or high-risk areas ( Figure S5). In Oceania, the southeastern coast of Australia, northern New Zealand, and coastal Papua New Guinea were at medium risk ( Figure S6). It is noteworthy that in both Central Africa and Southeast Asia (particularly on Indo-China Peninsula), though there were no or very few records of previous S. invicta presence, the model predicted high risk in these regions. Also, in some coastal areas and islands of Central America, the Mediterranean, Oceania, the Indian Ocean, and Southeast Asia, the model also predicted medium-or high-risk for S. invicta infestation.

Relative Contribution of Environmental Factors
To identify the key factors that determine the potential distribution of S. invicta, we quantified the contribution of each predictor to the ensemble BRT model using the relative contribution (RC) indicator ( Table 3). The statistics suggested that the climatic factors accounting for 71.14% of the variation explained by the ensemble BRT models, were the most important predictors in the model, followed by human factors (15.55%) and geographical factors (13.31%). Accumulated annual precipitation was the most significant predictor, with a relative contribution rate of 51% (±7.21%), followed by the maximum temperature, which had a relative contribution of 14.98% (±4.44%). The distance to a river, nighttime light, and urban accessibility had relative contributions of 8.34% (±2.33%), 6.75% (±3.24%), and 4.8% (±2.22%), respectively. The contribution rates of other predictors are shown in Table 3.
As illustrated in Figure 4, the accumulated annual precipitation was positively correlated to the probability of suitable land for S. invicta, an increase in the probability was observed as the accumulated annual precipitation initially increased from 500 mm. The maximum temperature and nighttime light also showed a positive correlation with suitability, whereas the distance to a river, urban accessibility presented a negative correlation with suitability, and for the minimum temperature, the correlation was not significant.  Figure 4. Marginal effect plots of main spatial predictors overall 300 boosted regression tree (BRT) ensembles fitted to the full data set. The black lines depict the mean effect curves, and the shaded areas represent the 95% confidence interval.

Discussion
The results were consistent with S. invicta's current range and the predictions of previous studies [13,14,21], and our study presented more details and finer distinctions with a higher resolution. The potential suitable areas for S. invicta were primarily concentrated in southeastern North America, large parts of South America, East and Southeast Asia, and Central Africa. This was also in line with the predictions of the Maxent model (see Figure S7). By comparison with the potential distribution of the black imported fire ant (Solenopsis richteri), a closely related species to S. invicta, we found that S. richteri was able to survive in higher latitude than S. invicta. S. richteri had broader potential in temperate regions such as Europe, East Asia, South Africa, and eastern North America [71]. By contrast , S. invict was more abundant in Southeast Asia, Centra Africa, and northern South America. This was largely because S. richteri is more tolerant to cold than S. invicta, while S. invicta has higher tolerance to heat and desiccation stresses than S. richteri.
These maps could be used to identify areas where S. invicta could establish but has yet to be reported or areas where the infestation risk is underestimated. For example, in Central Africa and Indo-China Peninsula, though S. invicta has rarely been observed or reported before, the BRT model predicted high-risks for S. invicta infestation. It is worth noting that the geographical distribution range of these high-risk areas overlapped with that of tropical rainforests. Many studies suggest that S. invicta constructs earthen mounds in the open, sunny areas for brood thermoregulation and is less abundant in the warm, wet, and dense forests [21,42,72]. Therefore, it is more likely to establish in those disturbed and developed forested areas, including the edges of forests or agricultural areas [73], deforested areas are especially in danger of becoming colonized. According to the "Global Ecosystems and Environment Observation and Analysis Annual Report 2019" [74], forest coverage decreased in both Central Africa and Indo-China Peninsula from 2000 to 2018, largely as a result of devastating forests for arable land [75,76]. The deforested areas could provide suitable habitats for the establishment of S. invicta, and should be taken into adequate account. In addition, the model also predicted medium-or high-risk in some coastal areas and islands of Central America, the Mediterranean, Oceania, the Indian Ocean, and Southeast Asia, where the ecological environment is relatively fragile, the introduction of S. invicta may cause devastating damage to local species and biodiversity. Therefore, inspection and quarantine measures in these medium-or high-risk areas need to be strengthened, thereby preventing it from becoming widespread, and minimizing the ecological and economic impacts.
Climatic variables were the most important factors responsible for the environmental suitability of S. invicta. Precipitation and temperature are the major determinants that limit the distribution range of S. invicta. The low minimum temperature in the winter, the high maximum temperature in the summer, or inadequate precipitation would prevent it from becoming successfully established. As illustrated in Figure 4, when the accumulated annual precipitation increased from 500 mm and the maximum temperature increased from 15°C, the habitat suitability for S. invicta also increased, which conform to the ecological characteristics of the species [40,77,78]. As the survival of S. invicta in arid regions is highly dependent on sources of permanent water or irrigated areas, the distance to a river also made considerable contributions to the BRT model. The suitability gradually decreased as the distance to a river increased. It is encouraging that human factors also had a strong influence on the model prediction with a total contribution of 15.55%. Nighttime light and urban accessibility appeared to be important human factors determining the potential distribution of S. invicta. Specifically, the environmental suitability for S. invicta increased as the nighttime light index rose (which indicates a high degree of human activities). This may reflect its preference for human disturbances, S. invicta has been shown to inhabit a wide variety of human habitats and infrastructures [30][31][32]. The urban accessibility factor is negatively correlated with suitability. When the travel time to major cities increased (which represents lower connectivity and less economic activity), the environmental suitability for S. invicta decreased. These results added credibility to the rationality of the initial assumption that human activities may provide S. invicta with suitable living conditions and promote its spread.
It should be noted that the potential distribution of S. invicta is also affected by many other factors. For example, soil properties, flooding, and the distribution of food resources, natural enemies, and host plants may also affect the distribution of S. invicta [30,31,[79][80][81], future prediction could be refined by including more comprehensive factors and higher spatio-temporal resolution datasets.

Conclusions
In this study, we combined the most comprehensive occurrence records with an advanced machine learning method and a variety of variables to predict the potential global distribution of S. invicta. Our results indicated that the potential distribution areas of S. invicta were primarily concentrated in southeastern North America, large parts of South America, East and Southeast Asia, and Central Africa. The deforested areas in Central Africa and the Indo-China Peninsula were especially at risk from S. invicta invasion. Some islands and coastal areas in Central America, the Mediterranean, Oceania, the Indian Ocean, and Southeast Asia were also found to be suitable habitats for S. invicta. These findings could provide a scientific basis to formulate prevention and control measures proactively. Additionally, human factors such as nighttime light and urban accessibility made considerable contributions to the BRT model, this could provide an important baseline for incorporating human factors in modelling the potential distribution of S. invicta as well as other species.
Supplementary Materials: The following are available online at http://www.mdpi.com/2071-1050/12/23/10182/s1, Table S1: Location records of S. invicta from literatures published after 2013, Figure S1: Occurrence points and infestation risk level of S. invicta in North America, Figure S2: Occurrence points and infestation risk level of S. invicta in South America, Figure S3: Occurrence points and infestation risk level of S. invicta in Europe, Figure S4: Occurrence points and infestation risk level of S. invicta in Africa, Figure S5: Occurrence points and infestation risk level of S. invicta in Asia, Figure S6: Occurrence points and infestation risk level of S. invicta in Oceania, Figure S7: Global infestation risk level of S. invicta predicted by Maxent, Figure S8: Sampling points of S. invicta field survey, Figure S9: Sampling points pictures of S. invicta field survey.