Distribution and Climatic Adaptation of Wild Tomato (Solanum lycopersicum L.) Populations in Mexico

Tomato (Solanum lycopersicum L.) is a vegetable of worldwide importance. Its wild and closely related species are reservoirs of genes with potential use in developing varieties tolerant or resistant to specific biotic and abiotic factors. The objective was to determine the geographic distribution, ecological descriptors, and patterns of diversity and adaptation of 1296 accessions of native tomato from Mexico. An environmental information system was created with 21 climatic variables at a spatial resolution of 1 km². Using multivariate techniques (Principal Component Analysis, PCA; Cluster Analysis, CA) and Geographic Information Systems (GIS), the variables most relevant to accession distribution were identified, as well as the groups formed according to the environmental similarity among accessions. PCA showed that the first three principal components (PCs) explain 84.1% of the total variation. The most relevant information corresponded to seasonal variables of temperature and precipitation. CA revealed five statistically significant clusters. Ecological descriptors were determined by classifying accessions into physiographic provinces. Temperate climates were the most frequent among tomato accessions. Finally, the potential distribution was determined with the Maxent model using 10 cross-validated replicates, identifying areas with a high probability of tomato presence. These results constitute a reliable source of information for planning accession collection sites and for identifying accessions that are vulnerable or that should be included in conservation programs.


Introduction
Tomato (Solanum lycopersicum L.), a member of the Solanaceae family, is a species distributed worldwide in a wide variety of habitats [1] associated with different climate and soil conditions [2]. Mexico and Peru are considered possible centers of origin, diversification, and domestication of this species [3,4].
According to Blanca et al. [5,6], tomato domestication involved two transitions. The first, in South America, derived the partially domesticated S. lycopersicum var. cerasiforme (Dunal) D.M. Spooner, G.J. Anderson & R.K. Jansen (SLC) from the wild species Solanum pimpinellifolium L. The second occurred in Mesoamerica, where SLC gave rise to the fully domesticated, larger-fruited S. lycopersicum var. lycopersicum L. However, Razifard et al. [7] recently reported that the origin of SLC predates its domestication, as many typical characteristics of tomatoes grown in South America are similar to those of SLC. The subsequent scarcity of SLC was attributed to the wide spread of the partially domesticated forms.
Mexico is a center of diversification and the most important area of tomato domestication; wild populations are still very common and can be found in tolerated, promoted, and even cultivated forms [8]. Wild forms of tomato are generally annual or seasonal, although they can behave as perennials under favorable environmental conditions [9,10]. Without favorable environmental conditions, cultivated tomato rarely persists for generations and requires at least minimal agronomic management to survive [1,2]. For this reason, landraces and cultivated forms have been conserved mainly by traditional farmers [2,11]. Disturbed areas, or natural environments with some degree of disturbance such as agricultural and livestock areas, still prevail in several regions of Mexico where wild tomatoes are found [12].
Like other crops of great commercial success worldwide, tomato lost genetic variability during domestication [13,14], which particularly limits genetic breeding and the development of new commercial varieties adapted and tolerant to adverse abiotic factors [15] and to pre- and post-harvest pathogens [16]. One way to recover this lost variability is to use germplasm of native, wild, and related species to incorporate specific agronomic and fruit quality traits into commercial varieties [17], through programs for the enhancement of crop wild relatives (CWRs), as has already begun in other parts of the world, so that these fundamental genetic characteristics are not lost [18].
In Mexico, studies have detected valuable genetic pools in wild and locally naturalized tomato populations, with diversity at both the morphological and molecular levels [19] and variation in tolerance to nematodes [20]. Fruit quality traits have also shown great variation, for example, antioxidant capacity and isoprenoid metabolism [21]; concentrations of nutraceuticals and antioxidant compounds such as vitamin C, lycopene, organic acids, and soluble solids [22][23][24]; and concentrations of sugars, carotenoids, and carotenoid-derived volatiles, together with consumer preference for flavor and aroma [24]. In addition, variation in size, shape, color, fruit flavor, postharvest quality, culinary characteristics [23,24], and hedonic quality [25] is high. Despite this, studies remain scarce relative to the vast diversity of tomato genetic resources in Mexico. Therefore, there is still a need to collect, characterize, conserve, and use these resources sustainably through in situ and ex situ conservation.
To achieve these purposes, it is essential to identify the current potential distribution of wild tomato populations in Mexico based on diversity and climatic characterization, information that does not currently exist. This information will allow the determination of their spatial and temporal distribution and of the history and dynamics of their development [26,27], and thus a better understanding of the interactions between environmental conditions and the biotic and abiotic factors with which this species has co-evolved [28]. This task requires geographic information system (GIS) tools, which allow the observation, capture, entry, storage, and analysis of data for decision-making [27].
Due to the above, the objective of this research was to determine the current distribution areas of wild and native tomato populations in Mexico by identifying their adaptive ranges and climatic adaptation patterns, through the application of ecogeographic methods carried out with GIS tools and multivariate analysis.
Results
In the statistical analyses, Principal Component Analysis (PCA) determined that three principal components (PCs) described 84.1% of the total variation in the data from the accession sites. PC1 captured 45.6% of the total variation and was most strongly associated with B2, B7, B12, B14, and ET. B3 and B4 loaded most heavily on PC2, which accounted for 22.8% of the total variation (Figure 1). Altitude (from the digital elevation model) and annual mean temperature were represented in PC3, which captured 15.7% of the total variation.
Figure 1. PC1 and PC2 explained 45.7 and 22.8% of the total variation, respectively. Variables: B1 (annual mean temperature), B2 (mean diurnal range), B3 (isothermality), B4 (temperature seasonality), B7 (temperature annual range), B12 (annual precipitation), B14 (precipitation of the driest month), B15 (precipitation seasonality), ET (annual evapotranspiration), and ALT (digital elevation model).
Cluster analysis (CA) was performed to group physiographic provinces based on the 10 climatic variables considered and Gower distances. The Hopkins statistic (H = 0.073, close to 0) provided evidence of a real clustering tendency in the dataset. The clustering algorithm, k-means, was chosen according to the clValid results, and the NbClust algorithm determined that the optimal number of groups was 5 (Figure 2).
The five clusters identified have a well-defined geographical distribution (Figure 2). Cluster 1 is made up of accessions from the Pacific coastal zone, from southern Sonora to Chiapas. This group has the largest distribution in the country, as well as the widest climatic ranges and ecological descriptors for all the variables used in the analysis (Table 1).
Table 1. Ecological descriptors of 10 climatic variables for S. lycopersicum L. distribution in Mexico according to physiographic provinces and cluster groups identified in CA. Range (minimum-maximum value), Med (median), CV (coefficient of variation), B1 (annual mean temperature), B2 (mean diurnal range), B3 (isothermality), B4 (temperature seasonality), B7 (temperature annual range), B12 (annual precipitation), B14 (precipitation of the driest month), B15 (precipitation seasonality), ET (annual evapotranspiration), and ALT (digital elevation model).

Cluster 2 is located along the transvolcanic zone of the country; these accessions have the highest altitude range and the lowest annual mean temperature. Cluster 3 contains regions in the north of the country with the lowest annual precipitation and evapotranspiration values. Cluster 4 comprises regions along the coast of the Gulf of Mexico, from Tamaulipas to Yucatán, which tend to have low altitude and high annual mean temperature, annual precipitation, and evapotranspiration. Finally, cluster 5 corresponds to the mountainous area in the east of the country, from southern San Luis Potosí to Chiapas, with high-altitude environments, low annual mean temperatures, and high water availability.
The ecological descriptors shown in Table 1 and Figure 3 reveal the climatic amplitude of the wild tomato collection sites in the different physiographic provinces and their clear correspondence with the groups of the dendrogram generated by CA. Among the main findings, wild tomato populations in the Gulf of Mexico area stand out, where the highest annual precipitation and evapotranspiration values occur. In contrast, populations in the Baja California area face water scarcity. Accessions in the Yucatán province experience high annual mean temperatures at lower altitudes, while those in the Altiplano Sur (Zacatecano-Potosino) withstand lower annual mean temperatures associated with higher altitudes.

Climatic Diversity and Hotspot Analysis
Beck et al. [29] reported 30 climate types based on the Köppen-Geiger system. Only 10 of them were identified among tomato accessions in Mexico (Af, Am, Aw, BWh, BSh, BSk, Cwa, Cwb, Cfa, and Cfb), with a predominance of temperate climates (C). BSh (arid, steppe, hot) and Aw (tropical, savannah) were present in most of the physiographic provinces identified above. Af (tropical, rainforest) was the least abundant climate type, appearing only in accessions from the Gulf of Mexico and Oaxaca regions.
Regarding climate diversity within each physiographic province, accessions from the Petén region were the only ones restricted to a single climate type (Aw, tropical, savannah). Accessions from the Sierra Madre del Sur, Eje Volcánico, Gulf of Mexico, Sierra Madre Oriental, and Oaxaca regions spanned the greatest diversity of climates (seven climate types). Figure 4 shows the distribution of climatic diversity for each province according to the clusters identified in CA: within each cluster there is considerable climatic similarity, although the proportions of climate types differ among the physiographic provinces that make up the cluster.
Hotspot analysis (Figure 5) identified zones of high and low diversity. The hotspots, or areas of greatest diversity, include practically all the accessions of Chiapas; the border area between Veracruz and Hidalgo, Querétaro, and Puebla; a small area of the State of Mexico adjoining CDMX; and another small area of the state of Guerrero. The coldspots, or areas of low accession diversity, are located in northern Sinaloa, southern Veracruz, and the northern zone of Jalisco bordering Guanajuato and Michoacán.
Climate type abbreviations follow Beck et al. [29]: Af (tropical, rainforest), Am (tropical, monsoon), Aw (tropical, savannah), BWh (arid, desert, hot), BSh (arid, steppe, hot), BSk (arid, steppe, cold), Cwa (temperate, dry winter, hot summer), Cwb (temperate, dry winter, warm summer), Cfa (temperate, no dry season, hot summer), and Cfb (temperate, no dry season, warm summer).
Potential Distribution
The potential distribution model of tomato had an adequate fit and performance, with a mean AUC of 0.932, which indicates that it discriminates well between areas suitable and unsuitable for the distribution of wild populations of this species [31]. The model also closely matches the regions where the collection sites are located, except in the northern part of the country (Baja California and some accessions in Chihuahua, Coahuila, and Tamaulipas).

Discussion
Mexico, as a center of tomato domestication, has great variability in its wild populations, which grow as naturalized and ruderal plants [8,32]. This is due to the varied orography and climate conditions represented by the 19 physiographic provinces of the country. Given the great variability of environmental conditions that wild populations faced during natural selection, they necessarily developed adaptations to adverse conditions, which makes them a valuable genetic resource for direct use or for the generation of new improved varieties. Even with the great climatic diversity in Mexico, only one tomato species with two varieties occurs naturally (S. lycopersicum var. cerasiforme and S. lycopersicum var. lycopersicum).
Despite this recognized importance, information on the current geographical distribution of wild tomato in Mexico and on the ecological requirements that determine it is partial or incomplete. This information is needed to formulate effective strategies for the sustainable use and conservation of plant genetic resources, which must be consistent with the specific characteristics of the species, the habitats in which they grow, and international legislation on the conservation of genetic resources [33].
One threat to this vast tomato genetic diversity is climate change. For the 21st century, it is predicted that there will be significant modifications in the current thermal and rainfall patterns, causing extreme variations that will severely affect natural systems [34,35]. These weather modifications will strongly alter the geographic distribution of the species, positioning plant genetic resources as a highly vulnerable sector to the impacts produced by this phenomenon [35,36].
In this research, identifying multicollinear variables through Pearson correlation allowed 11 of the 21 variables initially considered to be eliminated, implying that the variation in them can be described by the variables retained. This is because some variables are indices derived from the same basic data, hence their high correlations, which reflect linear associations. By eliminating this "artificial variation", the performance of the multivariate analyses (CA and PCA) was improved [37].
PCA described 84.1% of the data variation with three PCs. Unlike the results obtained by Ramirez et al. [38] on the climatic variables of greatest importance for the diversity and distribution of wild and related tomato species in Latin America, the variables that make up each PC here are not grouped uniformly; that is, each component contains variables related to both temperature and humidity. Likewise, altitude and annual mean temperature, considered very relevant for the distribution of tomato species [38,39], were integrated in the third component. These results suggest that the distribution of wild tomato in Mexico is determined by thermal and pluvial factors in equal measure, unlike the 12 wild and four related species distributed in Latin America, which are more affected by precipitation [38,39]. These differences in the climatic variables associated with distribution can be attributed to the tomato domestication process.
The groups formed by CA according to distribution in physiographic provinces (Figure 2B) made it possible to identify populations of tomato accessions adapted to diverse environmental conditions and to locate zones with germplasm tolerant to specific factors. The ecological descriptors associated with physiographic provinces (Table 1) are an important source of information: combining them opens the possibility of searching for specific genes for tolerance or resistance to adverse environmental factors (extreme temperatures, drought, excess humidity, salinity, and diseases), which is very useful for breeding commercial tomato varieties. Together with the agronomic information generated over the years, this information is of great help for identifying and selecting materials with potential use in breeding programs [40][41][42][43].
Regarding climatic diversity, the predominant climate type among wild populations in Mexico is temperate (C), unlike the Latin American tomato species, where dry climates (B) predominate [39]. This contrast also helps to characterize the environmental suitability of each group of species.
The hotspot analysis (Figure 5) showed that areas of high diversity of wild tomato populations coincide with areas of high climatic diversity. These zones also coincide with known areas of great diversity and use of wild tomatoes [8]. Low-diversity zones, or coldspots, are important sites to consider for the conservation of these resources.
The Maxent model used to determine the potential distribution of tomato in Mexico is recognized for its efficiency in handling complex interactions between predictor and response variables [28,44,45]. The presence of wild tomato populations closely matches the predicted distribution in areas where the crop is widely adapted. As environmental conditions become more stressful for wild populations, the model loses sensitivity and efficiency, because environmental conditions, and consequently the responses of the populations, become much more variable, which strongly limits the prediction of their distribution. However, where environmental conditions meet the needs of wild populations, the predictive performance of the model is commendable. The mismatch in some northern regions of Mexico between accessions and the predicted distribution area is attributed to the collection date of the accessions (some were collected long ago, which creates uncertainty about their current presence) and to current climatic conditions no longer being the most favorable for the development of the species.
The present study constitutes a reliable source of information for developing strategies for the sustainable use and conservation of tomato genetic resources in Mexico. However, the impact of climate change on the distribution of these populations, on their genetic diversity, and on agricultural systems still needs to be evaluated [46]. Together with the information generated in this research, such an evaluation will make it possible to design future collection routes for the conservation and use of these resources.

Materials and Methods
Database
The database used in this research was built from georeferenced passport data of S. lycopersicum L. in Mexico. A total of 2983 geographic coordinates were compiled from: the National Network of Tomato-SNICS; germplasm banks (National Center of Genetic Resources-INIFAP, National Bank of Plant Germplasm-UACH); herbarium specimens (Herbarium of the University of Guadalajara, Herbarium of the University of Science and Arts of Chiapas, Herbarium of the Colegio de la Frontera Sur, and Herbarium of the Institute of History and Ecology of Chiapas); reports and scientific articles [40][41][42][43]; and national (Global Biodiversity Information Network) [47] and international plant inventories (Tomato Genetics Resource Center and Global Biodiversity Information Facility) [48,49]. Records with atypical data, repeated records, records with low-precision coordinates (fewer than three decimal places), and accessions located in atypical areas were discarded, leaving a total of 1296 records (Figure 7). It should be noted that the collections were identified at the level of taxonomic variety due to little or no information on some of the records used.
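The record-filtering criteria above (repeated records, coordinates with fewer than three decimal places, and records outside the study area) can be sketched in Python. This is a minimal sketch, not the authors' workflow: the record layout (`id`, `lat`, `lon` stored as strings), the function names, and the rough bounding box used as a stand-in for the "atypical areas" check are all hypothetical.

```python
from decimal import Decimal

# Hypothetical, approximate bounding box for Mexico (an assumption standing in
# for the "atypical areas" check; it is not taken from the source).
LAT_RANGE = (14.0, 33.0)
LON_RANGE = (-118.5, -86.5)


def decimal_places(coordinate: str) -> int:
    """Count the digits after the decimal point, as written in the record."""
    exponent = Decimal(coordinate.strip()).as_tuple().exponent
    return max(0, -exponent)


def clean_records(records, min_decimals=3):
    """Drop low-precision, out-of-area, and duplicate coordinate records."""
    seen = set()
    kept = []
    for rec in records:
        lat, lon = rec["lat"], rec["lon"]
        # Discard coordinates recorded with fewer than `min_decimals` decimals.
        if decimal_places(lat) < min_decimals or decimal_places(lon) < min_decimals:
            continue
        # Discard points outside the assumed bounding box.
        lat_f, lon_f = float(lat), float(lon)
        if not (LAT_RANGE[0] <= lat_f <= LAT_RANGE[1]
                and LON_RANGE[0] <= lon_f <= LON_RANGE[1]):
            continue
        # Discard repeated sites; Decimal equality treats 19.4326 == 19.43260.
        key = (Decimal(lat), Decimal(lon))
        if key in seen:
            continue
        seen.add(key)
        kept.append(rec)
    return kept
```

Comparing coordinates as `Decimal` values keeps the recorded precision intact for the decimal-place check while still treating trailing zeros as the same site.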

Environmental Information
A climate information system with 20 climatic variables and one geographic variable was used in raster format with a spatial resolution of 1 km². Bioclimatic variables were taken from WorldClim version 2.1 for the period 1970-2000 [50]: B1 (Annual mean
To make the information easier to interpret, accession sites were grouped according to the classification of physiographic provinces of Mexico [52] (Figure 8). This information was used to identify climatic patterns and diversity, perform statistical analyses, and define ecological descriptors.

Statistical Analysis and Ecological Descriptors
Before running the statistical analyses, variables were screened for high linear dependence (collinearity). Pearson correlations were computed between all pairs of variables, and for each pair with an absolute coefficient greater than 0.90, one of the two variables was eliminated.
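The pairwise screening can be sketched as a greedy filter that keeps a variable only if its absolute Pearson correlation with every variable already kept is at most 0.90. The source does not say which member of a correlated pair was dropped, so the order-dependent rule used here (earlier columns are kept) is an assumption, as are the function names and the toy values.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # fails for a constant column


def drop_collinear(columns, threshold=0.90):
    """Greedily keep variables whose |r| with all kept variables <= threshold."""
    kept = []
    for name, values in columns.items():  # dict order decides which one survives
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept
```

For example, with `{"B1": [1, 2, 3, 4, 5], "B6": [2, 4, 6, 8, 10.5], "B12": [5, 1, 4, 2, 3]}`, B6 is dropped because it is almost perfectly correlated with B1, while B12 (r = -0.3 with B1) is kept.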
Principal Component Analysis (PCA) on the correlation matrix was performed with the selected variables in SAS V 9.4 (PRINCOMP procedure) [53]. All graphics were produced in RStudio [54] (Figures 2B, 3 and 4). Eigenvalues, eigenvectors, and the contribution of each variable to each principal component were obtained for the corresponding figures with the packages FactoMineR [55] and factoextra [56].
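The core of a correlation-matrix PCA can be sketched with the standard library alone: standardize each variable, form the correlation matrix, and obtain its eigenvalues, here with the classical Jacobi rotation method, chosen only because it is short. Because the trace of a correlation matrix equals the number of variables, each eigenvalue divided by that number is the proportion of total variance explained by its component, which is the kind of figure reported for PC1 to PC3. This is a sketch, not PROC PRINCOMP or FactoMineR; the function names and toy data are hypothetical.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of equally long variable columns."""
    z = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        z.append([(v - mean) / sd for v in col])
    n, p = len(columns[0]), len(columns)
    return [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in range(p)]
            for i in range(p)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, descending."""
    a = [row[:] for row in a]
    p = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if abs(a[i][j]) < 1e-15:
                    continue
                # Rotation angle that zeroes element (i, j) of R^T A R.
                theta = 0.5 * math.atan2(2 * a[i][j], a[j][j] - a[i][i])
                c, s = math.cos(theta), math.sin(theta)
                r = [[float(k == m) for m in range(p)] for k in range(p)]
                r[i][i], r[j][j], r[i][j], r[j][i] = c, c, s, -s
                a = matmul(transpose(r), matmul(a, r))
    return sorted((a[k][k] for k in range(p)), reverse=True)


def explained_variance(columns):
    """Proportion of total variance explained by each PC (correlation PCA)."""
    eigenvalues = jacobi_eigenvalues(correlation_matrix(columns))
    total = len(columns)  # trace of a correlation matrix = number of variables
    return [lam / total for lam in eigenvalues]
```

With two perfectly correlated variables and a third uncorrelated one, the first component absorbs two thirds of the variance and the second one third.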
Subsequently, Cluster Analysis (CA) with Euclidean distances and Ward's minimum variance method was run to identify similar accessions by physiographic province. The clustering tendency was verified with the Hopkins (H) statistic using the clustertend package [57]: values of about 0.5 or greater indicate that the data are uniformly distributed, so clustering does not make sense, whereas values close to 0 are evidence in favor of clustering. The best clustering algorithm was selected with the clValid package [58], and the optimal number of clusters was determined with the NbClust package [59].
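The Hopkins statistic can be sketched as follows, using the convention described above (values near 0 indicate clustering, values around 0.5 indicate uniformly spread data). In this form it equals one minus the common textbook version u/(u + w). This is a simplified illustration rather than the clustertend implementation; the sample size `m` (10% of the points by default), the bounding-box sampling window, and the toy data are assumptions.

```python
import math
import random


def hopkins(points, m=None, seed=0):
    """Hopkins-type statistic: near 0 = clustered, about 0.5 = uniform.

    Computed as sum(w) / (sum(u) + sum(w)), i.e. one minus the textbook
    u / (u + w) form, to follow the convention used in the text.
    """
    rng = random.Random(seed)
    n = len(points)
    m = m or max(1, n // 10)  # assumed sample size: 10% of the points
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]

    def nearest(q, exclude=None):
        # Distance from q to its nearest data point, skipping index `exclude`.
        return min(math.dist(q, p) for k, p in enumerate(points) if k != exclude)

    # w: nearest-neighbor distances of m randomly sampled data points.
    w = sum(nearest(points[k], exclude=k) for k in rng.sample(range(n), m))
    # u: distances from m uniform random points in the bounding box to the data.
    u = sum(nearest([rng.uniform(lo, hi) for lo, hi in zip(lows, highs)])
            for _ in range(m))
    return w / (u + w)
```

For two tight groups of points far apart, `w` is tiny compared with `u`, so the statistic falls well below 0.5, mirroring the H = 0.073 reported in the Results.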
To determine ecological descriptors, a point vector with the geographic coordinates of each accession was used to extract the value of each variable at each site. These values were obtained with the Spatial Analyst Tools of ArcGIS version 10.3 (ESRI Inc., Redlands, CA) [63]. The information was compiled in an Excel spreadsheet, where the range (minimum and maximum), median, and coefficient of variation were determined, with CV = (Q/Med) × 100, where Q = (Q3 − Q1)/2 is the semi-interquartile range and Med is the median [38]. This process was carried out for each of the selected variables.
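The descriptors for one variable can be sketched as follows. Quartiles are computed with `statistics.quantiles(..., method="inclusive")`; the quartile convention used in the original spreadsheet is not stated, so this choice is an assumption, and other conventions can give slightly different Q1 and Q3 for small samples.

```python
import statistics


def ecological_descriptors(values):
    """Range, median, and quartile-based CV = (Q / Med) * 100 for one variable."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    q = (q3 - q1) / 2  # semi-interquartile range
    med = statistics.median(values)
    return {
        "min": min(values),
        "max": max(values),
        "median": med,
        "cv": q / med * 100,  # undefined when the median is 0
    }
```

For the values 10, 20, 30, 40, and 50, Q1 = 20 and Q3 = 40, so Q = 10, Med = 30, and CV is about 33.3%.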

Climate Diversity and Hotspot Analysis
Climate diversity patterns were identified using the physiographic provinces of Mexico defined by the "Comisión Nacional para el Conocimiento y Uso de la Biodiversidad" (CONABIO) [52]. For this analysis, the geographic coordinates of each accession were used to obtain its climate type (Table 2) according to the world Köppen-Geiger climate classification at a spatial resolution of ~1 km² proposed by Beck et al. [29]. With this information, a frequency table was built with the number of accessions of each climate type in each physiographic province where accessions occur.
Table 2. Köppen-Geiger climate types of Beck et al. [29] used to determine diversity patterns among wild tomato accessions of S. lycopersicum L. in Mexico.
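The frequency table described above, and the number of distinct climate types per province used to compare climatic diversity, can be sketched with `collections.Counter`. The (province, climate code) input pairs and the function names are hypothetical; in the study the climate code of each accession came from the Beck et al. [29] raster.

```python
from collections import Counter, defaultdict


def climate_frequency(accessions):
    """Frequency table: province -> Counter of Köppen-Geiger climate codes."""
    table = defaultdict(Counter)
    for province, climate in accessions:
        table[province][climate] += 1
    return table


def climate_richness(table):
    """Number of distinct climate types recorded in each province."""
    return {province: len(counts) for province, counts in table.items()}
```

A province whose accessions all fall in one climate type, as reported for Petén, gets a richness of 1.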
For the hotspot analysis, critical zones of species abundance and areas with a high concentration of diversity were identified using the "Spatial Statistics Tools" of ArcGIS.
Species density maps were constructed by identifying all accessions within 1 km of each other. This distance was chosen based on previous studies of diversity in wild tomato species in South America [39] and potato species (Solanum Sect. Petota), a sister group of tomatoes [64,65].
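ArcGIS builds the density surfaces itself; the sketch below only illustrates the 1 km neighborhood criterion by counting, for each accession, how many other accessions lie within 1 km by great-circle (haversine) distance. The mean Earth radius of 6371.0088 km, the function names, and the coordinates are assumptions for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0088):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def neighbor_counts(points, radius_km=1.0):
    """For each accession, count the other accessions within `radius_km`."""
    counts = []
    for i, (lat_a, lon_a) in enumerate(points):
        counts.append(sum(
            1 for j, (lat_b, lon_b) in enumerate(points)
            if j != i and haversine_km(lat_a, lon_a, lat_b, lon_b) <= radius_km
        ))
    return counts
```

This brute-force loop is quadratic in the number of accessions, which is still manageable for about 1300 records.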
Hotspot analysis was performed to determine whether collections form hot or cold spatial clusters beyond what would be expected under a random distribution. The analysis was run with the Getis-Ord Gi* statistic [66] to identify regions with high, statistically significant spatial clustering of accession abundance and diversity. Statistical significance was assessed from the resulting Z scores.
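The Gi* statistic can be sketched with binary distance-band weights, where each site counts itself and every site within the band, following the standard Getis-Ord formula. The result is already a Z score, and a two-sided p-value follows from the standard normal distribution. The fixed distance band, the planar coordinates, and the toy values are assumptions; ArcGIS offers other weighting schemes and corrections that are not reproduced here.

```python
import math


def getis_ord_gi_star(coords, values, band):
    """Gi* Z scores with binary distance-band weights (self included).

    Gi* = (sum_j w_ij x_j - xbar * sum_j w_ij)
          / (S * sqrt((n * sum_j w_ij^2 - (sum_j w_ij)^2) / (n - 1)))
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    scores = []
    for i in range(n):
        w = [1.0 if math.dist(coords[i], coords[j]) <= band else 0.0 for j in range(n)]
        sw = sum(w)
        swx = sum(wj * xj for wj, xj in zip(w, values))
        sw2 = sum(wj * wj for wj in w)
        numerator = swx - mean * sw
        # Zero if every site is a neighbor of site i (band too wide).
        denominator = s * math.sqrt((n * sw2 - sw * sw) / (n - 1))
        scores.append(numerator / denominator)
    return scores


def two_sided_p(z):
    """Two-sided p-value for a standard normal Z score."""
    return math.erfc(abs(z) / math.sqrt(2))
```

Large positive Z scores mark hotspots (high values surrounded by high values) and large negative scores mark coldspots; |Z| > 1.96 corresponds to p < 0.05.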

Potential Distribution
The potential distribution of tomato in Mexico was modeled with MaxEnt version 3.4.4 [30], which uses the principle of maximum entropy to estimate a set of functions relating environmental suitability to environmental variables and thereby determine the potential distribution of a species [31]. Maxent has been recognized for its efficiency in handling complex interactions between predictor and response variables [28,44,45].
For model fitting, the occurrence data were randomly divided into training data (50%) and test data (50%) to evaluate the fit and statistical significance of the model [67]. The model output was presented as the ensemble of 10 cross-validated replicates.
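The 50/50 partition of occurrence records can be sketched as repeated random subsampling, with one seeded split per replicate. This only illustrates the data partitioning; Maxent performs its own replicate runs internally, and the function names and seeds are hypothetical. Note that k-fold cross-validation would instead divide the records into k disjoint test folds.

```python
import random


def split_occurrences(records, test_fraction=0.5, seed=None):
    """Randomly partition occurrence records into (training, test) lists."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def replicate_splits(records, replicates=10, test_fraction=0.5):
    """One independent random split per replicate, seeded for reproducibility."""
    return [split_occurrences(records, test_fraction, seed=r) for r in range(replicates)]
```

With the 1296 cleaned records, each replicate would hold 648 training and 648 test occurrences.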
Model performance was evaluated with the area under the curve (AUC) of the receiver operating characteristic (ROC) plot [68]. This statistic measures how well the model separates suitable from unsuitable areas for tomato distribution; models with an AUC greater than 0.7 are considered acceptable and perform well [28,69].
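The AUC can be computed without drawing the ROC curve. It equals the probability that a randomly chosen presence site receives a higher suitability score than a randomly chosen background (or absence) site, with ties counted as one half; this is the Mann-Whitney formulation. The scores in the example are made up, and Maxent computes its AUC internally against background points.

```python
def auc(presence_scores, background_scores):
    """ROC AUC as P(presence score > background score), ties counted as 0.5."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))
```

For presence scores 0.9, 0.8, 0.7 and background scores 0.1, 0.2, 0.75, eight of the nine pairs are ranked correctly, giving an AUC of about 0.889. An AUC of 0.5 means no discrimination, which is why the 0.932 reported above indicates strong discrimination.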
The resulting ensemble model was converted into a binary presence/absence map for tomato distribution by applying a threshold of environmental suitability. Among the thresholds evaluated (fixed cumulative value 1, minimum training presence, 10 percentile training presence, and maximum training sensitivity plus specificity), the one with the lowest omission rate (the proportion of known occurrences predicted as absent) at the maximum logistic value was selected [28].
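Three of the listed thresholds can be sketched from suitability scores: minimum training presence (the lowest score at any training presence), a 10 percentile training presence threshold (the score below which about 10% of training presences fall), and the threshold that maximizes training sensitivity plus specificity against background scores. The percentile rule used here (`ordered[int(0.10 * n)]`) is a simple approximation that may differ in detail from Maxent's own calculation. The fixed cumulative value 1 threshold depends on Maxent's cumulative output and is not reproduced. All function names and scores are hypothetical.

```python
def minimum_training_presence(train_scores):
    """Lowest suitability score at any training presence."""
    return min(train_scores)


def ten_percentile_training_presence(train_scores):
    """Score below which roughly 10% of training presences fall (approximation)."""
    ordered = sorted(train_scores)
    return ordered[int(0.10 * len(ordered))]


def max_sensitivity_plus_specificity(presence, background):
    """Candidate threshold (a presence score) maximizing sensitivity + specificity."""
    best_t, best_sum = None, -1.0
    for t in sorted(set(presence)):
        sensitivity = sum(s >= t for s in presence) / len(presence)
        specificity = sum(s < t for s in background) / len(background)
        if sensitivity + specificity > best_sum:
            best_t, best_sum = t, sensitivity + specificity
    return best_t


def omission_rate(presence, threshold):
    """Proportion of known occurrences predicted as absent."""
    return sum(s < threshold for s in presence) / len(presence)


def binarize(scores, threshold):
    """Presence (1) / absence (0) map from continuous suitability scores."""
    return [1 if s >= threshold else 0 for s in scores]
```

By construction, the minimum training presence threshold has a training omission rate of zero, while the 10 percentile threshold deliberately omits about 10% of training presences in exchange for a smaller predicted area.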