Climatic Diversity and Ecological Descriptors of Wild Tomato Species (Solanum sect. Lycopersicon) and Closely Related Species (Solanum sect. Juglandifolia and sect. Lycopersicoides) in Latin America

Conservation and sustainable use of species diversity require a description of the environment where the species develop. The objectives were to determine the ecological descriptors and climatic diversity of areas along the distribution range of 12 species of wild tomatoes (Solanum sect. Lycopersicon) and four wild species of phylogenetically related groups (Solanum sect. Juglandifolia and sect. Lycopersicoides), as well as their ecological similarity in Latin America. Using 4228 selected tomato accessions and an environmental information system (EIS) composed of 21 climatic variables, diversity patterns of the distribution areas and ecological descriptors were identified for each species with geographic information systems (GIS). The contribution of climatic variables to the geographical distribution of the species was identified by principal component analysis (PCA), and the similarity in species distribution as a function of those variables was identified by cluster analysis (CA). The ecological descriptors satisfactorily characterized the climate and environmental amplitude of wild tomatoes and related species along their distributional range. Eleven climate types were identified, predominantly BSk (arid, steppe, cold), BWh (arid, desert, hot), and Cfb (temperate, no dry season, warm summer). PCA identified the 10 most important variables for the geographical distribution. Six groups of species were identified according to CA and climatic distribution similarity. This approach shows promising applications for the conservation of genetic resources that are valuable for tomato breeding.


Introduction
Tomato (Solanum lycopersicum L.), a member of the Solanaceae family, is one of the world's leading vegetable crops; it is distributed worldwide and grows in a wide variety of habitats [1]. Peru has been considered the center of origin, but it is accepted that the tomato diversification process involved two transitions: the first occurred in South America, from the wild species Solanum pimpinellifolium L. to the partially domesticated Solanum lycopersicum L. var. cerasiforme (SLC), while the second occurred in Mesoamerica, from SLC to the completely domesticated Solanum lycopersicum L. var. lycopersicum.
Ecogeographic studies of plant genetic resources allow the identification of the adaptive ranges of the species and the most relevant environmental variables that define their distribution [24]. Their main applications are related to the collection, conservation, characterization, documentation, and use of plant genetic resources [7,24-27]. Additionally, it is possible to predict the environmental conditions of the collection sites [11,28,29] from the ecological descriptors derived from the geographical location of germplasm and environmental variables obtained through GIS tools [10,11,28-30].
The central hypothesis of this research postulates that patterns of climatic diversity might coincide with the classification of wild tomatoes, reflecting close ancestral relationships.
The objectives were to determine the ecological descriptors and the climatic diversity of 16 species (12 species of wild tomatoes (Solanum sect. Lycopersicon) and four species of phylogenetically related groups (Solanum sect. Juglandifolia and sect. Lycopersicoides)), as well as their ecological similarity in Latin America.

Climatic Diversity
Of the 21 existing climates in Latin America, according to the Köppen-Geiger classification adapted by Beck et al. [31], the 12 wild tomatoes and four closely related outgroup species were located in 11. Some of the 16 species of the genus Solanum showed specific patterns in their distribution within the identified climate types (Figure 1).
The species with accessions in the greatest number of climates are S. habrochaites, S. arcanum, S. ochranthum, and S. juglandifolium. In contrast, the species with the highest environmental restrictions were S. sitiens, S. lycopersicoides, S. corneliomulleri, and S. chmielewskii. Regarding the diversity of climates, those most predominant among the species are BSk, BWk, and BWh, corresponding to arid cold steppe, arid cold desert, and arid hot desert climates, respectively. Figure 2 shows the distribution and percentage of climates in each species. In this figure, the climatic similarity between nearby taxonomic groups derived from phylogenetic data [1] can be observed, with the same climate types identified in different proportions for species groups.

Ecological Descriptors
Ecological zones, as well as the distribution environments and altitudinal ranges reported in the literature for the 16 species of Solanum (sect. Lycopersicon, Juglandifolia, and Lycopersicoides), are shown in Table 1. There were no specific reports of annual mean temperature, annual precipitation, mean diurnal range, annual evapotranspiration, or any other variable in the available literature. In some cases, climatic parameters associated with the distribution zones are mentioned only in general terms [1,22,23].
The ecological descriptors derived from the geographic location of the accessions and the EIS through the use of GIS tools are shown in Table 2. The variables considered were chosen due to their influence on the establishment of the species: altitude, annual mean temperature, mean diurnal range, annual precipitation, and annual evapotranspiration.
Both the ecological descriptors and the values reported by Peralta et al. [1] and Grandillo et al. [22] show very similar altitudinal ranges. Considering the median altitude values (Table 2), S. cheesmaniae, S. galapagense, and S. pimpinellifolium are the species with the lowest median altitudes (45-93 m above sea level), and S. lycopersicoides, S. sitiens, and S. ochranthum are those with the highest (2740-2928 m above sea level).
S. juglandifolium and S. ochranthum have the highest water requirements (annual precipitation and evapotranspiration); in contrast, S. lycopersicoides and S. sitiens are species characteristic of more arid sites.

Statistical Analysis
Linear correlation analysis detected multicollinearity among variables BIO6, BIO9-BIO11, BIO16, and BIO17, which were discarded (10 variables) from subsequent statistical analyses. The association patterns between variables were identified using a PCA. Three principal components (PC1, PC2, and PC3) explained 86.2% of the variation, with individual contributions of 47.8, 27.2, and 11.2%, respectively. Figure 3 shows the biplot of PC1 and PC2 (explaining 75% of the variation), with the dispersion of the accessions and the contribution of the variables used. PC1 grouped variables related to water and humidity requirements: precipitation of wettest month (BIO13), annual precipitation (BIO12), and annual evapotranspiration (ETPA). PC2 was associated with annual mean temperature (BIO1), mean diurnal range (BIO2), altitude (ALT), mean temperature of the wettest quarter (BIO8), and maximum temperature of the warmest month (BIO5). Finally, PC3 comprised the coefficient of variation of seasonal precipitation (BIO15) and isothermality (BIO3).

Table 1. Distribution and altitude (m above sea level) of 16 species of Solanum reported in two publications: Peralta et al. [1] and Grandillo et al. [23]. Species described according to taxonomic sections and groups proposed by Peralta et al. [1].

Section/Group: Solanum
S. lycopersicoides. Southern area of Peru and northern Chile; in ravines and rocky slopes. Altitude: 1500-3700; 1200-3700.
S. sitiens. Hyper-arid areas, northern region of Chile. Altitude: 2350-3500; 2500-3500.

Table 2. Ecological descriptors of the wild and related species to S. lycopersicum. ALT = altitude, TEMP = annual mean temperature, DRAN = mean diurnal range, RAIN = annual precipitation, EVAPO = annual evapotranspiration. Max = maximum, Min = minimum, Med = median, and CV = coefficient of variation. Species divided according to sections and groups proposed by Peralta et al. [1].

The CA was carried out to identify patterns of similarity between accession distribution areas. This analysis included the median values of the previously selected informative variables for the 67 combinations identified, resulting from the interaction of species by climate type, using Gower distances and Ward's grouping method. According to the pseudo-F and pseudo-t² statistics, the number of statistically significant groups was six. To corroborate the membership of observations in each identified group, a discriminant analysis was carried out; the resubstitution test of the linear discriminant function did not indicate changes in the groups generated by the CA, confirming that the classification is reliable. The geographical distribution of the accessions that belong to each group is shown in Figure 4. Table 3 shows the medians and coefficients of variation of each of the groups identified in the CA. The significant variables of the PCs were used to describe the groups generated in the CA.
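The distance step of the CA can be illustrated with a short sketch. It assumes that every variable is numeric, so the Gower distance reduces to the mean of range-normalized absolute differences; the values are hypothetical medians (altitude, annual mean temperature), not data from this study, and the function name `gower_matrix` is illustrative. Ward's grouping, which the authors ran in SAS, would then operate on a matrix of this kind.

```python
def gower_matrix(rows):
    """Pairwise Gower distances for rows of numeric variables.

    Each variable contributes |x_i - x_j| / range, and the
    contributions are averaged over all variables.
    """
    n_vars = len(rows[0])
    ranges = []
    for k in range(n_vars):
        column = [row[k] for row in rows]
        ranges.append(max(column) - min(column))

    n = len(rows)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            total = 0.0
            for k in range(n_vars):
                # A constant variable (range 0) contributes nothing.
                if ranges[k] > 0:
                    total += abs(rows[i][k] - rows[j][k]) / ranges[k]
            dist[i][j] = dist[j][i] = total / n_vars
    return dist


# Hypothetical medians for three species-by-climate combinations:
# (altitude in m, annual mean temperature in degrees C).
combos = [
    [3000.0, 10.0],
    [1000.0, 20.0],
    [2000.0, 15.0],
]
dist = gower_matrix(combos)
for row in dist:
    print([round(d, 2) for d in row])
```

Because each variable is scaled by its own range, altitude in meters and temperature in degrees carry equal weight in the distance, which is the usual reason to prefer Gower distances when variables have very different units.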
From these results, it is possible to identify ecological patterns among the groups formed; for example, the accessions of cluster 1 are found at the highest altitude and with the lowest annual mean temperature; the species that form clusters 4 and 6 have the highest annual precipitation and evapotranspiration; and the species of group 1 are located at sites with the lowest moisture availability. In the Kruskal-Wallis non-parametric test of the variables across the clusters obtained, the results were statistically significant in all cases (p ≤ 0.001). Table 4 shows the median values for each cluster and the corresponding results of the rank-mean comparisons for the informative variables of the three PCs.

Table 4. Median comparisons of the informative variables that integrate the first three principal components for the 6 clusters formed in the CA of the wild tomato species in Latin America. EVAPO = annual evapotranspiration, BIO12 = annual precipitation, BIO13 = precipitation of wettest month, ALT = altitude, BIO8 = mean temperature of wettest quarter, BIO5 = maximum temperature of warmest month, BIO2 = mean diurnal range, BIO1 = annual mean temperature, BIO15 = precipitation seasonality, and BIO3 = isothermality.

Discussion
Characterization of genetic resources through environmental information of accession sites, also called ecogeographic description, allows the typification of adaptive ranges and the most relevant environmental factors that determine species adaptation [24]. On the other hand, using GIS techniques, georeferencing species sites allows the analysis of geographical distances and distribution patterns of germplasm collection sites. With this approach, it is possible to determine environmental conditions in which the wild species and local varieties of crops have acquired their adaptive ranges [25]. This ecogeographic characterization complements the phenotypic and genetic information, useful for the characterization of the germplasm.
The distribution of the 67 combinations of species with climates in the clusters and the phylogenetic group to which they belong can be observed in Table 3. The Lycopersicon group, corresponding to the species of the Galapagos Islands and some continental areas, is located in clusters 2 and 6. S. pennellii (Neolycopersicon group) is located in clusters 1 and 2. The species of the Arcanum group and S. sect. Juglandifolia are located in four of the six proposed clusters. S. huaylasense, S. corneliomulleri, S. peruvianum, S. chilense, and S. habrochaites, species of the Eriopersicon group, are distributed in all the clusters formed. On the other hand, S. sect. Lycopersicoides is only present in cluster 1.
Some species are restricted to a small area (S. lycopersicoides, S. sitiens), whereas others are widely distributed and therefore appear in several clusters. It is worth mentioning that each species has a specific geographic distribution, with overlapping regions between various species, reflecting their ecological adaptation patterns and habitat preferences [22] (Figure 2).
Regarding the climatic diversity of the species of the genus Solanum, there were no concrete data in the literature for all the species considered within the tomato group. Thus, the diversity of climates, ecological descriptors, and abundance patterns described by this research for each species constitutes new and valuable information, with potential use for the identification of germplasm tolerant to specific adverse biotic and abiotic factors, among other purposes (Tables 2-4, Figures 2 and 3). Wild tomato species are frequently found in isolated valleys with adaptations to particular types of climate, with possible tolerance or resistance to adverse conditions. The Andean geography and the ecological diversity of habitats and climates together probably contributed to the diversity of wild species [32,33]. In general, it is mentioned that the wild tomato species are distributed in Ecuador, the Galapagos Islands, Peru, and the north of Chile and Colombia, in various ecosystems from sea level to approximately 3300 m above sea level [33-35]. It is important to highlight that a predictive classification of tomatoes and closely related groups, together with current taxonomic knowledge, was used as a framework for the ecogeographic characterization and for selecting reliable species accessions from the different sources of the database. This selection process is fundamental to generating a trustworthy species database for further analysis, because there are often mistakes and inconsistencies due to incorrect taxonomic identification of accessions or wrong information about collection sites.
Few studies have been carried out with an ecogeographic or climatic approach, notably the publications of Peralta et al. [1], Nakazato et al. [22], Grandillo et al. [23], and Pease et al. [7]. These authors identified the ecological distribution environments and altitudinal ranges of adaptation of the tomato species. Both the ecological descriptors and the patterns of climatic diversity presented here were generated from more current and more representative sources of information, owing to the diversity of variables in the EIS and the large number of accessions drawn from different sources. It should be noted that, although the results for altitude and ecological zones of distribution are very similar, the ecological descriptors provide information for the 16 species with greater amplitude and precision (Tables 2 and 3).
With the information generated, it is also possible to begin to identify those species found in critical environments that can be potentially used as a source of germplasm for genetic breeding programs for resistance to drought [11], extreme temperatures [12], resistance to pests and diseases [13,14], to mention some examples.
There are specific studies of some species to which a certain adaptation or tolerance characteristic has been attributed due to their distribution; for example, S. pennellii is considered a species with extreme tolerance to drought, attributed to strict control of transpiration, increased water-use efficiency, and tolerance to soil salinity [36,37]; S. sitiens is considered the species that inhabits the most arid places [38], with the ability to tolerate high levels of salinity [22]; S. habrochaites is known to grow well at low temperatures [39-41]; and S. lycopersicoides is resistant to drought and prefers colder sites [42,43]. These statements coincide with the values obtained in the ecological descriptors; for example, S. sitiens and S. lycopersicoides are the species from sites with the lowest moisture availability.
In the present study, the use of multivariate analysis made it possible to identify the climatic variables most strongly associated with the distribution of the ecogeographic diversity of the species. The PCA indicated associations among variables of altitude, humidity, and temperature, which together explain a large proportion of the variability of the data. These results summarize the importance of the variables in the distribution of the Solanum species evaluated.
The characterization of the species generated from the CA was satisfactorily validated by means of a discriminant analysis. In addition, some of the species that form the groups agree with the analysis of morphological and genetic characters, so it is possible to assume that there are relationships between these characters and the climatic characteristics generated, coinciding with the results of previous research [1].
An example of the validation of groups mentioned above is the pair S. sitiens and S. lycopersicoides, considered a group of related species or sister taxa in a cladistic study carried out by Peralta and Spooner [3] with morphological data and in other similar investigations [1,44-46]. S. neorickii and S. chmielewskii are considered sister species [1], according to studies based on ITS sequences [47], analysis of phenotypic data and microsatellite markers [48], and cladistic studies with morphological data [49].
Conesa et al. [50] performed a climatic classification of 14 of the wild and related species to S. lycopersicum based on the mean value of annual precipitation and temperature and the De Martonne index. These authors proposed the formation of three groups: species from humid regions (S. ochranthum, S. neorickii, S. chmielewskii, S. juglandifolium, and S. lycopersicum), species from semi-arid sites (S. arcanum, S. habrochaites, S. pimpinellifolium, S. galapagense, and S. cheesmaniae), and species from arid regions (S. sitiens, S. chilense, S. lycopersicoides, S. pennellii, and S. peruvianum). This classification agrees with the results obtained from the mean annual precipitation reported in the ecological descriptors (Table 2).
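For reference, the De Martonne aridity index is conventionally computed as annual precipitation (mm) divided by mean annual temperature (°C) plus 10. The sketch below shows only that arithmetic; the input values are hypothetical, the function name is illustrative, and no class thresholds are applied because the cut-offs used by Conesa et al. [50] are not given in this text.

```python
def de_martonne_index(annual_precip_mm, mean_annual_temp_c):
    """De Martonne aridity index: P / (T + 10).

    P is annual precipitation in mm and T is mean annual
    temperature in degrees Celsius. Lower values mean drier sites.
    """
    if mean_annual_temp_c <= -10:
        raise ValueError("index is undefined for T <= -10 degrees C")
    return annual_precip_mm / (mean_annual_temp_c + 10.0)


# Hypothetical sites, not values from Table 2.
humid_site = de_martonne_index(600.0, 20.0)
arid_site = de_martonne_index(50.0, 15.0)
print(humid_site, arid_site)
```

With these inputs the wetter site scores 20.0 and the drier site 2.0, showing how the index ranks sites by moisture availability while correcting for temperature.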
The present study, in addition to identifying valuable ecogeographic information not previously reported in the literature, constitutes a precedent for using GIS tools to identify collections and distribution areas that are valuable sources of germplasm for developing varieties tolerant or resistant to specific biotic and abiotic factors through genetic improvement.
In addition, this information can be used to design germplasm conservation strategies, to identify material in danger of extinction due to climate change, and to plan germplasm collection routes for the formation of core collections. Likewise, the ecogeographic classes obtained could be associated with the presence of adverse factors, both biotic and abiotic, to define areas with the probable presence of genes for resistance to such factors. Finally, these results constitute a first step in the ecogeography of the wild tomato species; it remains necessary to identify the current and future ecological niches of the studied species and the impact of climate change on their distribution and ecological patterns.
The coordinates of 11,707 accessions were collected and reviewed to rule out atypical data. Repeated records, records whose coordinates had low geographic precision (fewer than 3 decimal places), and accessions outside the study area, judged by the reported altitude and by the distribution areas previously described by Peralta et al. [1] and Grandillo et al. [23] (Table 1), were eliminated. These steps were applied to avoid including accessions that correspond to introductions outside the natural areas of distribution. Finally, a frequency analysis was applied, eliminating accessions associated with climate types represented by fewer than 3 accessions. From this, 4228 accessions of 12 wild tomatoes and 4 closely related species distributed in Latin America were selected (Figure 1).
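The screening steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the record fields, the duplicate key (species plus coordinates), and the counting of accessions per species-by-climate combination are all hypothetical choices, and coordinates are kept as text so that the number of reported decimal places is preserved.

```python
from collections import Counter


def decimal_places(coord_text):
    """Count the digits after the decimal point of a coordinate given as text."""
    _, _, fraction = coord_text.strip().partition(".")
    return len(fraction)


def filter_accessions(records, min_decimals=3, min_per_climate=3):
    """Drop duplicates, imprecise coordinates, and rare climate types."""
    seen = set()
    kept = []
    for rec in records:
        key = (rec["species"], rec["lat"], rec["lon"])  # assumed duplicate key
        if key in seen:
            continue
        if (decimal_places(rec["lat"]) < min_decimals
                or decimal_places(rec["lon"]) < min_decimals):
            continue
        seen.add(key)
        kept.append(rec)

    # Frequency analysis: assumed here to be per species-by-climate combination.
    counts = Counter((r["species"], r["climate"]) for r in kept)
    return [r for r in kept
            if counts[(r["species"], r["climate"])] >= min_per_climate]


# Hypothetical records, not accessions from the study.
records = [
    {"species": "S. sitiens", "lat": "-22.4567", "lon": "-68.9012", "climate": "BWk"},
    {"species": "S. sitiens", "lat": "-22.4567", "lon": "-68.9012", "climate": "BWk"},  # repeat
    {"species": "S. sitiens", "lat": "-22.5011", "lon": "-68.8804", "climate": "BWk"},
    {"species": "S. sitiens", "lat": "-22.6120", "lon": "-68.7533", "climate": "BWk"},
    {"species": "S. sitiens", "lat": "-22.4", "lon": "-68.9", "climate": "BWk"},  # imprecise
    {"species": "S. sitiens", "lat": "-23.1001", "lon": "-68.2002", "climate": "BSk"},  # rare climate
]
selected = filter_accessions(records)
print(len(selected))
```

The altitude and distribution-area checks described in the text are omitted because they depend on the ranges in Table 1.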
Annual evapotranspiration was calculated from monthly values in raster format with a spatial resolution of 30 arc seconds (~1 km²) [60]. Finally, the altitude of the collection site of each accession was determined from an elevation model in raster format, also with a spatial resolution of ~1 km² [61].

Climatic Diversity and Ecological Descriptors
Climatic diversity patterns were identified with vectors of the geographical location of each accession. With these vectors and the "Extraction" module of the ArcGIS "Spatial Analyst Tools", the value of the pixel of the climatic classification at each location was extracted; all the information was then integrated into a worksheet (Microsoft Excel) to identify the types and frequencies of climates for each species.
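The idea behind extracting a pixel value at a point can be shown without GIS software. The sketch below is not the ArcGIS workflow; it assumes a north-up raster stored as a list of rows, with a known upper-left corner and square cells, and uses a toy 2 x 2 grid with 1-degree cells (a 30 arc-second grid would use a cell size of 1/120 degree). All names and values are hypothetical.

```python
import math
from collections import Counter


def sample_grid(grid, x_min, y_max, cell_size, lon, lat):
    """Return the value of the cell containing (lon, lat), or None if outside.

    grid[0] is the northernmost row; (x_min, y_max) is the upper-left corner.
    """
    col = math.floor((lon - x_min) / cell_size)
    row = math.floor((y_max - lat) / cell_size)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None


# Toy climate raster covering longitudes -80..-78 and latitudes -12..-10.
climate_grid = [
    ["BWh", "BWk"],
    ["BSk", "Cfb"],
]
points = [(-79.5, -10.5), (-78.25, -11.75), (-79.9, -10.1), (-70.0, -11.0)]

values = [sample_grid(climate_grid, -80.0, -10.0, 1.0, lon, lat)
          for lon, lat in points]
frequencies = Counter(v for v in values if v is not None)
print(values, dict(frequencies))
```

Tallying the extracted codes per species, as the `Counter` does here for a single set of points, gives the type and frequency of climates that the text compiles in a worksheet.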
Ecological descriptors were determined with the methodology proposed by Ruiz-Corral et al. [11], using geographic location vectors of all accessions and the EIS; with this, climatic ranges of adaptation were identified. These values were obtained with the ArcGIS "Spatial Analyst Tools". The information was compiled in a worksheet, where the extremes (minimum and maximum), median, and coefficient of variation of each variable were subsequently determined for each species [10,27].
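The summary statistics behind each descriptor can be sketched with the standard library. The altitude values are hypothetical, and the coefficient of variation is assumed to be the sample standard deviation divided by the mean, expressed as a percentage; the text does not state which form was used.

```python
import statistics


def describe(values):
    """Minimum, maximum, median, and coefficient of variation (%) of a variable."""
    mean = statistics.mean(values)
    return {
        "min": min(values),
        "max": max(values),
        "med": statistics.median(values),
        # Assumes sample standard deviation and a non-zero mean.
        "cv": 100.0 * statistics.stdev(values) / mean,
    }


# Hypothetical altitudes (m above sea level) extracted per accession.
altitude_by_species = {
    "species A": [100.0, 200.0, 300.0],
    "species B": [2500.0, 2800.0, 3100.0, 3400.0],
}
descriptors = {name: describe(vals) for name, vals in altitude_by_species.items()}
for name, stats in descriptors.items():
    print(name, stats)
```

The same function would be applied to each variable (temperature, diurnal range, precipitation, evapotranspiration) to fill one row of a descriptor table per species.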

Statistical Analysis
Linear correlations between pairs of variables were computed to identify multicollinearity. For each pair with an absolute correlation coefficient greater than 0.95, only one variable of the pair was retained. With the retained variables, a PCA was carried out to identify the variables that contribute most to the variation among accessions. To identify the similarity among species based on their climatic diversity, a cluster analysis (CA) was carried out with Gower distances and Ward's minimum-variance method. For this analysis, the combinations of species (16) and climate type (11) present in the data were identified, yielding 67 combinations. To corroborate the membership of observations in each identified group, a discriminant analysis was carried out. Finally, the non-parametric Kruskal-Wallis test and a rank comparison test [63] were performed for the clusters generated. To describe the groups, the variables identified as significant in the PCA were used. Statistical analyses were carried out using the Statistical Analysis System software version 9.4 [64].
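The multicollinearity screen can be illustrated with a short sketch. The analyses were run in SAS; this Python version is only an illustration, and the rule for choosing which variable of a collinear pair to keep (the one listed first) is an assumption, since the text does not specify it. The data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined for a constant variable


def drop_collinear(columns, threshold=0.95):
    """Keep a variable only if |r| <= threshold with every variable kept so far."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical values for five accessions, not data from the study.
columns = {
    "BIO1": [10.0, 12.0, 14.0, 16.0, 18.0],
    "BIO6": [1.0, 3.0, 5.0, 7.0, 9.5],          # nearly proportional to BIO1
    "BIO12": [500.0, 100.0, 900.0, 300.0, 700.0],
}
retained = drop_collinear(columns)
print(retained)
```

In this toy case BIO6 is dropped because its correlation with BIO1 exceeds 0.95, while BIO12 is retained; the retained set would then be passed to the PCA.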

Conflicts of Interest:
The authors declare no conflict of interest.