Worldwide Selection Footprints for Drought and Heat in Bread Wheat (Triticum aestivum L.)

Genome–environment Associations (GEA) or Environmental Genome-Wide Association scans (EnvGWAS) have been poorly applied for studying the genomics of adaptive traits in bread wheat landraces (Triticum aestivum L.). We analyzed 990 landraces and seven climatic variables (mean temperature, maximum temperature, precipitation, precipitation seasonality, heat index of mean temperature, heat index of maximum temperature, and drought index) in GEA using the FarmCPU approach with GAPIT. Historical temperature and precipitation values were obtained as monthly averages from 1970 to 2000. Based on 26,064 high-quality SNP loci, landraces were classified into ten subpopulations exhibiting high genetic differentiation. The GEA identified 59 SNPs and nearly 89 protein-encoding genes involved in the response processes to abiotic stress. Genes related to biosynthesis and signaling are mainly mediated by auxins, abscisic acid (ABA), ethylene (ET), salicylic acid (SA), and jasmonates (JA), which are known to operate together in modulation responses to heat stress and drought in plants. In addition, we identified some proteins associated with the response and tolerance to stress by high temperatures, water deficit, and cell wall functions. The results provide candidate regions for selection aimed to improve drought and heat tolerance in bread wheat and provide insights into the genetic mechanisms involved in adaptation to extreme environments.


Introduction
Since its domestication more than 10,000 years ago, common wheat has experienced a series of selective events caused by humans and the environment, contributing to the increase in its genetic diversification [1]. Climate change has severely reduced wheat production in recent years due to extreme temperature episodes and unpredictable precipitation patterns [2,3]. Simulation models predict losses of more than 20% in agricultural production by 2050 [4].
There is an urgent need to discover new sources of adaptation to drought and heat that contribute to maintaining crop productivity [2]. To address this scenario, landraces are valuable, because they have developed survival mechanisms for challenging environments through natural and human selection [5,6]. This is why they preserve loci of adaptation to climate change in their places of origin [7].
Recent advances in sequencing technologies have allowed the exploration of entire genomes in various species with increasingly dense single-nucleotide polymorphism (SNP) data that identify selective events [8]. Likewise, novel bioinformatic analysis approaches, such as genome-wide association studies (GWAS), are very efficient in time, cost, and precision for identifying genes that control important agricultural traits [9].
Genome-environment Associations (GEA) or Environmental Genome-Wide Association scans (EnvGWAS) have been used successfully for studying adaptive traits in local populations. They consist of associating SNPs distributed throughout the genome with environmental variables of the accession sampling sites [10].
In the last decade, significant progress has been reported in the characterization of wheat genomes through high-throughput genotyping with DArT-seq technology. More than 100,000 accessions belonging to the germplasm bank of the International Maize and Wheat Improvement Center (CIMMYT) have been characterized through the Seeds of Discovery initiative [24,25].
Considering the effectiveness of GWAS for the identification of genomic regions associated with traits of agronomic importance, as well as the functional genetic variation to adapt crops to climate change, this research aims to identify genomic regions related to the adaptation process to arid climates through Genome-environment Association studies in the Triticum aestivum collection maintained in the CIMMYT germplasm bank.

Exploratory Analysis
As expected, the highly significantly (p ≤ 0.001) and correlated variables share with each other temperature in their definition (Table 1). For instance (Table 1a), AMT was consistently significantly and positively correlated with the variables MaxTWM (r = 0.79), MeanTDQ (r = 0.71), and MeanTWQ (r = 0.77), similar to MaxTWM with MeanTWQ (r = 0.94). For those based on precipitation, only PDM with PDQ (r = 0.99) exhibited a significant correlation. Similarly, Table 1b shows positive correlations between the variables associated with temperature, such as the case of AMT with the variables MaxT (r = 0.98), HITmead (r = 0.92), and HITmax (r = 0.93); MaxT with HITmead (r = 0.92) and HITmax (r = 0.96); and HITmead with HITmax (r = 0.97). On the other hand, negative correlations were observed only for MeanTDQ with PWQ (r = −0.84) and AP with DI (r = −0.83), which reflect contrasting trends between temperature and precipitation.
In the PCA, we observed that the evaluated variables had contrasting contributions to the total variation of each component, especially for the second set of variables (Figures S1 and S2). In the biplot graphs of the PCA (Figure 1), we identified that the accessions sites are well-differentiated with respect to their Köppen-Geiger climate, revealing a greater representation of collections from the temperate (C) and cold (D) groups. For the first set (Figure 1a), those related to temperature (AMT, MaxTWM, MeanTDQ, and MeanTWQ) are oriented along PC1, while the precipitation-derived variables (PDM, PS, and PDQ) predominate in PC2. Similarly, for the second set (Figure 1b), PC1 includes the temperature-related variables (AMT, HITmead, MaxT, and HITmax), while PC2 includes AP and DI. The PCA helpfully identified some variables with discriminating potential of accessions according to the Köppen-Geiger climate groups (Figure 1b). For example, the high DI sites are related to dry climates (B), whereas the temperature-related variables (HITmax, HITmead, AMT, and MaxT) define gradients from tropical (A) to temperate (C) groups. On the other hand, we identify the variables of most significant importance concerning the monitoring, follow-up, and informativeness of drought and heat stress events and their contribution to the total variation. In the PCA, we observed that the evaluated variables had contrasting contributio to the total variation of each component, especially for the second set of variables (Figu S1 and S2). In the biplot graphs of the PCA (Figure 1), we identified that the accessio sites are well-differentiated with respect to their Köppen-Geiger climate, revealing greater representation of collections from the temperate (C) and cold (D) groups. For t first set (Figure 1a), those related to temperature (AMT, MaxTWM, MeanTDQ, a MeanTWQ) are oriented along PC1, while the precipitation-derived variables (PDM, P and PDQ) predominate in PC2. Similarly, for the second set (Figure 1b), PC1 includes t temperature-related variables (AMT, HITmead, MaxT, and HITmax), while PC2 includ AP and DI. The PCA helpfully identified some variables with discriminating potential accessions according to the Köppen-Geiger climate groups ( Figure 1b). For example, t high DI sites are related to dry climates (B), whereas the temperature-related variab (HITmax, HITmead, AMT, and MaxT) define gradients from tropical (A) to temperate ( groups. On the other hand, we identify the variables of most significant importance co cerning the monitoring, follow-up, and informativeness of drought and heat stress even and their contribution to the total variation.

Population Structure
The cross-entropy validation implemented in the LEA package, based on SNP ma ers, suggested an optimal number of 10 subpopulations (Figure 2). This subdivision flects diversity according to the climate in the subpopulations, since there is no expl representation between these and their geographic or regional origin. The main contrib tion of accessions comes from Turkey and China of 49.5% and 27.8%, respectively. This followed by Afghanistan, Tajikistan, and Iran of 5.2%, 4.2%, and 3.4%, respectively. T rest of the countries contribute with less than 2% of the accessions. Subpopulations II, VI, and X constitute 60.4% of the total population under study. On the other hand, t dispersion pattern observed in the molecular PCA biplot revealed that the subpopulatio were well-differentiated, reflecting the high genetic diversity of the analyzed accessio ( Figure 3a). The correspondence analysis (Figure 3b) between the subpopulations and K ppen-Geiger climate groups showed that dry weather accessions (B) were better asso ated with subpopulation VII. This subpopulation comprised 83 accessions from eig countries (Afghanistan, Armenia, Azerbaijan, China, Iran, Iraq, Tajikistan, and Turk cataloged principally within the region of Southern Asia (SAS) with records of an arid index (DI) greater than 5, whose values indicated water deficiency.

Population Structure
The cross-entropy validation implemented in the LEA package, based on SNP markers, suggested an optimal number of 10 subpopulations ( Figure 2). This subdivision reflects diversity according to the climate in the subpopulations, since there is no explicit representation between these and their geographic or regional origin. The main contribution of accessions comes from Turkey and China of 49.5% and 27.8%, respectively. This is followed by Afghanistan, Tajikistan, and Iran of 5.2%, 4.2%, and 3.4%, respectively. The rest of the countries contribute with less than 2% of the accessions. Subpopulations II, IV, VI, and X constitute 60.4% of the total population under study. On the other hand, the dispersion pattern observed in the molecular PCA biplot revealed that the subpopulations were well-differentiated, reflecting the high genetic diversity of the analyzed accessions ( Figure 3a). The correspondence analysis (Figure 3b) between the subpopulations and Köppen-Geiger climate groups showed that dry weather accessions (B) were better associated with subpopulation VII. This subpopulation comprised 83 accessions from eight countries (Afghanistan, Armenia, Azerbaijan, China, Iran, Iraq, Tajikistan, and Turkey) cataloged principally within the region of Southern Asia (SAS) with records of an aridity index (DI) greater than 5, whose values indicated water deficiency.

Genome-Wide Association Studies
We identified 59 SNP markers associated with the climatic variables evaluated in all 21 bread wheat chromosomes ( Table 2). The chromosomes with the highest number of associated markers were 2B (seven SNPs), 7A (six SNPs), 3B (five SNPs), and 5B (five SNPs). The chromosomes with a single associated marker were 1B, 5A, 6A, 6D, and 7D.  For all variables, the QQ plots (Figures 4 and 5) show a good adjustment, with most −log 10 (p-values) for the null hypothesis of no association, being close to the diagonal. In contrast, some points at the top of each plot may be in linkage disequilibrium (LD) with a causal polymorphism, indicating that the model has a reasonable control for both false positives and negatives.

Identification of Genes Related to Adaptation to Abiotic Stress in Plants
For the associated regions, we found 89 candidate genes encoding proteins related to various biological processes in plants ( Table 2). Among these, we identified the significant presence of 26 proteins involved in the signaling network (Table 3), 15 cell wall structural proteins (Table 4), 21 response proteins to various types of abiotic stress (Table 5), and 7 proteins related to morphological changes (Table S2). Regarding the individual detection of association for each variable, there were different levels of associated SNPs (Table S1). These differences in detecting different SNPs in highly correlated variables may be due to the possible presence of atypical data in the collection sites. Likewise, it is known that the effects produced by drought and heat are differential in certain phases of reproductive development, during which plants are more susceptible. We observed a lower association with variables DI, AMT, and AP, with less than eight markers for each one. In contrast, the variables with more associated loci were PS, HITmead, MaxT, and HITmax, with 15, 14, 12, and 11 SNPs, respectively.

Identification of Genes Related to Adaptation to Abiotic Stress in Plants
For the associated regions, we found 89 candidate genes encoding proteins related to various biological processes in plants (Table 2). Among these, we identified the significant presence of 26 proteins involved in the signaling network (Table 3), 15 cell wall structural proteins (Table 4), 21 response proteins to various types of abiotic stress (Table 5), and 7 proteins related to morphological changes (Table S2).  Given that a response to stress begins with the perception and signal transduction of environmental stimuli, it is not surprising that we found an abundance of signaling proteins with a well-documented role in plant responses to drought and heat stress. This is the case for protein kinases serine-threonine (Pstp-2A; STPK-2B, 3A, 4A, and 4D; and CIPK2-2B) and some proteins activated by stress-related plant hormones, such as ethylene (ET), abscisic acid (ABA), jasmonic acid (JA), salicylic acid (SA), and auxins, reported mainly by the BIO15, HITmax, and MaxT variables and chromosomes 7A, 4D, 2B, and 1D (Table 3).

Environmental Variables Involved in the Detection of Adaptive Loci by GEA
Climate change affects diverse geographic areas throughout the world. H effects in arid and semiarid climatic zones have devastating impacts [26]. Th environmental effects play an essential role in the local adaptation, genetic di population structure of wild accessions [17]. Therefore, using climatic variab sent selective environmental pressure can be valuable to capture important of the mechanisms of resistance and tolerance to abiotic stress. We observed l tion footprints in multiple genomic regions along the 21 wheat chromosome climatic variable having different numbers of genomic associations of bio portance.
It should be remarked that the climate data came from records spanning

Environmental Variables Involved in the Detection of Adaptive Loci by GEA
Climate change affects diverse geographic areas throughout the world. However, its effects in arid and semiarid climatic zones have devastating impacts [26]. These selective environmental effects play an essential role in the local adaptation, genetic diversity, and population structure of wild accessions [17]. Therefore, using climatic variables to represent selective environmental pressure can be valuable to capture important components of the mechanisms of resistance and tolerance to abiotic stress. We observed local adaptation footprints in multiple genomic regions along the 21 wheat chromosomes, with each climatic variable having different numbers of genomic associations of biological importance.
It should be remarked that the climate data came from records spanning 1970-2000, which naturally did not cover the adaptation period of the accessions. However, these 30-year records are good indicators of the prevailing climate type in the collection sites. On the other hand, the accessions were collected from 1983 to 2011, with the main bulk of the collection occurring in 1984 and 2011. One must be aware of the noise arising from the migration dynamics of those materials, especially for the most recent collection efforts. Thus, the herein reported results are subject to validation by different approaches.
Exceptionally, the seasonality of precipitation (PS) had the largest number of significant SNP markers along different chromosomes, placed close to unique genes of resistance and tolerance to abiotic stress. This makes sense, because it is considered an important variable in influencing the distribution of species through water availability [27]. On the other hand, measuring the variations in precipitation [28] at the sites of origin of the collections over three decades  faithfully represents the alterations in the uniformity and distribution of precipitation.
The research of Cortés et al. [29] reaffirmed the above; they found a strong influence of rainfall patterns on the population structure and the ecological diversity to tolerance drought in wild beans. Usually, prolonged periods of drought cause the expression of genomic regions associated with the activation of plant survival mechanisms [17].
The associations with the maximum temperature (MaxT), heat index (HITmead), annual precipitation (AP), and drought index (DI) identified several adaptation genes to drought and heat stress. This is primarily explained by the nature and importance of these variables in the monitoring of conditions of meteorological drought. Both AP and DI are valid descriptors for measuring the drought intensity [30]. DI is calculated through a combination of climatic and meteorological variables, among which precipitation is the most important [31]. In addition, the estimate values DI presented an excellent discriminating potential of accessions from arid climates (B), which gives reliability to its use. On the other hand, according to López-Hernández & Cortés [10], the maximum temperature and the heat index are better estimators of the natural adaptation to high temperatures and identify successfully associated genetic factors markers.
Different variables shared associations with some loci, suggesting that their selective pressures can shape the same genomic regions [22] and, therefore, remain stable in the landraces of Triticum aestivum. The most frequent loci on chromosomes 1A and 4D are related to two genes: CONSTANS-like and proteins abundant in late embryogenesis (LEA). The first is involved in various biological processes of plants, such as the control of the flowering time, regulation of growth and development, and responses to abiotic stress [32][33][34]. LEA proteins are recognized during the adaptation to abiotic stress, which includes dehydration, salinity, high temperature, and cold [35][36][37].
On the other hand, the only matching locus between AP and DI flanked an F-box domain gene, which is a homolog of the GrpE protein. In Arabidopsis, this gene acts as a nucleotide exchange factor of the 70-kD heat shock protein complex (HSP70), which specializes in thermotolerance to heat stress [38,39].

Adaptation to Drought and Heat Stress
Despite their coexistence in a climate change scenario, the combined effects of drought and heat stress have been poorly studied [40]. They have a synergistic effect, altering the metabolism and gene expression in ways other than those induced independently [41]. These combined effects affect several physiological, cellular, and molecular processes in plant cells [10].
The stress response mechanism in plants is very complex and requires several integrated pathways to be activated in response to external stress [42]. Plant hormones, such as auxins, abscisic acid (ABA), ethylene (ET), salicylic acid (SA), and jasmonates (JA), operate together in the modulation of the plants' heat and drought stress responses [43,44].
Typically, auxin and the auxin pathway regulate thermomorphogenesis in plants, coordinating the growth and defense against heat stress [45], while ABA and ET interact positively to activate or repress the expression of numerous stress response genes, such as LEA proteins and dehydrins [46,47]. Likewise, SA is related to the synthesizing of protein chaperones, heat shock proteins, protective membrane proteins, antioxidants, and secondary metabolites [46,48,49]. Jasmonates significantly improve the tolerance to heat and drought stress through various TFs, which induce responsive gene expression and organic osmo-protectant activation, osmotic adjustment, and antioxidant activity [50][51][52].
Many biochemical and physiological impacts affect the growth and development of plants [40], as they affect the photosynthetic system, gas exchange, and water relations [45]. Consequently, a series of physiological and molecular responses are produced, which include root increase, reduction in the number and conductance of the stomata, decrease in leaf area, and morphological changes in leaves [53]. On the other hand, among the molecular responses, one should consider the production of antioxidants and osmolytes for osmotic adjustment and the expression of various proteins, such as HSP, WRKY, MYB, LEA, and GrpE [40].
The genes reported in this work are involved in most of the mentioned biological processes, including genes with signaling roles and genes associated with the cell walls and membranes, photosynthesis, flowering, and, of course, proteins involved in the response to heat and drought stress on various chromosomes. Our findings are consistent with Y. Li et al. [22]: plant genomes have been shaped by natural selection during local adaptation to different environmental conditions, so there is a close relationship between species survival and response to climate change.

Geographical Data
Through code written in the R language v.3.4.4 [54], the passport data of 174,553 accessions from the CIMMYT Wheat Germplasm Bank were filtered to select 1151 landraces of Triticum aestivum with unique and sensible geographic coordinates. Subsequently, the location mapping was carried out through the geographic information system QGIS version 2.18 [55]. This filtering process yielded 990 landraces with validated geographic data. The accessions came from 33 countries distributed in 13 geographic regions (Table 6) [56].

Genotypic Data
Germplasm genotyping was carried out through DArT-seq technology in CIMMYT under the Seeds of Discovery initiative for 45,871 accessions belonging to the wheat germplasm bank. The information was integrated using R v.3.4.4 [54] in a data table with the HapMap format containing the information of 86,683 SNP loci. The markers' physical locations were obtained by reference genome sequences provided by Diversity Arrays Technology (wheat_ChineseSpring04), and only markers unambiguously located in the wheat genome were retained. Subsequently, they were filtered for quality control through a selection of the cleanest and most informative SNPs, with a maximum missing data rate of 20%, Shannon entropy greater than zero, and variants with a minor allele frequency (MAF) ≥ 2%. The filtered table contained 26,064 SNP loci in 990 landraces with geographic data.

Climatic Data
For each collection site, we extracted the values of altitude, temperature, precipitation, and eight bioclimatic variables related to drought and heat stress (Table 7) at a spatial resolution of 2.5 min (4.5 km) from the WorldClim platform (https://www.worldclim.org/ accessed on 31 July 2022) using the getData function of the R raster package version 3.5-15 [57]. Bioclimatic variables are derived from the monthly temperature and rainfall historical climate data from 1970 to 2000 in order to generate more biologically meaningful variables [58]. Additionally, we determined the Köppen-Geiger main climate groups (Table 8) through the R kgc package version 1.0.0.2 [59]. Köppen classification was constructed based on five vegetation groups that distinguish between plants of the equatorial zone (A), the arid zone (B), the warm temperate zone (C), the snow zone (D), and the polar zone (E); in the subclassification, the second letter considers the precipitation while the third letter the air temperature [60].

Index Estimation
Two heat indices (HIT) were estimated by the Thornthwaite model [61] using values of the mean and maximum temperature from 1970 to 2000. We denominated the HITmean and HITmax, respectively: For tmean i , tmax i > 0, where tmean i is the average monthly temperature, and tmax i is the maximum monthly, respectively, for the ith month.
Furthermore, a drought index was calculated [29]. This index is based on the relationship between the potential evapotranspiration and the annual precipitation of each collection site: where DI is the drought index, PET is the potential evapotranspiration, and AP is the annual precipitation. In this index, negative values indicate excessive precipitation, while positive values indicate water deficiency. The calculation of PET was done with the thornthwaite function of the R SPEI package version 1.7 [62], through the values of the monthly average temperatures from 1970 to 2000 (tmed i > 0) with the estimated solar radiation being based on the latitude of each collection site.

Exploratory Analysis of Climatic Variables
To look for patterns of relationships among climate variables, we used the Pearson correlation coefficient (r) and principal component analysis (PCA). The standardized variables are used in PCA to estimate the correlation matrix and determine the principal components (PC). The bioclimatic variables were grouped into two sets, with the first one containing bioclimatic variables related to aridity (AMT, TS, MaxTWM, MeanTDQ, MeanTWQ, AP, PS, PDM, PDQ, and PWQ). The second set included AMT; MaxT; AP; constructed indices (HITmean, HITmax, and DI); and ELEV. AMT and AP were included in both sets, because they are considered the main variables for indices related to aridity. The biplot graphs were constructed with the factoextra R package version 1.0.7 [63]. In the latter, the vector's length and the angle's cosine were used to group the variables into different groups. The Köppen-Geiger climate groups of each collection were also included.

Population Structure
The stratification of the collection was explored by two methods. The first one was based on the Landscape and Ecological Associations studies (LEA) package [64], with the SNPs coded in numerical form (0, 1, and 2). The smnf function was used to estimate the ancestry coefficients (K) with cross-entropy. This algorithm was executed with 10 replications and a K-value from 1 to 10. The optimal number of K was defined to assign genotypes to subpopulations according to estimates of individual admixture coefficients from the genotypic matrix. Subsequently, we visualized principal components through GAPIT R package version 3.0 [65]. We compared the results of the population stratification in subpopulations with the Köppen-Geiger climate groups using a correspondence analysis (CA), which reveals the close relationships between and within two groups of categorical variables based on data provided in a contingency table.

Association Analysis
We used 990 landraces of Triticum aestivum genotyped with 26,064 SNP loci and seven variables (AMT, MaxT, AP, PS, HITmead, HITmax, and DI) to run Genome-environment Association (GEA) studies with the Fixed and random model Circulating Probability Unification multiple-locus model "FarmCPU" [66] implemented in the R GAPIT (Genome Association and Prediction Integrated Tool) package version 3.0 [65]. FarmCPU is characterized by iteratively using two models, a linear mixed model (MLM) and a fixed-effects model, to select a set of markers associated with a trait of interest.
The significant SNPs were determined according to the Bonferroni threshold to an alpha of 0.05, with a threshold value of -log 10 (0.05/26,064) = 5.72, coupled by the visual interpretation of the Q-Q plots. The Manhattan and Q-Q plots were built with the R CMplot package version 4.0.0 [67].

Candidate Genes and Their Annotation
The sequence of the significant SNP markers was blasted in the wheat reference genome IWGSC_refseqv1.0 [68] published in the Ensembl Plants database (www.https://plants. ensembl.org/ accessed on 31 July 2022) to identify the candidate genes. For this, the genes found in the overlapping region and within one Mb upstream and downstream of the matched regions were selected as the candidate genes, and their molecular functions were determined. We identified the proteins in the UniProt (https://www.uniprot.org/ accessed on 31 July 2022) and InterPro (https://www.ebi.ac.uk/interpro accessed on 31 July 2022) databases encoded by the candidate genes, the functionality of the sequences, their domains, and classification. Finally, we elaborated a schematic representation of the physical map of bread wheat with the significant SNPs associated with response proteins to water stress and heat stress with the R LinkageMapView package version 2.1.2 [69].

Conclusions and Practical Implications
The results suggest that local adaptations have footprints along the 21 wheat chromosomes in multiple genomic regions. We found vital genes that include several critical points on the abiotic stress response mechanisms-highlighting a considerable number of signaling genes mediated by plant hormones, regulatory processes of the cell wall, morphophysiological changes, photosynthesis, flowering, and some response mechanisms to abiotic stress.
We showed that the climatic variables estimated with historical data help capturing the environmental variability that occurred in the collection sites of the landraces. The variables of the maximum temperature, annual precipitation, precipitation seasonality, and heat and drought indices relevantly participated in identifying genes related to the response to water deficit and high temperatures. This confirms its representativeness in determining aridity in some climatic regions.
This study points to 89 genes involved in the adaptation of bread wheat to its native habitats by association with seven specific climatic variables. The results are consistent with the idea that environmental pressure has modeled, through natural selection, the structure of genomic regions in local wheat populations over time.
Our findings constitute a new resource to select accessions carrying alleles linked to specific climatic responses, which can be exploited through genomic prediction tools to select germplasms with genetic potential for adaptation to climate change.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/plants11172289/s1: Figure S1: Contributions to the total variations of each component for the evaluated variables in the PCA. Figure S2: Contributions to component principals 1 and 2 for the evaluated variables in the PCA. Table S1: Detection of associated SNPs for seven climatic variables. Table S2: Other genes and proteins identified by Genome-environment Association with seven climatic variables.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.