Discovering Loci for Breeding Prospective and Phenology in Wheat Mediterranean Landraces by Environmental and eigenGWAS

Knowledge of the genetic basis of traits controlling phenology, differentiation patterns, and environmental adaptation is essential to develop new cultivars under climate change conditions. Landrace collections are an appropriate platform to study the hidden variation caused by crop breeding. The use of genome-wide association analysis for phenology, climatic data and differentiation among Mediterranean landraces led to the identification of 651 marker-trait associations that could be grouped in 46 QTL hotspots. A candidate gene analysis using the annotation of the genome sequence of the wheat cultivar ‘Chinese Spring’ detected 1097 gene models within 33 selected QTL hotspots. From all the gene models, 42 were shown to be differentially expressed (upregulated) under abiotic stress conditions, and 9 were selected based on their levels of expression. Different gene families previously reported for their involvement in different stress responses were found (protein kinases, ras-like GTP binding proteins and ethylene-responsive transcription factors). Finally, the synteny analysis in the QTL hotspots regions among the genomes of wheat and other cereal species identified 23, 21 and 7 ortho-QTLs for Brachypodium, rice and maize, respectively, confirming the importance of these loci.


Introduction
Climate change may be the single unifying and chronic issue that will affect everyone and every aspect of the economy. Changes in weather patterns and variability, and differential combinations of effects in different parts of the world, including the Mediterranean region, are expected. The Mediterranean Basin embraces countries between 27 • and 47 • N and between 10 • W and 37 • E extending over three continents and a coastline of 46,000 km [1]. Recent accelerated climate change has exacerbated existing environmental problems in the Mediterranean Basin that are caused by the combination of changes in land use, increasing pollution and declining biodiversity. The expected change in the global climate will significantly affect wheat production, with an extraordinary impact in the Mediterranean basin, where prediction models have projected a rise of temperatures by 3-5 • C and a decrease of annual rainfall by 25-30% in the next decades [2].
In the Mediterranean Basin, wheat is mainly cultivated under rainfed conditions with an irregular precipitation pattern across years and locations, and along the plant growth cycle resulting in major yield variations [3]. In addition, wheat usually experiences terminal drought originated by high temperatures during the grain-filling period [4], causing a reduction in yield potential of about 50% [5]. Therefore, there is a need to improve the selection of crops to be able to maintain acceptable levels of yield and stability in semi-arid environments, which have been identified as the regions most sensitive to the effects of Int. J. Mol. Sci. 2023, 24, 1700 2 of 19 climate change [6]. To achieve that, strategies to retain and increase genetic diversity are being explored since climate change is expected to constrain it [7]. The adaptability and stability of new cultivars that can be successfully grown in dry areas will be the main concern in breeding programs.
The 'green revolution' based on dwarfing genes, photoperiod insensitivity and high yield potential had a demonstrable impact in breeding. However, it was soon sensed that there was a need to accelerate the use of unique genetic diversity. Future yield gains, especially under stressed conditions, will require the exploitation of the largely untapped sources of genetic diversity housed in collections of wheat landraces and wild relatives [8]. Landraces are genetically diverse repositories of unique traits that have evolved in local environments characterized by a wide range of biotic and abiotic conditions. Landraces were developed during the evolution of wheat along new territories by human selection after the advent of agriculture. Mediterranean landraces have a good adaptation to their environments, forming populations with different genetic constitutions and are the reservoir of the greatest genetic variability of the species [1]. The pioneer Mediterranean farmers started selecting the plants with the most favourable characteristics in terms of vigour, phenological adaptation, spike length, and yield with the aim to produce improved lines [1]. Wheat landrace collections contain wider genetic diversity than most breeding programs, including adaptation to different conditions according to the place of origin [9]. Knowledge of the genetic diversity and population structure of landraces is essential for their conservation and efficient use in breeding programs [10,11], especially concerning the field of adaptation to climate change [9]. To achieve that, several studies using molecular markers such as simple sequence repeats (SSRs) or single nucleotide polymorphism (SNP) are currently conducting since they have been proven to be very useful for evaluating the genetic diversity and population structure of Mediterranean wheat collections [10,12]. Several efforts were carried out to identify the genetic loci responsible for the changes that occurred at the genome level in wheat during the breeding process by eigenGWAS [13,14].
Under drought conditions, wheat productivity can vary depending on the phenological stage at which the water deficit occurs [15], being larger when water is limited at reproductive stages than if it occurs only at the vegetative stage [16]. Therefore, matching phenology to growing season length, changing the cultivar day length and temperature response could be a good prospect of adaptation to climate change [17]. The genetics of flowering time in wheat is complex due to a strong genotype x environment (GxE) interaction [18]. The genetics of wheat development are determined mainly by the allelic diversity within the loci regulating the vernalization requirement (Vrn) and photoperiod sensitivity (Ppd). The third group of genes controlling earliness when the vernalization and photoperiod requirements are accounted for are the earliness per se loci (Eps), characterized by a polygenic inheritance and lower effects than Vrn and Ppd loci [19].
To support breeding for the development of climate-adapted cultivars, different studies have investigated the impact of the environment in domestication and found evidence of swift evolution in response to environmental changes [20]. New approaches using long-term climatic data of the regions of origin of germplasm collections in combination with genomic analyses are useful for detecting genome regions controlling adaptation to environmental conditions. It is the so-called environmental (env)GWAS [14,21].
In the present study, we aimed to identify the genomic regions from a collection of wheat Mediterranean landraces controlling (1) the phenology fitting as a mechanism to escape drought and heat episodes, (2) the response to environmental conditions by envGWAS, and (3) the differentiation patterns among the landrace subpopulations by eigenGWAS.

Climatic Data and Phenology Assessment
Long-term climatic data of the 23 countries origin of de Mediterranean landraces (Supplementary Materials Table S1) were collected. A period of 15 years was used to determine the following variables: average daily values for minimum, maximum, and mean temperatures (Tmin, Tmax and Tmean, • C), sunshine (h), solar radiation (Rad, MJ m −2 day −1 ), relative air humidity (Rh, %), potential evapotranspiration (ET0, mm) and rainfall (Rain, mm). Climatic data from each country were averaged for two main periods: sowing to anthesis (SA) and anthesis to maturity (AM) ( Table 1).
Field trials were carried out in a typical Mediterranean climate characterized by an irregular pattern of rainfall distribution during the year, low temperatures in winter that rise abruptly in spring, and high temperatures until the end of the crop cycle. Table S2 shows the rainfall and maximum and minimum temperatures during the crop cycle in the 3 years of field trials and the average of the last 15 years (2006-2021). Precipitation and temperature values were representatives of long-term data from the region for each growing season. However, 2017 was considered exceptionally dry due to the low rainfall. On the other hand, 2018 was the wettest year during the crop cycle, with 262 mm of rainfall (higher than the average of 15 years), whereas the first and second growing seasons, with 207 and 179 mm, respectively, were rather dry. The crop suffered severe water scarcity during the grain filling period in 2016 (3.5 mm) and 2018 (8.7 mm). The years with the lowest rainfall before and after booting were 2016 (22.2 mm) and 2017 (25 mm), respectively. The warmest winter occurred in 2016, especially during January and February, with temperatures above the average for the period of 15 years.
Results of ANOVA showed that all the traits showed statistical differences for the year, climatic zone and genetic subpopulation. In contrast, when the interaction between year and climatic zone, and year and genetic structure were considered, only D87 and GFD were statistically significantly different at p < 0.05 ( Table 2).
Comparisons of the mean values of the phenology traits recorded in landraces during the three years of field trials (Table S3) revealed significant differences across years for all traits ( Table 2). The shortest growing cycle was found during 2016, whereas this year also showed the longest DBA and GFD. In addition, 2017 showed the shortest periods from booting to anthesis and grain filling. Considering the climatic zones defined by [22], a clear separation between the north and south of the Mediterranean basin is reflected. No significant differences were observed between North Balkan and North Coast zones, whereas, for those in the south, only significant differences were observed for DBA. Landraces from the south of the Mediterranean showed shorter crop cycle. According to their genetic structure, although not statistically significant, those landraces from northern Mediterranean countries showed a longer crop cycle but shorter GFD (statistically significant) due to a longer period till anthesis. Western and eastern Mediterranean landraces did not show statistically significant differences in phenology except for DBA with a shorter period for eastern landraces.
The bidimensional clustering shown in Figure 1 represents the relationships among accessions and their mean phenotypic performances. The horizontal cluster grouped the landraces according to their phenotypic similarity according to the traits in the vertical cluster. The horizontal clustering separated three main groups. Group A was characterised by longer GFD but lower values for the rest of the traits, whereas group B was characterised by shorter GFD but longer D45, D55, D65 and D87. The first cluster within this group is distinguished because of the shorter D87. Group C showed intermediate values, separated into two main clusters based on higher values of the traits except for GFD. Groups A and C are composed mainly of south Mediterranean landraces (76% and 61%, respectively), whereas 95% of the landraces in group B corresponded to north Mediterranean landraces. From the total of southern landraces in group A, 68% corresponded to the South East climatic zone defined by Royo et al. [22], whereas 90% of the southern landraces in group C corresponded to the South West climatic zone. If the genetic subpopulation (SP) is considered, the main SP for the groups A, B and C were SP3 (East Mediterranean) (47%), SP2 (North Mediterranean) (68%), and SP1 (West Mediterranean) (51%), respectively. Tmax, maximum temperature ( • C); Tmin, mínimum temperature ( • C); Tmean, mean temperature ( • C); Rh, relative humidity (%); Sunshine (h); Rad, radiation (MJ m −2 day −1 ); ET, evapo-transpiration (mm); Rain, rainfall (mm) respectively), whereas 95% of the landraces in group B corresponded to north Mediterranean landraces. From the total of southern landraces in group A, 68% corresponded to the South East climatic zone defined by Royo et al. [22], whereas 90% of the southern landraces in group C corresponded to the South West climatic zone. If the genetic subpopulation (SP) is considered, the main SP for the groups A, B and C were SP3 (East Mediterranean) (47%), SP2 (North Mediterranean) (68%), and SP1 (West Mediterranean) (51%), respectively.

Detection of Loci for Phenology Adjustment
A total of 306 MTAs were detected for phenology traits, with D55 showing the highest number of associations (75) and DBA the lowest (32). Seventeen chromosomes reported MTAs (Table S4). Chromosome 7A reported 48 associations, most of them (60%) at 208 cM, followed by chromosome 1B with 42. Chromosomes 1D and 7D harboured only one MTA, and chromosome 3A, only 2.

Detection of Loci for Climatic Variables by envGWAS
Climatic variables were separated into two groups: (1) sowing to anthesis (SA), and (2) anthesis to maturity (AM). Sixty-three MTAs corresponded to the first period and 195 to the second one. The number of MTAs from SA ranged from 11 for Rh and solar radiation to 5 for Tmin and Tmean, whereas AM ranged from 93 for Tmin to 5 for rainfall (Table S4). The MTAs for climatic variables were detected throughout the genome except for 2D and 4D chromosomes. Chromosome 1D reported the maximum number of MTAs (75), and all of them were related to Tmin during the AM period and were in a region of approximately 19 cM (160.85-179.54). On the other hand, chromosomes 3D and 7D reported the lowest number of MTAs (2).

Detection of Loci for Genetic Differentiation by eigenGWAS
After data filtering (duplicated patterns, markers with more than 25% of missing values, genotypes and markers with minor allele frequency lower than 5%), a total of 10,458 SNPs were used for molecular analyses of the bread wheat Mediterranean landraces, as reported by Rufo et al. [12].
EigenGWAS was conducted using the top five eigenvectors resulting from the PCoA obtained for the whole collection of genotypes ( Figure 2). Genotypes were clearly separated by their genetic SP [12] into three clusters based on their origin: West Mediterranean (SP1), North Mediterranean (SP2), and East Mediterranean (SP3). Genotypes with a high level of admixture remained in a central position between the three SPs ( Figure 1A). A high level of admixture was observed when genotypes are identified by their climatic zone ( Figure 1B). South East genotypes were mostly clustered on the negative axes, opposite to the north Balkan genotypes. In comparison, those from the North Coast and South West (the regions with intermediate values) were widely distributed in the PCoA.
2) anthesis to maturity (AM). Sixty-three MTAs corresponded to the first period and 195 to the second one. The number of MTAs from SA ranged from 11 for Rh and solar radiation to 5 for Tmin and Tmean, whereas AM ranged from 93 for Tmin to 5 for rainfall (Table  S4). The MTAs for climatic variables were detected throughout the genome except for 2D and 4D chromosomes. Chromosome 1D reported the maximum number of MTAs (75), and all of them were related to Tmin during the AM period and were in a region of approximately 19 cM (160.85-179.54). On the other hand, chromosomes 3D and 7D reported the lowest number of MTAs (2).

Detection of Loci for Genetic Differentiation by eigenGWAS
After data filtering (duplicated patterns, markers with more than 25% of missing values, genotypes and markers with minor allele frequency lower than 5%), a total of 10,458 SNPs were used for molecular analyses of the bread wheat Mediterranean landraces, as reported by Rufo et al. [12].
EigenGWAS was conducted using the top five eigenvectors resulting from the PCoA obtained for the whole collection of genotypes ( Figure 2). Genotypes were clearly separated by their genetic SP [12] into three clusters based on their origin: West Mediterranean (SP1), North Mediterranean (SP2), and East Mediterranean (SP3). Genotypes with a high level of admixture remained in a central position between the three SPs ( Figure 1A). A high level of admixture was observed when genotypes are identified by their climatic zone ( Figure 1B). South East genotypes were mostly clustered on the negative axes, opposite to the north Balkan genotypes. In comparison, those from the North Coast and South West (the regions with intermediate values) were widely distributed in the PCoA. The largest eigenvalue was 4075.3, explaining 7% of the genetic variation, whereas the 5th eigenvalue was 1601.6, explaining 3% of the genetic variation. The top five eigenvalues accounted for 22% of the genetic variation, which indicates the complexity of the population structure of the collection. The largest eigenvalue was 4075.3, explaining 7% of the genetic variation, whereas the 5th eigenvalue was 1601.6, explaining 3% of the genetic variation. The top five eigenvalues accounted for 22% of the genetic variation, which indicates the complexity of the population structure of the collection.
A total of 87 MTAs were identified for the top five eigenvectors using a moderate threshold of -log10 p = 3.0 (Table S4)

Identification of QTL Hotspots
To simplify all this information and to identify consensus genomic regions controlling different traits, QTL hotspots were identified using the QTL overview index defined by Chardon et al. [23] for each cM of the genetic map reported by Wang et al. [24]. Confidence intervals (CIs) were calculated based on the extent of the linkage disequilibrium (LD) for each chromosome following Rufo et al. [12]. A total of 567 positions were detected using as a threshold the mean of the overview index across the 21 chromosomes (0.19), corresponding to 153 peaks. In contrast, using a high threshold (0.93), a total of 173 positions were identified over the threshold, corresponding to 69 peaks ( Figure 3).
To simplify all this information and to identify consensus genomic regions controlling different traits, QTL hotspots were identified using the QTL overview index defined by Chardon et al. [23] for each cM of the genetic map reported by Wang et al. [24]. Confidence intervals (CIs) were calculated based on the extent of the linkage disequilibrium (LD) for each chromosome following Rufo et al. [12]. A total of 567 positions were detected using as a threshold the mean of the overview index across the 21 chromosomes (0.19), corresponding to 153 peaks. In contrast, using a high threshold (0.93), a total of 173 positions were identified over the threshold, corresponding to 69 peaks ( Figure 3). These 69 peaks were reduced to 46 QTL hotspots (Table S5), 16 in genome A, 26 in genome B and 4 in genome D. To simplify the search for candidate genes (CGs), QTL hotspots were excluded when the CI was higher than 20 Mb (Table 3). These 69 peaks were reduced to 46 QTL hotspots (Table S5), 16 in genome A, 26 in genome B and 4 in genome D. To simplify the search for candidate genes (CGs), QTL hotspots were excluded when the CI was higher than 20 Mb (Table 3).
Thirty-three QTL hotspots remained for subsequent analyses, with 12, 18 and 3 hotspots for genomes A, B and D, respectively. When the physical position of the SNPs was considered, QTL hotspots 1B.1 and 1B.2, and 1D.1 and 1D.2 showed overlapping positions, thus, they were considered single hotspots (1B.1-2 and 1D.1-2). According to the physical positions for functional genes reported by Liu et al. [25], some of the QTL hotspots co-localize with them. Sec-1, linked to the 1B/1R translocation, was found to be within the hotspot 1B.3, the flowering genes TaELF3-B1 and TaELF3-D1 corresponded to the hotspots 1B.8 and 1D.2, respectively, the photoperiod sensitivity gene Ppd-A1 was in hotspot 2A.1 and the vernalization gene Vrn-A1 in 5A.3. Finally, the position of the phytoene synthase gene Psy-A1 matched with the position of QTL hotspot 7A.5.
To identify the genomic regions most involved in subpopulation differentiation, QTL hotspots detected by eigenGWAS were analysed for allelic frequencies within them. Eight out of the thirteen QTL hotspots showed differences in the allele frequency of the markers associated with differentiation patterns (Table 4). A threshold of allele frequency within a group was set at 80% to identify robust differences among groups, whereas moderate differences were determined at 60%. The most considerable difference among SPs was for QTL hotspot 7B.2, where SP1 showed a clear different pattern from the other SPs. QTL hotspot 2B.1 helped to distinguish SP2 from SP3 with a moderate and robust threshold, respectively. A moderate threshold was observed in QTL hotspot 5B.1 to differentiate SP3 from the other SPs. The marker RAC875_c19099_434 (T/C) in QTL hotspot 5B.3 discriminates SP1 (T allele) from SP3 (C allele), whereas GENE-1074_108 (T/C) in QTL hotspot 6B.2, SP1 (T allele) from SP2 (C allele).

In Silico Analysis of Candidate Genes
A total of 1097 candidate genes (CG) were found within the selected 33 QTL hotspots (Table S6). A total of 371 gene families were observed among the CGs, 10% of the genes corresponded to receptor-like kinases, 6% to F-box proteins and 4% to disease resistance proteins.
To classify these genes according to their molecular function (MF), biological process (BP) and cellular component (CC) gene ontology (GO) terms were downloaded from https://wheaturgi.versailles.inra.fr/Seq-Repository/Annotations (accessed on 1 September 2022). Eight hundred and seven CGs were classified according to their MF, 148 according to their BP and 78 according to their CC. The main gene families within MF ontology were protein binding (26% of the genes), nucleic acid binding (16%) and protein kinase activity (13%). For their BP, genes involved in oxidation-reduction processes were 23%, followed by metabolic process (21%) and proteolysis (9%). Finally, according to their cellular location, 45%, 17% and 13% of the genes corresponded to membrane, ribosome, and nucleosome, respectively.
Subsequently, a search for differentially expressed genes (DEGs) upregulated under three abiotic stress conditions reported in http://www.wheat-expression.com (accessed on 1 September 2022) was conducted to identify the best CGs. These conditions comprised (1) drought and heat stress time-course in seedlings, (2) spikes with water stress, and (3) seedlings treated with polyethylene glycol (PEG) to simulate drought. Forty-two CGs were filtered by performing DEG in different tissues: roots, shoots/leaves, spikes, and grain. To reduce the complexity of the analysis, only those DEGs with a difference larger than 2 tpm from no stress to abiotic stress conditions were considered, leaving 9 CGs that were upregulated under abiotic stress in 8 QTL hotspots from 4 chromosomes (Figure 4). From them, 3 DEG corresponded to receptor-like kinases expressed in roots under drought stress.
Two transcripts were ATP-dependent chaperone ClpB upregulated in shoots/leaves. The other gene families corresponded to an ethylene-responsive transcription factor, also upregulated in shoots/leaves; a cytochrome c oxidase subunit 3 upregulated in shoots/leaves and spikes, and, finally, a Ras-like GTP binding protein and NADH-ubiquinone oxidoreductase chain 6 upregulated in the spikes. Any of the genes were found to be upregulated in the grain. Four DEGs were located on chromosome 1B 3 on 5B, and one on chromosomes 1A and 7A. ducted to identify the best CGs. These conditions comprised (1) drought and heat stress time-course in seedlings, (2) spikes with water stress, and (3) seedlings treated with polyethylene glycol (PEG) to simulate drought. Forty-two CGs were filtered by performing DEG in different tissues: roots, shoots/leaves, spikes, and grain. To reduce the complexity of the analysis, only those DEGs with a difference larger than 2 tpm from no stress to abiotic stress conditions were considered, leaving 9 CGs that were upregulated under abiotic stress in 8 QTL hotspots from 4 chromosomes (Figure 4). From them, 3 DEG corresponded to receptor-like kinases expressed in roots under drought stress. Two transcripts were ATP-dependent chaperone ClpB upregulated in shoots/leaves. The other gene families corresponded to an ethylene-responsive transcription factor, also upregulated in shoots/leaves; a cytochrome c oxidase subunit 3 upregulated in shoots/leaves and spikes, and, finally, a Ras-like GTP binding protein and NADH-ubiquinone oxidoreductase chain 6 upregulated in the spikes. Any of the genes were found to be upregulated in the grain. Four DEGs were located on chromosome 1B 3 on 5B, and one on chromosomes 1A and 7A.

Synteny Analysis within Cereal Species
Markers from the selected QTL hotspots were launched against the mapping pipeline of the Brachypodium, rice and maize genomes, as reported in Marcotuli et al. [26]. Twenty-four out of the 33 selected QTL hotspots reported collinearity with the other genomes. The number of collinear QTL hotspots was 23, 21 and 7 for Brachypodium, rice and maize, respectively. From those hotspots, a total of 97 markers were identified in 79 syntenic regions of the three genomes, designated as ortho-QTL (Table 5). From them, 84, 57, and 9 markers were common with Brachypodium, rice and maize genomes. Among those genomes, 40 markers were in common between Brachypodium and rice, 3 between Brachypodium and maize, and 7 markers were in common in the three genomes ( Figure 5).

Discussion
Genetic diversity is crucial in plant breeding as it broadens the source of new alleles for essential genes. The use of wild relatives or landraces that are well adapted to their regions of origin is of special interest when breeding in suboptimal environments such as the Mediterranean basin to improve the modern cultivars to face the challenges of climate change [19]. In a previous study using the same collection of landraces, Royo et al. [22] found differences in the agronomic performance of the landraces from the northern and southern areas of the Mediterranean Sea. The authors found a clear association between these traits and the climatic conditions of the countries of origin of the landraces. According to these results in this study, we aimed to identify the genomic regions from the Mediterranean wheat landraces controlling 1) the phenology fitting as a mechanism to escape drought episodes, 2) the response to environmental conditions by envGWAS, and 3) the differentiation patterns among the landrace subpopulations by eigenGWAS.

Phenology Fitting
Phenology was differentiated by climatic zone, increasing from the warmest and driest regions to the coldest and wettest ones, showing statistically significant differences between the north and south coast but not within them. This agrees with Gooding et al. [27], who reported a reduction in flowering time in wheat in high temperature and Examining collinear regions within the four genomes outcomes in identifying 20 ortho-QTL containing 41 orthologous genes with similar functions that can be considered promising candidate genes controlling traits of interest across species. From them, 27 genes were orthologous between wheat and Brachypodium, 4 between wheat and rice, 2 between wheat and maize, 5 between wheat, Brachypodium and rice, 2 between wheat, Brachypodium and maize, and, finally, 1 gene was orthologous in the four species (Table S7).

Discussion
Genetic diversity is crucial in plant breeding as it broadens the source of new alleles for essential genes. The use of wild relatives or landraces that are well adapted to their regions of origin is of special interest when breeding in suboptimal environments such as the Mediterranean basin to improve the modern cultivars to face the challenges of climate change [19]. In a previous study using the same collection of landraces, Royo et al. [22] found differences in the agronomic performance of the landraces from the northern and southern areas of the Mediterranean Sea. The authors found a clear association between these traits and the climatic conditions of the countries of origin of the landraces. According to these results in this study, we aimed to identify the genomic regions from the Mediterranean wheat landraces controlling (1) the phenology fitting as a mechanism to escape drought episodes, (2) the response to environmental conditions by envGWAS, and (3) the differentiation patterns among the landrace subpopulations by eigenGWAS.

Phenology Fitting
Phenology was differentiated by climatic zone, increasing from the warmest and driest regions to the coldest and wettest ones, showing statistically significant differences between the north and south coast but not within them. This agrees with Gooding et al. [27], who reported a reduction in flowering time in wheat in high temperature and drought environments during the crop cycle. When the phenology is compared by the population structure, SP2 showed a longer cycle till booting, heading and anthesis, but shorter GFD; although this is contradictory according to the geographical distribution of SP2 landraces, mainly north Mediterranean countries, it may be explained by the higher level of admixture among Mediterranean landraces SP as reported by Rufo et al. [12]. However, other studies [28] reported a decrease in GFD due to late anthesis. The bidimensional clustering was helpful to visualize the variability within the collection and identifying phenological connections among landraces. The classification followed mainly the separation among climatic zones into three groups, South West (group A), north Mediterranean (group B, including north coast and north Balkan) and South East (group C). However, identifying a trend to distinguish genetic subpopulations according to phenology data was not as good as with climatic zones.
The dissection of genetic architecture for phenology in Mediterranean wheat landraces revealed the presence of 26 QTL hotspots (57%) with significant associations. According to the position of functional genes described by Liu et al. [29], major genes affecting photoperiod sensitivity and vernalization were in QTL hotspots 2A.1 (Ppd-A1) and 5A.3 (Vrn-A1), respectively. However, no MTAs were found in the homologous regions corresponding to the Ppd and Vrn genes on chromosomes 2B and 2D, and 5B and 5D, respectively. These contrast regions that were detected might be due to selection pressure during the domestication by selecting the alleles/genes targeted to environmental factors, growing type, phenological traits, yield and local adaptation. Continuity of the research in these regions can lead to underpinning the genes controlling these loci. Dissecting the complex genetic architecture to identify the favorable regions and haplotypes could be beneficial for the development of new varieties able to skip the drought periods by regulating their phenological stages.

Genetic Control of Environmental Conditions
EnvGWAS has been reported as a valuable and complementary approach to identify genomic regions related to adaptation to abiotic stress [21]. The hierarchical clustering reported by Royo et al. [22] using the long-term climatic data of the main growing areas of wheat in the Mediterranean basin showed a clear geographic pattern separating the north coast countries from the southern ones. Climatic data are highly correlated with the adaptation of the crop. As it is shown by different authors [22,29], PCoA using molecular data followed the climatic classification reported by these authors when genotypes were grouped by their country of origin. However, the high level of admixture and genetic exchange in the Mediterranean landraces according to Rufo et al. [12] revealed an unclear pattern when comparing PCoA using structure or climatic data.
In the present work, envGWAS, using the climatic variables from the region of origin of the Mediterranean wheat landraces, successfully detected genome regions involved in the control of traits related to environmental variation. QTL hotspots showed that most of the significant associations related to temperature were reported during the grain filling period, which, according to Royo et al. [22], is one of the variables that most contribute to differentiate the landraces from the north and south coast of the Mediterranean basin. Taking advantage of the benefits of envGWAS can lead to improving the knowledge of the adaptation of the Mediterranean landraces to their specific environments.

Genetic Loci for Differentiation Patterns among Mediterranean Landraces
Eigenvectors are commonly used to deduce the population's genetic structure because they are estimated for each genotype. In this way, different studies have indicated the suitability of primary eigenvectors to infer population differentiation [30][31][32], and posteriorly Chen et al. [33] developed the approach called eigenGWAS to identify genomic regions causing genetic differentiation. This approach has been used mainly to identify selective sweeps produced by breeding among old cultivars or landraces and modern varieties [13,25,34,35]. These authors found that most of the selective sweeps corresponded to regions involved in yield potential, phenology, plant height, and biotic and abiotic resistance.
In this study, we used eigenvectors to understand the genetic differences among wheat Mediterranean landraces. Out of the eleven QTL hotspots reporting significant associations by eigenGWAS, six hotspots on chromosomes 2B, 5B, 6B, 7A and 7B showed allelic or haplotype differences among genetic subpopulations, thus considering them as the main drivers of genetic differentiation among Mediterranean landraces. Four of these regions also showed associations with climate traits and three with phenology fitting, indicating not only genetic differentiation but also adaptive differentiation among different environments. These loci can be beneficial for breeding purposes by spotlighting the breeders to know the contribution of the genomic footprints in the wheat adaptation and response to the climatic change by selecting the high frequency of favorable alleles landraces and integrating them in the breeding programs.

Candidate Genes
To reduce genomic complexity and the number of CGs, the search was performed only in the selected QTL hotspots, such as those hotspots with a confidence interval lower than 20 Mb. The gene annotation from the "Chinese spring" reference genome sequence [36] allowed us to identify 1097 gene models within the 33 QTL hotspots. Mining of CGs was achieved by looking for differentially expressed genes (DEGs) upregulated under abiotic stress in different tissues through in silico analysis at http://www.wheat-expression.com (accessed on 1 September 2022). Only those genes with higher differences between nonstressed conditions and abiotic stress were selected. Among them, different gene families previously reported for their involvement in various stress responses were found: Kinases, Ras-like GTP binding proteins and ethylene responsive transcription factors.
Protein kinases play crucial roles in plant responses to different stress conditions (reviewed in [37]). Among them, the MAPKs are involved in ABA signalling and drought stress regulating root growth and stomatal closure; CPKs play roles in signalling modules in different abiotic stresses such as drought, heat, cold and salt; RLKs are a large group of kinases anchored to the cellular membrane, thus acting in the transmission of extracellular signals into the cells. They regulate fertilization, plant growth and development, and plant response to biotic and abiotic stresses. Differentially expressed kinases were also found in a previous study of QTL meta-analysis for abiotic stress (among others) in durum wheat by Soriano et al. [38].
A Ras-related GTP binding protein was found in hotspot 1B.5. These proteins have been involved in drought stress tolerance by modulating stomatal movements mediated by ABA [39].
Finally, an ethylene-responsive transcription factors (ERF) was upregulated in leaves. These genes have been reported to be involved in different processes. Djmal and Khoudi [40] studied the tolerance to heavy metals (HM) regulated by the durum wheat TdSHN1 in yeast and transgenic tobacco, concluding that the gene could improve HM tolerance in plants and phytoremediation of HM-contaminated soils. Gao et al. [41] studied the expression of the wheat ERF TaERFL1a. This gene is a member of the AP2/ERF family and was remarkably induced in wheat seedlings suffering freezing stress. In their study, the authors showed that its expression was rapidly upregulated in response to salt, cold, and water deficiency, suggesting roles in the responses to abiotic stresses.
Interestingly, in QTL hotspot 1B.6, a Cytochrome c oxidase gene was upregulated in leaves and spikes (TraesCS1B01G413200). This gene was previously found to be upregu-lated in spikes by Rufo et al. [42] in a GWAS for yield and vegetation indices-related traits using the same panel of landraces but also including a panel of modern wheat cultivars.

Synteny among Cereal Species
The study of synteny among different species and the identification of orthologous genes suggest that they may be associated with regulatory elements affecting many genes [43] and leading to the identification of key genes controlling important traits across species. In this study, we used the selected QTL hotspots to investigate the collinearity with other cereal species: Brachypodium, maize and rice (ortho-QTL).
The analysis revealed a higher synteny among the wheat, Brachypodium and rice, whereas a lower one among wheat and maize. The candidate gene investigation allowed the correlation between some ortho-QTL with genes. Only one gene, a 4-Coumarate: CoA ligase, was common among the four genomes. These proteins were observed to play a pivotal role in cell wall synthesis and abiotic stress in Fraxinus mandshurica [44]. The authors found that Fm4CL is involved in secondary cell wall development and lignin synthesis. Fm4CL plays an important role in osmotic stress by affecting cell wall and stomatal development. In rice, these proteins were involved in fungal resistance induced by reactive oxygen species [45]. The collinearity among cereal genomes could suggest a common mechanism of action to respond at stress conditions, which may facilitate the study of such complex traits.

Plant Material
A panel of 153 bread wheat (Triticum aestivum L.) landraces from 23 Mediterranean countries, including Balkan peninsula and Portugal, were used in the current study (Table  S1). Landrace populations were provided by public gene banks from Germany (IPK, Gatersleben), Italy (ISC, S. Angelo Lodigiano), Romania (Suceava GenBank, Suceava), Russia (VIR, St. Petersburg), Spain (CRF-INIA, Madrid), the Netherlands (CGN-WUR, Wageningen) and the USA (NSGC-USDA, Aberdeen, ID). Accessions were bulk purified during two cropping cycles to select the dominant type, and the seeds were increased on plots in the same field during 2015 to ensure a common origin for all lines.

Phenology Assessment
Field experiments were conducted under rainfed conditions in Gimenells, Lleida, northeast Spain, during 2016, 2017 and 2018 harvesting seasons, as reported in Rufo et al. [42]. The experiments consisted of a non-replicated augmented design with two replicated checks, the cultivars 'Anza' and 'Soissons', at a ratio of 1:5 between checks and tested genotypes. The sowing density was adjusted to 250 germinable seeds/m 2 , and the sowing dates were 2 December 2015; 21 November 2016; and 15 November 2017, whereas harvesting dates were 7 July 2016; 5 July 2017; and 5 July 2018. Weeds and diseases were controlled following standard practices at the site.
The development of plants at each plot was monitored twice a week for the following growth stages: D45 (boots swollen), D55 (heading), D65 (anthesis) and D87 (physiological maturity) [46]. A plot was considered to reach a given developmental stage when approximately 50% of the plants presented the specific characteristics of the stage. Days from booting to anthesis (DBA) were calculated as the number of days from booting to anthesis, and grain filling duration (GFD) was calculated as the number of days from anthesis to physiological maturity.
Phenotypic data were fitted to a linear mixed model considering the check cultivars as the fixed effect, and the row and column number and accessions as random in the model for each environment following the MIXED procedure of the SAS-STAT statistical package (SAS Institute Inc., Cary, NC, USA) y = Xβ + Zγ + ε where β is an unknown vector of fixed-effects parameters with known design matrix X, γ is an unknown vector of random-effects parameters with known design matrix Z, and ε is an unknown random error vector whose elements are no longer required to be independent and homogeneous. Restricted maximum likelihood (REML) was used to produce the best linear unbiased predictors (BLUPs). Analyses of variance (ANOVA) were conducted to assess differences among years and climatic zone or genetic subpopulations. Genotype within a climatic zone or genetic subpopulation was used as an error term. Mean comparisons between years, climatic zone and genetic subpopulation, were performed using the Tukey's HSD test at p ≤ 0.05. Mean phenotypic values across the 3 years were used to perform a hierarchical cluster analysis by the Ward method [47]. Both analyses were carried out using JMP v14.2.0 statistical package (SAS Institute, Inc., Cary, NC, USA).

Environmental Variables
Long-term climatic data of the 23 countries origin of landraces were collected using the CROPWAT software (http://www.fao.org/land-water/databases-and-software/cropwat (accessed on 1 September 2022)) from the CLIMWAT 2.0 FAO database. A period of 15 years of data (2006-2021) from 3 to 7 climatic stations located in the main wheat-growing areas of each country was used to determine the following variables: average daily values for minimum, maximum, and mean temperatures (Tmin, Tmax and Tmean, • C), sunshine (h), solar radiation (Rad, MJ m −2 day −1 ), relative air humidity (Rh, %), potential evapotranspiration (ET 0 , mm) and rainfall (Rain, mm). Climatic data from each country were averaged for the periods 20 November-31 March and 1 April-30 June (Table 1), assuming they represent the two main growing periods of wheat in the region; this is from sowing to anthesis (SA) and from anthesis to physiological maturity (AM) as described in Royo et al. [48].

Genome-Wide Association Analyses
Accessions were genotyped with 13177 SNPs from the Illumina Infinium 15K Wheat SNP Chip at Trait Genetics GmbH (Gatersleben, Germany). After marker filtering, 10,458 SNPs were used for subsequent analyses as reported in Rufo et al. [12].
Genome-wide association (GWAS) was performed using Tassel 5.0 software [49] for phenology, environmental variables and the first five eigenvectors. A mixed linear model (MLM) was fitted using a principal component analysis (PCA) matrix with 6 principal components as the fixed effect and a kinship (k) matrix as the random effect (PCA + K model) at the optimum compression level based on the groups defined by the kinship matrix. Compression levels range from "no compression" (compression = 1) when each genotype belongs to its own group, to "maximum compression" (compression = n) when all genotypes belong to the same group. A common threshold was established at -log 10 p > 3, as previously reported in the literature [42,[50][51][52][53]. Confidence intervals (CIs) for marker-trait associations (MTA) were estimated for each chromosome based on the LD decay reported by Rufo et al. [12] and standardized using the formula reported by Chardon et al. [23]: where CI is the LD decay for each chromosome. To simplify the MTA information, the associations were grouped into QTL hotspots. To define a hotspot, the density of MTAs along the chromosome was calculated as the QTL overview index [23] for each cM of the genetic map reported by Wang et al. [24]: Total length o f map where nbQTL is the number of QTLs and nbE is the total number of experiments.