Genome based QTL and meta-QTL analyses of thousand-kernel weight in tetraploid wheat

Wheat domestication and subsequent improvement produced wide phenotypic variation in grain weight (GW) between domesticated wheat species and their wild progenitors. GW remains an important goal of many wheat-breeding programs, yet although studies have found many quantitative trait loci (QTLs) for GW, few of the genes underlying these loci have been identified. Here we performed QTL analysis for GW using a recombinant inbred line (RIL) population based on a cross between the wild emmer wheat accession ‘Zavitan’ and the durum wheat variety ‘Svevo’. Using the recent Zavitan genome assembly, we anchored the identified QTLs to the reference sequence and added the positions of previously published QTLs for GW in tetraploid wheat. This genome-based meta-QTL analysis enabled us to identify a locus on chromosome 6A with a positive effect on GW that was contributed by wild wheat in a few studies. This locus was validated using an introgression line that carries the 6A GW QTL from Zavitan in the Svevo background and shows higher grain weight. Using the reference sequence and genes associated with GW in rice, we identified a wheat ortholog of the rice gene OsGRF4 in the 6A QTL region. The coding sequence of this gene, TtGRF4-A, showed four SNPs between Zavitan and Svevo. A molecular marker developed for the first SNP showed that the Zavitan allele of TtGRF4-A is rare in a core collection of wild emmer and absent from the domesticated emmer genepool. We suggest that TtGRF4-A is a candidate gene underlying the 6A GW QTL and that breeding with its natural Zavitan allele may have the potential to increase wheat yields.


Introduction
Grain weight (GW) in wheat, typically expressed as Thousand Kernel Weight (TKW), is one of the most important determinants of yield, together with grain number per unit area (e.g., number of grains per spike and number of spikes per plant) (Campbell et al. 1999). GW is determined by grain size (length, width and area), grain shape and grain density (Gegas et Ozkan et al. 2005). The western population is further subdivided into two subpopulations, designated Horanum and Judaicum, which differ greatly in their morphological characters (Poyarkova et al. 1991). Judaicum is characterized by a tall, upright phenotype, wide spikes and large grains, and is more fertile than the Horanum subpopulation, which exhibits a smaller stature and a more slender spike. The large phenotypic variation in the wild presents potential allelic diversity for wheat improvement. Yet, considering that WEW is advocated as an important resource for wheat improvement (Aaronsohn 1910  Many QTL studies have been done in wheat (a PubMed search for 'wheat' AND 'QTL' yields ~1200 results), yet these data seem to have no continuity, which raises the question of how much these studies overlap. Furthermore, although many QTLs have been located for grain size in wheat, very few panicle (similar to number of grains per spike in wheat) and GW. One example is the GIF1 gene, which encodes a cell wall invertase; GIF1 mutants have lower GW due to loosely packed starch granules that reduce grain density and weight (Wang et al. 2008, Control of rice grain-filling and yield by a gene with a potential signature of domestication). Duan et al. (2016) have shown that GIF1 interacts with GRF4 and that overexpression of GIF1 increases grain size and weight. Rice genotypes with a 2 bp mutation in the GRF4 target site of miR396 had larger grains (Che et al. 2015; Duan et al. 2015; Tsukaya et al. 2016).
In the current study, we aim to identify the genetic elements controlling GW phenotypic variation in wheat and present our results from QTL analyses. We associate these results with previously published GW QTL data and demonstrate a reference-based meta-QTL analysis. The genome data are also used to associate rice GW genes with our meta-QTL data, allowing us to focus on a candidate gene underlying a major QTL on chromosome 6A with a positive effect on GW, contributed by wild emmer wheat.
*For all experiments: Seeds were disinfected (3.6% sodium hypochlorite for 10 minutes) and placed for vernalization in moist germination paper for 3 weeks in a dark cold room (4 °C), followed by 3 days of acclimation at room temperature (24 °C), and then planted in the field. The two plants at the edges of each plot served as borders, and the remaining three plants were harvested at the end of the experiment to estimate TKW. The field was treated with fungicides and pesticides to avoid development of fungal pathogens or insect pests and was weeded manually once a week.
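The text does not state how TKW was computed from the harvested plants. As a minimal sketch, assuming each plant yields a weighed and counted grain sample and that the plot value is the mean over plants, the arithmetic could look as follows. The function names and the averaging choice are illustrative assumptions, not the authors' protocol.

```python
def thousand_kernel_weight(sample_weight_g, kernel_count):
    """Scale the weight of a counted grain sample to 1000 kernels (grams)."""
    if kernel_count <= 0:
        raise ValueError("kernel_count must be positive")
    return sample_weight_g / kernel_count * 1000.0


def plot_tkw(samples):
    """Mean TKW of one plot.

    samples: list of (sample_weight_g, kernel_count) tuples, one per
    harvested plant (assumed layout, not taken from the source).
    """
    if not samples:
        raise ValueError("at least one sample is required")
    values = [thousand_kernel_weight(w, n) for w, n in samples]
    return sum(values) / len(values)
```

For example, a sample of 250 kernels weighing 10 g corresponds to a TKW of 40 g.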

QTL analysis
QTL analysis followed the procedure described by Nave et al. (2016). Briefly, a reduced version of the Sv × Zv map, containing 472 markers, was used for QTL analysis with the MultiQTL software (http://www.multiqtl.com). The significance of each QTL was calculated using a permutation test, followed by a genotype × environment interaction analysis.
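To illustrate the idea behind a permutation test for QTL significance, the following is a minimal sketch of a single-marker scan. It is not the MultiQTL procedure: the LOD statistic (a comparison of the two genotype-class means), the 0/1 genotype coding for RILs, and all function names are assumptions made for illustration.

```python
import math
import random


def lod_score(genotypes, phenotypes):
    """LOD for one marker coded 0/1, comparing a two-mean model to a single mean."""
    n = len(phenotypes)
    mean_all = sum(phenotypes) / n
    rss_null = sum((y - mean_all) ** 2 for y in phenotypes)
    rss_marker = 0.0
    for allele in (0, 1):
        group = [y for g, y in zip(genotypes, phenotypes) if g == allele]
        if group:
            group_mean = sum(group) / len(group)
            rss_marker += sum((y - group_mean) ** 2 for y in group)
    if rss_null == 0.0:
        return 0.0  # no phenotypic variation at all
    if rss_marker == 0.0:
        return math.inf  # marker explains all variation
    return (n / 2.0) * math.log10(rss_null / rss_marker)


def permutation_threshold(markers, phenotypes, n_perm=1000, alpha=0.05, seed=0):
    """Genome-wide LOD threshold from the permutation distribution of the maximum LOD.

    markers: list of per-marker genotype lists, each aligned with phenotypes.
    """
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    max_lods = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the genotype-phenotype association
        max_lods.append(max(lod_score(m, shuffled) for m in markers))
    max_lods.sort()
    index = min(n_perm - 1, math.ceil((1.0 - alpha) * n_perm) - 1)
    return max_lods[index]
```

A marker whose observed LOD exceeds the returned threshold would be declared significant at the chosen genome-wide level under these assumptions.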

Meta-QTL analysis
The meta-QTL analysis included the TKW data collected in the current study and from previously

Wheat-rice colinearity analysis of yield-related genes
We searched the literature for characterized yield-related genes from rice (Oryza spp.) and aligned their sequences to the Zv genome using a BLAST search. The best hits of this search were compared against the WEW annotation, and orthologous WEW genes were identified, including their genomic location on the WEW genome.
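The best-hit selection described above can be sketched as follows, assuming the BLAST results were saved in the standard 12-column tabular format (`-outfmt 6`). The ranking rule (highest bit score, ties broken by lowest e-value) and the function name are assumptions; the text does not specify how the best hits were defined.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(tabular_text):
    """Return {query_id: best_hit_dict} from BLAST tabular text."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if len(row) < len(FIELDS) or row[0].startswith("#"):
            continue  # skip blank, truncated, or comment lines
        hit = dict(zip(FIELDS, row))
        for key in ("pident", "evalue", "bitscore"):
            hit[key] = float(hit[key])
        for key in ("sstart", "send"):
            hit[key] = int(hit[key])
        rank = (hit["bitscore"], -hit["evalue"])
        current = best.get(hit["qseqid"])
        if current is None or rank > (current["bitscore"], -current["evalue"]):
            best[hit["qseqid"]] = hit
    return best
```

The subject coordinates (`sstart`, `send`) of each best hit can then be compared with the gene annotation to find the overlapping gene model.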

GRF4-A SNP marker development and allelic variation study
Sequences data with SNP information confined by  Table S10).

Phenotyping
In the 2014R and 2015A experiments, the parental lines Sv and Zv differed in every yield-related parameter measured (Table 2).   Golan 2015). In these eight studies and our current study, TKW ranged from 10 g to 48 g in the wild emmer parents and from 30 g to 74 g in the domesticated parents (including emmer), while population means ranged from 29.9 g to 58.9 g (Table 3).
To identify overlap between QTLs, we anchored the peak marker of every TKW QTL from all the studies to the WEW genome by BLAST alignment. The best alignment was chosen based on percent identity, e-value and agreement with the genetic maps. This process was successful in most cases, except when the marker sequence was absent from the public databases (wPt-9555 and gwm263
Figure 2. Meta-QTL analysis of TKW QTLs using the WEW genome assembly. The x-axis shows the position on the WEW genome (1 unit = 100 Mb) and the y-axis shows the LOD score. Each study is represented in a different color (see legend), and the positive allele is indicated as (d) for durum, (w) for wild, or (e) for emmer.
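Once peak markers have physical positions, overlapping QTLs from different studies can be found by grouping nearby peaks on the same chromosome. The sketch below is one simple way to do this; the 50 Mb window, the single-linkage grouping rule and all names are assumptions for illustration and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnchoredQTL:
    study: str
    chromosome: str
    position_mb: float    # physical position of the peak marker
    positive_allele: str  # "d" (durum), "w" (wild) or "e" (emmer)


def cluster_qtls(qtls, window_mb=50.0):
    """Group peaks on the same chromosome whose neighbors lie within window_mb."""
    by_chromosome = {}
    for qtl in qtls:
        by_chromosome.setdefault(qtl.chromosome, []).append(qtl)
    clusters = []
    for chromosome in sorted(by_chromosome):
        current = []
        for qtl in sorted(by_chromosome[chromosome], key=lambda q: q.position_mb):
            if current and qtl.position_mb - current[-1].position_mb > window_mb:
                clusters.append(current)
                current = []
            current.append(qtl)
        clusters.append(current)
    return clusters


def multi_study_clusters(clusters):
    """Keep only clusters supported by more than one study."""
    return [c for c in clusters if len({q.study for q in c}) > 1]
```

Clusters in which every member carries the "w" allele would correspond to loci where the wild parent consistently contributes higher TKW.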
Next, we focused on the meta-QTL on chromosome 6A (designated mQTL-GW-6A) because it showed a consistent contribution of higher TKW from WEW and may therefore represent genetic diversity with breeding potential that is currently absent from the domesticated genepool.

Validation of the mQTL-GW-6A using Sv × Zv introgression lines
To learn more about mQTL-GW-6A, we selected a BC3F5 introgression line designated NIL-21.  Figure 3A and B).

GRF4-A polymorphisms
Sequence comparison of the 1227-bp GRF4-A coding sequence in Zv and Sv (Maccaferri et al., in publication) showed four SNPs in positions 93, 342, 5610 and 5661. The first SNP is synonymous, but the other three SNPs translate into three amino acid changes between Zv and Sv: P83S, R319G and G336S.
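How a coding SNP is classified as synonymous or nonsynonymous can be sketched with the standard genetic code. The example below uses a short made-up coding sequence, not the GRF4-A sequence, and the function name and return layout are illustrative assumptions.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snp(cds, position, alt_base):
    """Classify a SNP at a 1-based CDS position.

    Returns (codon_number, ref_amino_acid, alt_amino_acid, kind), where
    kind is "synonymous" or "nonsynonymous".
    """
    cds = cds.upper()
    usable_length = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    if not 1 <= position <= usable_length:
        raise ValueError("position is outside the complete codons of the CDS")
    codon_index = (position - 1) // 3
    offset = (position - 1) % 3
    ref_codon = cds[3 * codon_index: 3 * codon_index + 3]
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return codon_index + 1, ref_aa, alt_aa, kind
```

In the toy sequence `ATGCCGGGT`, changing position 6 to `A` leaves proline unchanged, while changing position 4 to `T` replaces proline with serine.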

Allelic diversity study using molecular marker for the GRF4-A
We developed a molecular marker based on the SNP in position 93 of GRF4-A. This marker was used to genotype a core collection of wild and domesticated tetraploid genotypes. The results showed that only two additional WEW genotypes (WE-10 and WE-12, both from Israel; Table S1) carried the Zv allele, while all the other accessions (wild and domesticated) carried the Sv allele (Table S1).
bioRxiv preprint, this version posted September 12, 2018; doi: https://doi.org/10.1101/415240

Discussion
It is well established that domesticated wheat has heavier grains than its wild progenitor; the grain of domesticated wheat is usually also wider and shorter, while wild wheat has longer and narrower grains Here, we used the WEW reference genome to anchor the QTL markers by aligning their sequences to the genome. This process was efficient, as we were able to find the physical location of most QTL markers. This strategy allowed a straightforward comparison of the results from all the QTL studies and accurate location of overlapping QTLs, without the need for even one common marker between the populations. The current study focused on TKW, but the general scheme applies to all QTL experiments in wheat, which can now be analyzed using a reference genome and without genetic distance estimations. Our meta-QTL analysis showed more than 10 loci that are associated with higher TKW from wild wheat, and we chose to focus on a 6A locus that showed 1, yet since the meta-QTL on 6A did not include TaGW2, we think that the 6A QTL is independent of the TaGW2 effect.
Classically, the next step in the genetic dissection of a QTL region would include saturation of the region with critical recombinant plants (Distelfeld et al. 2004). This is also valid in the case of mQTL-GW-6A, where further validation using backcrossed NIL-21.1 progeny is needed in order to clean the background of other wild introgressions and to reduce the 6A introgression. This process is time-consuming, typically taking a few years, but it usually allows a thorough examination of the QTL effects, including the study of trade-offs with other yield components and genotype-by-environment interactions. Alternatively, we decided to proceed with a candidate gene approach using knowledge from the literature about yield-related genes. to possess a more robust grain phenotype than the more widespread Huranum subpopulations (Poyarkova et al. 1991; Ozkan et al. 2011; Sela et al. 2014). Therefore, we suggest that the polymorphisms in GRF4-A may be associated with those phenotypic differences between the two subpopulations.

Conclusions
We showed here that the recent assembly of the wild emmer genome has opened the way for genome-based genetic dissection of phenotypic variation. The existence of a high-quality ordered genome facilitates colocalization of QTLs from different studies and different organisms (e.g., rice). Combining such a meta-QTL study with a well-annotated genome can point to a potential gene underlying the studied trait.
GRF4-A, the ortholog of the yield-related rice gene OsGRF4, was associated with mQTL-GW-6A, a meta-QTL with a positive effect on grain size originating from WEW. The GRF4-A marker may be related to the differences between the Huranum and Judaicum subpopulations of WEW, and the Zv allele of this gene is absent from the domesticated wheat genepool. GRF4-A appears to be a valid target for genome editing, and integration of the Zv allele into different backgrounds is needed in order to assess its potential to regulate grain size and increase yields in wheat.
Slovenia  PI 377658  dicoccum  domesticated  -
DE-21  Croatia  PI 264964  dicoccum  domesticated  -
DE-22  Bosnia and Herzegovina  PI 434995  dicoccum  domesticated  -
DE-23  Iran  PI 254158  dicoccum  domesticated  -
DE-24  Iran  PI 254169  dicoccum  domesticated  -
DE-26  Central Turkey  PI 470739  dicoccum  domesticated  -
DE-27  Central Turkey  PI 470738  dicoccum  domesticated  -
DE-28  Armenia  PI 94661  dicoccum  domesticated  -
DE-29  Central Turkey  PI 470737  dicoccum  domesticated  -
DE-30  Georgia  PI 326312  dicoccum  domesticated  -
DDW  Italy  Svevo  Durum  domesticated  -
The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.