Identification of Genetic Loci on Chromosome 4B for Improving the Grain Number per Spike in Pre-Breeding Lines of Wheat

The grain number per spike (GNPS) is an important yield component, and much attention is given to the increase in GNPS for current yield improvement of common wheat. Here, a panel of 259 pre-breeding lines and elite commercial varieties were collected for the investigation of 12 agronomic traits, especially for spike-related traits, with 2-year replicates. The high correlation between GNPS and kernel number per spikelet (KNS) suggested that the high GNPS trait in our pre-breeding lines was mainly controlled by grain set number per spikelet. Genome-wide association studies (GWAS) using the 660K SNP genotyping assay suggested that a major locus on chromosomes 4BS contributed to the high GNPS trait, which contributed to 33% and 48% of the variation in KNS and GNPS, respectively. A good diagnostic KASP marker AX-109286577 flanking the 4BS locus was developed for easy selection of the large spike trait. Taken together, the results suggested that untapped rare allele variation in our pre-breeding lines can be used for improvement of the yield component of set grain number per spike.


Introduction
Common wheat (Triticum aestivum, allohexaploid, AABBDD genome) is the most widely grown crop globally, providing approximately 20% of the calories and protein in human diets [1]. Increasing wheat yield is a global challenge for food security, and six complementary approaches have been proposed, including optimizing developmental patterns to maximize spike fertility and grain number, and improving potential grain size and grain filling [2]. Wheat yield components are determined by three main factors, namely the spike number per unit area, grain number per spike (GNPS), and thousandkernel weight (TKW). Breeding practices have indicated that improvement of the spike number per unit area and TKW have increased to very high levels in China and other countries [3][4][5], while the increase in GNPS will therefore be of widespread concern for future yield improvements [2,6].
A few genes have been cloned to increase the GNPS, SNS and KNS. The GNI1 gene located on chromosome 2A was identified as responsible for increased floret fertility and grain number [18]. The physical region of the 7A-QTL revealed that TaAPO-A1 positively controls the spikelet number on the spikes [19,20]. An associative transcriptomic approach identified three candidate genes, TaVRS1-2B, TaTFL1-2D, and TaPAP2-5A, that affect inflorescence development in wheat and increase wheat grain number [21].
New sources of germplasm with maximized grain number per spike are necessary for increasing the yield potential, particularly in regard to spikelet fertility, i.e., the number of kernels per spikelet [22]. Our team continued to pre-breed large spike germplasms and reported two high GNPS wheat germplasm lines Pubing 3228 and Pubing 3504 for increasing grain number [23,24]. By crossing the high GNPS wheat germplasm lines with parents collected worldwide, we produced a panel of pre-breeding lines with diversified traits, including large spike, high grain weight, earliness and dwarf characteristics. Among these agronomic traits, spike-related traits are the most prominent. The panel of pre-breeding lines show the wide phenotypic variation, which is suitable for genome-wide association studies (GWASs). The aim of this study was mainly to evaluate yield component-related trait variations and identify the genetic loci of natural variation by completing a 2-year phenotypic investigation and whole-genome SNP genotyping. GWAS and population genetics analysis were employed to identify the genomic region for increasing grain number per spike, and the study provides us with new genetic resources and genetic loci for improvement of yield potential in wheat.

Plant Materials
A panel of 255 derivative lines from the cross between wheat and Agropyron cristatum were collected. These derivative lines were developed by Professor Lihui Li of the Chinese academy of agricultural sciences since 1998 [25] and show wide genetic diversity. These pre-breeding lines carry the genetic component from the worldwide parents. Among these parents, common wheat-A. , a total of 259 lines were used for population structure and association mapping analysis. All the lines were sown at Xinxiang station of CAAS (Henan Province, China) during the two winter wheat growing seasons of 2016-2017 and 2017-2018. The plant density was 25 cm by 10 cm with a 2 m length. The field was arranged in a randomized complete block design with two replications. The pedigree information is provided in Table S1.

Genotyping
A wheat 660K SNP array designed by the Chinese Academy of Agricultural Sciences and synthesized by Affymetrix was applied to genotype all 259 collected wheat accessions (http://wheat.pw.usda.gov/ggpages/topics/Wheat660_SNP_array_developed_by_ CAAS.pdf, accessed on 30 November 2021). The wheat 660K SNP array contained wide genetic variation from hexaploid and tetraploid wheat, emmer wheat and Aegilops tauschii. All genotypes were filtered by PLINK v1.90p 64-bit software (https://www.cog-genomics. org/plink/1.9, accessed on 30 November 2021). After removing nucleotide variations with missing rates ≥ 0.1 and minor allele frequency (<0.05), 310,097 SNPs were used for the population genetics analysis and GWAS (Table S2).

Population Structure and Phylogenetic Tree Analysis
Population structure was assessed using Admixture 1.3 software [26]. LD decay was analyzed by the PopLDdecay software v3.40 [27]. K values were set from K = 2 to Agronomy 2022, 12, 171 3 of 12 K = 18. The minimum CV error value appeared at K = 16. The phylogenetic tree was constructed with the neighbor joining (NJ) method with the program PHYLIP package 3.697 (https://evolution.genetics.washington.edu/phylip.html, accessed on 30 November 2021) and was visualized using the online tool of the ITOL website (http://itol.embl.de, accessed on 30 November 2021). Principal component analysis (PCA) using whole-genome SNPs was performed with Plink 1.9 software.

Phenotype Evaluation and Correlation Analysis
After harvest, the plant height (PH), fertile tiller numbers, grain number per spike (GNPS), kernel number per spikelet (KNS), total spikelet number per spike, spike length, and fertile spikelet number per spike were investigated. The heading date (Hd) was recorded by counting days from the sowing date to the date when ears of approximately half of the genotypes were fully visible. After threshing, the grain length (GL), grain width (GW) and thousand-kernel weight (TKW) were measured by the image analysis method provided with SC-E software (Hangzhou Wanshen Detection Technology Co., Ltd., Hangzhou, China). We measured five individual plants for each line except for the heading date trait; the average value of individuals was calculated for trait evaluation. All phenotype data are provided in Table S3. The broad heritability and BLUPs for traits were estimated using the R package lme4 [28]. Broad-sense heritability (H 2 ) was calculated using the following formula: H 2 = var (LINE)/[var (LINE) + var (LINE:YEAR)/2 + var (RESID-UAL)/4]. Visualization of a correlation matrix was performed with the R package ggcorrplot (http://www.sthda.com/english/wiki/ggcorrplot-visualization-of-a-correlationmatrix-using-ggplot2, accessed on 30 November 2021). Boxplot figures were drawn by using the R package ggpubr (https://cran.r-project.org/web/packages/ggpubr/index.html, accessed on 30 November 2021).

Genome-Wide Association Studies (GWAS)
GWAS was performed with the R package Genomic Association and Prediction Integrated Tool (GAPIT, version 3) using the mixed linear model (MLM) [29]. Principal component analysis (PCA = 4) was used to account for population structure and fitted as fixed effect [30]. The markers with an adjusted -log10 (P) ≥ 6.0 were regarded as significant for each trait. A high-quality drawing tool was used to create Manhattan plots of the genomic analysis with the R package CMplot (https://github.com/YinLiLin/R-CMplot, accessed on 30 November 2021).

KASP Markers Development and Validation
All SNP probe sequences on the wheat 660K SNP Array were used in a BLASTN search (e-value cut-off of 1 × 10 −5 ) against the wheat reference sequence (IWGSC RefSeq v1), and the flanking 100 bp sequences near the SNP position were extracted for primer design. The KASP TM primer set containing two allele-specific forward primers and one common reverse primer (Table S4) was designed by LGC Biosearch Technologies, Hoddesdon, UK. The primer set was synthesized with a PAGE purification method by Sangon Biotech (Shanghai, China). KASP assays were performed with 384-plate format 5 µL reactions (2.5 µL of template DNA (25 ng), 2.5 µL of 2 × Kaspar mix, and 0.07 µL of primer mix). A FluOstar Omega Microplate Reader was used to read the fluorescence signals.

Phenotype Variation and Trait Correlation Analysis
A panel of 255 pre-breeding lines derived from the cross between common wheat and A.cristatum and 4 elite commercial varieties were collected for agronomic trait investigation. To avoid the effects of environmental error on phenotype traits, 2-year replicates were conducted. Among the 12 investigated traits, PH, GNPS, total spikelet number, spike length, spikelet density, KNS and TKW showed wide variation ranges, whereas Hd showed the smallest variation. The maximum coefficient of variation (CV) was the 22.72% for the GNPS trait. Broad-sense heritability (H 2 ) of plant height showed the highest value, and the H 2 of GNPS was up to 0.78, ranking third ( Table 1). The panel of the collected pre-breeding lines contained the line Pubing 3228 with prominently high GNPS features [23]. The correlation of 12 traits showed that GNPS is highly correlated with fertile spikelet numbers and KNS ( Figure 1A), which suggests that the high GNPS trait in our pre-breeding lines was controlled both by kernel number per spikelet and spikelet number. The distribution figure shows that the GNPS trait of 259 lines varied widely in both 2-year replicates and exhibited a skewed distribution ( Figure 1B), which suggests that major genes are responsible for the high GNPS trait and the panel is suitable for GWAS.

Estimation of Population Structure
To reveal the genetic relationship of all pre-breeding lines, estimate the population structure and identify the number of subgroups, two methods, including phylogenetic tree and population structure analyses, were used. An SNP-based phylogenetic tree (with 310,097 SNP markers distributed throughout the whole genome of wheat) divided all accessions into 10 major groups, including a breeding line group and 9 other prebreeding lines groups (Figure 2A). The admixture structure showed identical results to the phylogenetic tree ( Figure 2B). The principal component analysis (PCA) of all accessions was consistent with the phylogeny and the population structure ( Figure 2C). We found that the population structure of 259 pre-breeding panels showed the perfect fit with the pedigree relationship. In our pre-breeding lines, the Pubing01 group was the largest group, containing 80 lines, in which the commercial varieties Zhoumai 18 and Zhoumai 27 belonged to the same group. The Pubing01 group can be further divided into eight subgroups. Pubing02 and Pubing04 were the second largest groups, which contained 48 lines. Pubing03, Pubing05, Pubing06, Pubing07, Pubing08 and Pubing09 were divided into relatively independent groups. The commercial variety Jimai 22 is the same group as from Japan (receipt cultivar of a wide cross between wheat and A. cristatum) showed rich genetic variations, which means that the breeding lines carry more complex genetic backgrounds than pre-breeding lines. According to the pedigree, the Pubing07, Pubing08 and Pubing09 group lines belonging to a relatively separate clade are early generation pre-breeding lines, and their spike traits show prominently high GNPS. As a result, genetic relationships revealed that our pre-breeding lines show wide variation and their genetic backgrounds are also diversified for association mapping studies.

Estimation of Population Structure
To reveal the genetic relationship of all pre-breeding lines, estimate the population structure and identify the number of subgroups, two methods, including phylogenetic tree and population structure analyses, were used. An SNP-based phylogenetic tree (with 310097 SNP markers distributed throughout the whole genome of wheat) divided all accessions into 10 major groups, including a breeding line group and 9 other pre-breeding lines groups (Figure 2A). The admixture structure showed identical results to the phylogenetic tree ( Figure 2B). The principal component analysis (PCA) of all accessions was consistent with the phylogeny and the population structure ( Figure 2C). We found that the population structure of 259 pre-breeding panels showed the perfect fit with the pedigree relationship. In our pre-breeding lines, the Pubing01 group was the largest group, containing 80 lines, in which the commercial varieties Zhoumai 18 and Zhoumai 27 belonged to the same group. The Pubing01 group can be further divided into eight subgroups. Pubing02 and Pubing04 were the second largest groups, which contained 48 lines. Pubing03, Pubing05, Pubing06, Pubing07, Pubing08 and Pubing09 were divided into relatively independent groups. The commercial variety Jimai 22 is the same group as Pub-ing06. Ten breeding lines (Kenong 2011, PZW 9, Pubing 176, Pubing 143, Pubing 9946, Pubing 3228, Pubing 3882, Jinmai 80, Pubing 701 and Pubing 717) and Fukuho introduced from Japan (receipt cultivar of a wide cross between wheat and A. cristatum) showed rich genetic variations, which means that the breeding lines carry more complex genetic backgrounds than pre-breeding lines. According to the pedigree, the Pubing07, Pubing08 and Pubing09 group lines belonging to a relatively separate clade are early generation prebreeding lines, and their spike traits show prominently high GNPS. As a result, genetic relationships revealed that our pre-breeding lines show wide variation and their genetic backgrounds are also diversified for association mapping studies.

GWAS Identify Genetic Loci Associated with Spike-Related Traits
To identify the genetic loci controlling the GNPS trait, a GWAS was completed using 259 accessions and whole-genome SNP markers combined with 2 years of phenotypic data. After filtering out low-quality SNP markers, a total of 310,097 SNPs were retained. These SNPs were distributed over each of the 21 chromosomes ( Figure S1). Marker density varied among chromosomes with a minimum of 4.53 markers per Mb on chromosome 4D and a maximum of 41.02 markers per Mb on chromosome 5B. Principal component analysis (PCA) was performed with GAPIT software. We conducted a Bayesian information criterion (BIC)-based model selection to find the optimal number of PCs for the GWAS. The scree plot shows that the eigenvalues start to form a relatively straight line after the fourth principal component ( Figure S2). Therefore, we selected PCA = 4 for optimum population structure estimation in the GWAS. In total, 477 marker-trait associations (MTAs) were detected for 12 traits across all the environments (Table S5). Finally, we identified 3 significant QTL regions on wheat chromosomes 1AL, 4BS and 5AL (Figures 3 and S3). SNP markers AX-109302049 and AX-108839971 flanking the 1AL locus in the physical region from 583.3 M to 584.7 M were tightly associated with both GNPS and KNS with p-values of 1.13 × 10 −7 and 3.93 × 10 −6 , respectively. Another major locus on chromosome 4BS delimited in the region between markers AX-109286577 and AX-94438527 was associated with GNPS and KNS traits, which suggests that the genomic region exhibits pleiotropic action. Moreover, we identified a locus on chromosome 5AL flanked by the marker AX-111514840 at the physical position 537.133 Mb, which has pleiotropic effects on spike length, spikelet number, and spikelet density. The reported Q gene on chromosome 5AL was an APETALA2 (AP2)-like gene controlling floral homeotic gene expression and free-threshing character, but its position was located at the position of 650.127 Mb [31]. Hence, the spikelet density locus on 5AL in this study is considered to be an independent locus different from Q gene.

GWAS Identify Genetic Loci Associated with Spike-Related Traits
To identify the genetic loci controlling the GNPS trait, a GWAS was completed using 259 accessions and whole-genome SNP markers combined with 2 years of phenotypic data. After filtering out low-quality SNP markers, a total of 310,097 SNPs were retained. These SNPs were distributed over each of the 21 chromosomes ( Figure S1). Marker density varied among chromosomes with a minimum of 4.53 markers per Mb on chromosome 4D and a maximum of 41.02 markers per Mb on chromosome 5B. Principal component analysis (PCA) was performed with GAPIT software. We conducted a Bayesian information criterion (BIC)-based model selection to find the optimal number of PCs for the GWAS. The scree plot shows that the eigenvalues start to form a relatively straight line after the fourth principal component ( Figure S2). Therefore, we selected PCA = 4 for optimum population structure estimation in the GWAS. In total, 477 marker-trait associations (MTAs) were detected for 12 traits across all the environments (Table S5). Finally, we identified 3 significant QTL regions on wheat chromosomes 1AL, 4BS and 5AL (Figures 3 and S3). SNP markers AX-109302049 and AX-108839971 flanking the 1AL locus in the physical region from 583.3 M to 584.7 M were tightly associated with both GNPS and KNS with pvalues of 1.13 × 10 −7 and 3.93 × 10 −6 , respectively. Another major locus on chromosome 4BS delimited in the region between markers AX-109286577 and AX-94438527 was associated with GNPS and KNS traits, which suggests that the genomic region exhibits pleiotropic action. Moreover, we identified a locus on chromosome 5AL flanked by the marker AX-111514840 at the physical position 537.133 Mb, which has pleiotropic effects on spike length, spikelet number, and spikelet density. The reported Q gene on chromosome 5AL

Chromosome 4BS Genomic Region Confers the Grain Number per Spikelet Trait
To investigate the effect of these loci on grain number per spike, we compared GNPS and KNS according to the SNP marker haplotype. Finally, we found that the locus on 4BS was significantly associated with GNPS and KNS. By analyzing the phenotypic variation of SNP allele AX-109286577 A/T in the 4BS genomic region, we found that GNPS and KNS were significantly higher with haplotype A allele than those with the haplotype T allele in both 2-year replicates. The average GNPS in the haplotype A allele group was 83.5 and 69.1 in the 2 years, while those of the haplotype T allele were 51.7 and 51. 6, respectively. The GNPS of the haplotype A allele was 61.4% and 34% higher than that of the haplotype T allele in 2016-2017 and 2017-2018, respectively. Moreover, the KNS of the haplotype A allele was 45.3% and 21.1% higher than that of the haplotype T allele in 2016-2017 and 2017-2018, respectively ( Figure 4A,B). Spike morphology showed that a typical pre-breeding line L711 with the pedigree of Pubing4201/CMH83.605//FC_big_spike/3/Xiaoyan6 exhibited the large spike characteristic compared with the commercial varieties Jimai22 and Zhoumai18 ( Figure 4D). Hence, the 2.76 Mb region on chromosome 4BS from 26.49 Mb to 29.25 Mb was verified to carry the genes for increasing KNS and GNPS of wheat. Our results suggest that the AX-109286577-A SNP haplotype is a good allele type for large spike traits. In view of the fact that the Rth-B1 gene TraesCS4B02G043100 is located on the 30.861-30.863 Mb region of Chinese Spring Refv1.0 chr4B [30], we analyzed the possible relationship between our identified locus and Rht-B1. The phenotypic correlation indicated that a weak positive correlation coefficient (r = 0.19) was found between the PH and GNPS trait (Figure 1). The average plant height of high GNPS groups is significantly higher than that of low GNPS groups and no significant PH allele were associated with the 2.76 Mb region on chromosome 4BS from 26.49 Mb to 29.25 Mb (Table S5), which means the locus confer grain number improvement is independent of the Rth-B1 gene. was an APETALA2 (AP2)-like gene controlling floral homeotic gene expression and freethreshing character, but its position was located at the position of 650.127 Mb [31]. Hence, the spikelet density locus on 5AL in this study is considered to be an independent locus different from Q gene.

Chromosome 4BS Genomic Region Confers the Grain Number per Spikelet Trait
To investigate the effect of these loci on grain number per spike, we compared GNPS and KNS according to the SNP marker haplotype. Finally, we found that the locus on 4BS was significantly associated with GNPS and KNS. By analyzing the phenotypic variation of SNP allele AX-109286577 A/T in the 4BS genomic region, we found that GNPS and KNS were significantly higher with haplotype A allele than those with the haplotype T allele in both 2-year replicates. The average GNPS in the haplotype A allele group was 83.5 and 69.1 in the 2 years, while those of the haplotype T allele were 51.7 and 51. 6, respectively. The GNPS of the haplotype A allele was 61.4% and 34% higher than that of the haplotype T allele in 2016-2017 and 2017-2018, respectively. Moreover, the KNS of the haplotype A allele was 45.3% and 21.1% higher than that of the haplotype T allele in 2016-2017 and 2017-2018, respectively ( Figure 4A,B). Spike morphology showed that a typical pre-breeding line L711 with the pedigree of Pubing4201/CMH83.605//FC_big_spike/3/Xiaoyan6 exhibited the large spike characteristic compared with the commercial varieties Jimai22 and To trace and utilize the genetic loci near the AX-109286577 SNP, we developed the KASP diagnostic marker for marker-assisted selection breeding. As shown in Figure 4C, we successfully developed a high-resolution KASP marker that can distinguish the allele AX-109286577 SNP into two obvious groups, i.e., the homozygous allele AA type and TT type. Using the AX-109286577 KASP marker, we detected other materials of our pre-breeding lines, including early generation parents and derivative breeding lines. We found that the AX-109286577 allele A with high GNPS and KNS genetic effects could only be detected in our early generation parent lines, including Pubing3228 and Pubing3504, whereas the AX-109286577 allele T existed in our present breeding lines and commercial varieties (Table S6). The results suggested that the locus with high GNPS and KNS alleles on 4BS is an untapped rare locus for grain number improvement programs. relationship between our identified locus and Rht-B1. The phenotypic correlation indicated that a weak positive correlation coefficient (r = 0.19) was found between the PH and GNPS trait (Figure 1). The average plant height of high GNPS groups is significantly higher than that of low GNPS groups and no significant PH allele were associated with the 2.76 Mb region on chromosome 4BS from 26.49 Mb to 29.25 Mb (Table S5), which means the locus confer grain number improvement is independent of the Rth-B1 gene.  (Table S5). Scale bar = 2 cm.  (Table S5). Scale bar = 2 cm.

Identification of Candidate Genes and Their Expression
To identify the candidate genes in the 4BS chromosome region for high GNPS and KNS traits, we searched the annotated genes based on Chinese Spring IWGSC RefSeq v1.1 reference genome annotation [32]. From the 26.49 Mb to 28.95 Mb physical regions of chromosome 4B, a total of 56 high-confidence genes were annotated in the Chinese Spring reference genome (Table S7). Combined with the expression data of the published transcriptome, we found that two genes near the Manhattan plot p-value peak (26.49 Mb position, p-value = 6.06 × 10 −7 ) were highly expressed in spike tissues. The two genes are TraesCS4B02G037000 (zinc finger CCCH domain-containing protein, i.e., TraesCS4B03G0080800 in Chinese Spring IWGSC RefSeq v2.1) and TraesCS4B02G037200 (WD-repeat protein, i.e., TraesCS4B03G0081100 in Chinese Spring IWGSC RefSeq v2.1). Their expression in different tissues is shown in Figure S4. According to the WheatExp database, we found that the TraesCS4B02G037200 gene was highly expressed in spike tissue Agronomy 2022, 12, 171 9 of 12 and showed the highest expression level on chromosome 4B compared to chromosomes 4A and 4D. The detailed functional annotation suggests that the TraesCS4B02G037200 gene encodes a WD-repeat protein with high homology to the maize cell division cycle protein cdt2 and the barley protein HORVU4Hr1G005810.7, which suggests that the TraesCS4B01G037200 gene is a candidate gene for increasing the GNPS and KNS traits.

Discussion
The increase in GNPS, an important yield component, will be of widespread concern in future yield improvements. It has been reported that spikelet fertility has increased during recent decades, especially in the European lines, and spikelet fertility is suggested as an essential driver of grain yield progress in wheat [22]. However, studies on the increase in floret set number are limited. Here, we reported that a panel of pre-breeding derivatives derived from the cross between high GNPS germplasm and diverse elite varieties carry prominent GNPS and KNS phenotypes. The correlation coefficient was as high as 0.81 between GNPS and KNS, which means that grain number per spike in our pre-breeding lines is mainly contributed by grain set number per spikelet and floret fertility. Our study showed the same feature as our previous results, in which GNPS showed significant positive correlations with KNS in the Pubing 3504 biparent population [33]. Additionally, GNPS in the Pubing 3228 and Pubing 3504 germplasms is independently inherited and freely recombined with the expected fertile tiller number and grain weight, which suggests that our high GNPS pre-breeding lines are suitable as breeding parents for the improvement of yield potential [23]. Guo et al. (2016) found that the post-anthesis process of grain set/abortion was important in determining genotypic variation in KNS, and an increase in KNS was mainly associated with improved grain survival [34]. Based on our observation of the developmental dynamic anatomy, good floret fertility after the anthesis process can also explain the high KNS phenotype in our pre-breeding lines. As a result, this study provides us with many useful pre-breeding lines for increasing the GNPS yield component.
Elucidation of the genetic loci controlling the GNPS and KNS in our pre-breeding lines is necessary. This study reported that a major locus on 4BS controlled both GNPS and KNS traits in our pre-breeding lines. According to the Chinese spring v1.1 genome reference, the approximately 2.76 Mb region on chromosome 4BS from 26.49 Mb to 29.25 Mb contains 56 high-confidence genes. To accurately detect the elite haplotype of the GNPSrelated genetic loci, we further developed a diagnostic KASP marker in the target genomic region of chromosome 4BS. The marker AX-109286577 will be useful for genomic selection breeding of the KNS trait in early offspring. Finally, we narrowed down the gene TraesCS4B02G037200, which was highly expressed in the spike tissues as a candidate gene. Moreover, we found that the elite haplotype on chromosome 4BS was only detected in our pre-breeding lines and not in commercial varieties, which means that the loci are still rare for a large number of wheat varieties. In the distal region of the short arm of chromosome 4B, another team also conducted an association mapping study of spike-related traits by using a panel of 768 wheat cultivars and genotyping-by-sequencing technology. They identified a genomic region on 4BS spanning from 28.7 to 44.4 Mb near the Rht-B1 locus and two candidate genes, TraesCS4B02G049100 encoding RNA-binding protein 34 and TraesCS4B02G049300 encoding a GPN-loop GTPase-like protein [17]. It is reported that the semi-dwarfing Rht-B1b gene reduced plant height and increased grain number and yield [35,36]. Our result showed that the locus for GNPS identified in our large-spike pre-breeding lines was independent of the dwarfing Rht-B1b gene. The difference between the two loci on 4BS suggests that the different genes take part in the process of grain number per spike. Overall, our results proved that the rare allele variation in our pre-breeding lines can be used for wheat improvement of yield potential, especially in the yield component of grain number per spike.

Supplementary Materials:
The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/agronomy12010171/s1, Figure S1: The distribution of SNPs with minor allele frequencies larger than 0.05 and missing rates ≤ 90%. Figure S2: The screen plot of principal components was used to find the optimal number of PCs for the GWAS. The eigenvalues start to form a relatively straight line after the fourth principal component. Figure S3: Manhattan plots for spike-related traits including spikelet density (A), spike length (B) and total spikelet number per spike (C) from the GWAS. The arrowhead indicates the peak signal containing the candidate pleiotropic genes. Figure S4: The expression profiles of TraesCS4B02G037000 and TraesCS4B02G037200 in different tissues and organs of wheat. The expression data of wheat genes were collected from the WheatExp database (https://wheat.pw.usda.gov/WheatExp/, accessed on 30 November 2021). Table S1: Pedigree information of a panel of 255 derivative lines and 4 elite commercial varieties. Table S2: The VCF file of 310,097 SNPs in all materials after removing nucleotide variations with missing rates ≥ 0.1 and minor allele frequency (<0.05). Table S3: All phenotype data were provided including 2-year replicates. Table S4: KASP primer sequences used in this study. Table  S5: The list of marker-trait associations (MTAs) for 12 investigated traits across all the environments. Table S6: Other pre-breeding lines, including early generation lines and derivative breeding lines, were chosen for detecting the AX-109286577 SNP genotype on chromosome 4BS.