Dissection of the Genetic Basis of Yield Traits in Line per se and Testcross Populations and Identification of Candidate Genes for Hybrid Performance in Maize

Dissecting the genetic basis of yield traits in hybrid populations and identifying the candidate genes are important for molecular crop breeding. In this study, a BC1F3:4 population, the line per se (LPS) population, was constructed by using elite inbred lines Zheng58 and PH4CV as the parental lines. The population was genotyped with 55,000 SNPs and testcrossed to Chang7-2 and PH6WC (two testers) to construct two testcross (TC) populations. The three populations were evaluated for hundred kernel weight (HKW) and yield per plant (YPP) in multiple environments. Marker–trait association analysis (MTA) identified 24 to 151 significant SNPs in the three populations. Comparison of the significant SNPs identified common and specific quantitative trait locus/loci (QTL) in the LPS and TC populations. Genetic feature analysis of these significant SNPs proved that these SNPs were associated with the tested traits and could be used to predict trait performance of both LPS and TC populations. RNA-seq analysis was performed using maize hybrid varieties and their parental lines, and differentially expressed genes (DEGs) between hybrid varieties and parental lines were identified. Comparison of the chromosome positions of DEGs with those of significant SNPs detected in the TC population identified potential candidate genes that might be related to hybrid performance. Combining RNA-seq analysis and MTA results identified candidate genes for hybrid performance, providing information that could be useful for maize hybrid breeding.


Introduction
With the increase in human population, it is expected that the production of staple food crops should be doubled to feed the growing population by 2050 [1]. Hybrid crop breeding can increase crop yield and meet human demand for food. As a successful example of hybrid breeding, hybrid maize has played an important role in maize yield increase in the last century [2]. To further increase the yield of hybrid maize by using molecular design breeding, it is necessary to dissect the genetic and molecular mechanism of hybrid performance in hybrid populations.
In maize, hybrid breeding generally requires selection of breeding materials according to line per se (LPS) and testcross (TC) performance [3]. However, LPS performance can only indirectly assess TC performance [4], and the accuracy of selection based on LPS performance depends on the relationship between the LPS and TC population. The genomic differences between an LPS population and its TC counterpart are the replacement of one half

Phenotypic Data of the Three Populations
The PH6WC TC population had the highest HKW and YPP across all the environments ( Figure 1, Table 1), followed by the Chang7-2 TC population, suggesting that PH6WC generally had better combining ability with the LPS population than Chang7-2. Both traits of the LPS population had a higher coefficient of variance and broad-sense heritability, indicating that the genetic variation was greater and more stable than that of the two TC populations. Furthermore, HKW had higher heritability than YPP, indicating that HKW was more stable than YPP across environments (Table 1). The correlation of either HKW or YPP across environments was generally significant for each population ( Figure S1), suggesting that the genetic basis plays a major role in determining the two traits across environments. SD-standard deviation, N-population size, CV-coefficient of variance, H 2 -broad-sense heritability.

Genotypic Data Analysis and Genetic Dissection of Yield Traits of the Three Populations
In total, 15,386 SNPs were obtained after genotype processing, and these SNPs were distributed evenly across the maize physical map. The number of SNPs ranged from 1056 on chromosome 10 to 2477 on chromosome 1, and the SNP density ranged from 6.62 SNP/Mb on chromosome 2 to 8.80 on chromosome 9, with a mean density of 7.5 SNP/Mb (Table S1). The SNP density was sufficiently high for MTA analysis [17]. MTA found that the numbers of significant SNPs for HKW were 24, 30 and 121 in the LPS population, Chang7-2 and PH6WC TC populations, respectively (Table S2)

Genotypic Data Analysis and Genetic Dissection of Yield Traits of the Three Populations
In total, 15,386 SNPs were obtained after genotype processing, and these SNPs were distributed evenly across the maize physical map. The number of SNPs ranged from 1056 on chromosome 10 to 2477 on chromosome 1, and the SNP density ranged from 6.62 SNP/Mb on chromosome 2 to 8.80 on chromosome 9, with a mean density of 7.5 SNP/Mb (Table S1). The SNP density was sufficiently high for MTA analysis [17]. MTA found that the numbers of significant SNPs for HKW were 24, 30 and 121 in the LPS population, Chang7-2 and PH6WC TC populations, respectively (Table S2)  the density of significant SNPs within a 1 Mb interval; red asterisk indicates common QTL between LPS and Chang7-2 TC populations; blue asterisk indicates common QTL between LPS and PH6WC TC populations; green asterisk indicates common QTL between Chang7-2 TC and PH6WC TC populations; dark asterisk indicates common QTL among LPS, Chang7-2 TC and PH6WC TC populations. (b-d) PVE of the large-effect SNPs controlling HKW of the LPS, Chang7-2 and PH6WC TC populations, respectively. (e-g) PVE of the large-effect SNPs controlling YPP of the LPS, Chang7-2 and PH6WC TC populations, respectively. In (b-g), only SNPs with PVE larger than 1% are shown; PVE_LPS, PVE_Chang7-2 and PVE_PH6WC indicate phenotypic variance explained by the significant SNPs in the LPS, Chang7-2 and PH6WC TC populations, respectively.

Genetic Features of the Significant SNPs
We calculated the cumulative effects of favorable genotypes of the significant SNPs. The correlation coefficients between the number of favorable genotypes and HKW were 0.28, 0.

Genetic Features of the Significant SNPs
We calculated the cumulative effects of favorable genotypes of the significant SNPs. The correlation coefficients between the number of favorable genotypes and HKW were 0.28, 0.49 and 0.36 for the LPS, Chang7-2 and PH6WC TC populations, respectively ( Figure 3a). The correlation coefficients between the number of favorable genotypes and YPP were 0.30, 0.22 and 0.38 for the LPS, Chang7-2 and PH6WC TC populations, respectively ( Figure 3b). The strong positive correlation indicates that the LPS performance and TC performance increased with the accumulation of favorable genotypes, which further proved the reliability of the MTA results.
To test the effect of these significant SNPs in predicting LPS and TC performance, we performed genomic prediction (GP) and marker-assisted selection (MAS) analysis. The analysis revealed that the prediction accuracies (PAs) of GP models were larger than those of the MAS.Sig model (Figure 4a-c), suggesting that some genetic factors were not identified due to the problem of false negatives for both traits in each population. We also found that the PAs of the MAS.Sig model were larger than those of MAS.Random model (Figure 4d-f) for each trait in each population, further proving that the significant SNPs were in linkage disequilibrium with the genes controlling the tested traits. To test the effect of these significant SNPs in predicting LPS and TC performance, we performed genomic prediction (GP) and marker-assisted selection (MAS) analysis. The analysis revealed that the prediction accuracies (PAs) of GP models were larger than those of the MAS.Sig model (Figure 4a-c), suggesting that some genetic factors were not identified due to the problem of false negatives for both traits in each population. We also found that the PAs of the MAS.Sig model were larger than those of MAS.Random model ( Figure  4d-f) for each trait in each population, further proving that the significant SNPs were in linkage disequilibrium with the genes controlling the tested traits.

Identification of Common QTLs between LPS and TC Populations
Given the significant correlation between the tested traits of the LPS population and the TC population (Figure 5a), we considered that there should be common QTLs controlling LPS and TC performance. To prove this hypothesis, we examined whether the signif-

Identification of Common QTLs between LPS and TC Populations
Given the significant correlation between the tested traits of the LPS population and the TC population (Figure 5a), we considered that there should be common QTLs controlling LPS and TC performance. To prove this hypothesis, we examined whether the significant SNPs of the LPS population take effect in the TC populations. The results showed that the significant SNPs for HKW of the LPS population explained 25.82% and 12.24% of phenotypic variance in the Chang7-2 and PH6WC TC populations, respectively ( Figure 5b). Furthermore, the significant SNPs controlling YPP of the LPS population explained 16.58% and 16.51% of phenotypic variance in the Chang7-2 and PH6WC TC populations, respectively (Figure 5b). The analysis proved that the significant SNPs controlling LPS performance also controlled TC performance.
the LPS (d), Chang7-2 TC (e) and PH6WC TC (f) populations. GP indicates GP models usi SNPs; MAS.Sig indicates MAS models using the significant SNPs; MAS.Random indicates models using the same number of randomly selected SNPs. P values indicate significant lev Wilcox.test.

Identification of Common QTLs between LPS and TC Populations
Given the significant correlation between the tested traits of the LPS population the TC population (Figure 5a), we considered that there should be common QTLs con ling LPS and TC performance. To prove this hypothesis, we examined whether the si icant SNPs of the LPS population take effect in the TC populations. The results sho that the significant SNPs for HKW of the LPS population explained 25.82% and 12.24 phenotypic variance in the Chang7-2 and PH6WC TC populations, respectively (F 5b). Furthermore, the significant SNPs controlling YPP of the LPS population expla 16.58% and 16.51% of phenotypic variance in the Chang7-2 and PH6WC TC populat respectively (Figure 5b). The analysis proved that the significant SNPs controlling performance also controlled TC performance. We further compared the locations of significant SNPs detected for each trait. Two common QTLs on chromosomes 4 and 6 were associated with HKW of the LPS, Chang7-2 and PH6WC TC populations. In addition, one QTL at the end of chromosome 6 was commonly detected in LPS and Chang7-2 TC populations; three common QTLs on chromosomes 2, 3 and 9 were identified in LPS and PH6WC TC populations; and two common QTLs on chromosomes 1 and 7 were identified between Chang7-2 TC and PH6WC TC populations ( Figure 2a, Table S2). For YPP, one common QTL on chromosome 1 was detected in LPS and Chang7-2 TC populations, one common QTL on chromosome 2 was detected in LPS and PH6WC TC populations, and two common QTLs on chromosomes 1 and 2 were detected in Chang7-2 and PH6WC TC populations (Figure 2a). The analysis revealed that there were common QTLs between each pair of the three populations, reflecting their strong phenotypic correlations (Figure 5a).

RNA-seq Analysis Identified the Candidate Genes in the Surrounding Region of the Significant SNPs
To find the candidate genes associated with hybrid performance of Chang7-2 TC lines, we found common DEGs between ZD958 and each of its parents (Zheng58 and Chang7-2) and compared the locations of these DEGs with those of significant SNPs. RNA-seq analysis identified 4593 common DEGs (Figure 6a). According to the candidate genes found in our previous article, the orthologs of 57 and 102 DEGs (Table S3) were related to the control of kernel weight and yield, respectively [18]. Among the 57 DEGs, the locations of nine DEGs were close to four HKW QTLs detected in the Chang7-2 TC population (Figure 6a, Table S3), including GRMZM2G159456, GRMZM2G399072, GRMZM2G445634, GRMZM2G420357, GRMZM2G034876, GRMZM2G092749, GRMZM2G059939, GRMZM2G328988 and GR-MZM2G034647. Meanwhile, five of the 102 DEGs were found in the surrounding regions of one YPP QTL detected in the Chang7-2 TC population, including GRMZM2G095968, GR-MZM2G159456, GRMZM2G399072, GRMZM2G445634 and GRMZM2G420357 (Figure 6a, Table S3). the locations of nine DEGs were close to four HKW QTLs detected in the Chang7-2 TC population (Figure 6a, Table S3), including GRMZM2G159456, GRMZM2G399072, GRMZM2G445634, GRMZM2G420357, GRMZM2G034876, GRMZM2G092749, GRMZM2G059939, GRMZM2G328988 and GRMZM2G034647. Meanwhile, five of the 102 DEGs were found in the surrounding regions of one YPP QTL detected in the Chang7-2 TC population, including GRMZM2G095968, GRMZM2G159456, GRMZM2G399072, GRMZM2G445634 and GRMZM2G420357 (Figure 6a, Table S3). Figure 6. Co-localization of significant SNPs and DEGs identified candidate genes for hybrid performance. (a) Co-localization of significant SNPs detected in the Chang7-2 TC population and the DEGs between ZD958 and its parental lines (Zheng58 and Chang7-2). The five circles, from inner to outer, indicate the location of common DEGs identified between ZD958 and each of its parental lines, the MTA results for YPP in the Chang7-2 TC population (red points indicate significant SNPs), the MTA results for HKW in the Chang7-2 TC population (blue points indicate significant SNPs), SNP density heatmap and the candidate genes in the surrounding regions of significant SNPs, and maize chromosomes. (b) Co-localization of significant SNPs detected in the PH6WC TC population and the DEGs identified between XY335 and its parental lines (PH6WC and PH4CV). The five circles, from inner to outer, indicate the location of common DEGs between XY958 and each of its parental lines, the MTA results for YPP in the PH6WC TC population (red points indicate significant SNPs), the MTA results for HK in the PH6WC TC population (blue points indicate significant SNPs), SNP density heatmap and the candidate genes in the surrounding regions of significant SNPs, and maize chromosomes.
To find the candidate genes in the surrounding regions of the significant SNPs detected in the PH6WC TC population, we first found 1801 DEGs that were commonly detected between XY335 and each of its parents. Among the 1801 DEGs, 12 and 20 DEGs (Table S4) were related to the control of yield traits [18]. Among the 12 DEGs, two candidate genes (GRMZM2G007288 and GRMZM5G875502) were found in the surrounding regions of two HKW QTLs detected in the PH6WC TC population. Meanwhile, three DEGs (GR-MZM2G050305, GRMZM2G034876 and GRMZM2G463904) were found in the surrounding regions of two YPP QTLs detected in the PH6WC TC population (Figure 6b, Table S4). The genes mentioned in this section could be considered candidate genes for the QTLs controlling HKW and YPP.

Discussion
Maize has rich genetic diversity and rapid linkage disequilibrium, and MTA analysis of various traits has been performed in maize [19][20][21][22]. Many methods have been developed to increase the calculation speed and statistical power of MTA, such as the general linear model (GLM), mixed linear model (MLM), etc., but most of them only work for inbred line populations [23]. At present, there are only a few published MTA methods suitable for performing MTA analysis in hybrid populations, including EMMAX software, which uses different encoding schemes to discriminate additive, dominant and recessive effects [10]; the linear mixed model, which fits additive and dominant effects [24]; and the PEPIS pipeline, which contains all genetic effects [12]. However, various genetic effects (additive, dominant and recessive) are dissected in the former two methods, which complicates the results of MTA analysis. Moreover, the epistasis effect is not considered in the former two methods. The PEPIS pipeline comprehensively dissects the main effect and calculates the LRT values of each SNP in a user-friendly manner. Therefore, PEPIS was used for MTA analysis in this study.
Identification of QTLs and candidate genes controlling agronomic traits is the basis for developing functional markers and molecular design breeding in maize [25,26]. Although many QTLs have been identified using family-based QTL mapping or association mapping [18,27], the QTLs or genes identified using family-based or homozygous lines are different from those detected using hybrid populations [26,28], and most of these QTLs were not proven functional in hybrid lines. In this study, we not only detected common QTLs in the LPS and TC populations but also detected specific QTLs in TC populations. The results showed that the genetic basis of LPS and TC populations is not completely different. The two common QTLs for HKW on chromosomes 4 and 6 with effects in all the three populations required further investigation. Although no common QTLs were detected in the three populations for YPP, there are still common QTLs shared between at least two populations, indicating that YPP might have a complex genetic basis [28]. These common QTLs indicated that manipulating QTLs in the LPS population could increase the yield traits of the hybrid population. Additionally, because the four parental lines of the tested populations were parents of the two most popular hybrid varieties in China, the detected QTLs for yield-related traits could explain why the two hybrid varieties are high-yield and popular in China.
Compared with the results of previous studies, it was found that the significant SNPs Chr3_104753320 (on chromosome 3) and Chr4_9699802 (on chromosome 4) associated with HKW were detected in the LPS and TC populations, respectively, which coincides with the results of a previous study [29]. A significant locus, Chr2_130338518, associated with YPP was detected in the LPS population, and this locus was also detected in an RIL population [29]. Because research on genetic mapping is scarce in maize hybrid populations, most QTL detected in this study were specific. This study has some limitations; although some significant epistatic QTLs (additive × additive, additive × dominant and dominant × dominant) were associated with hybrid performance [29][30][31], we only considered the main effect and did not identify their modes of inheritance [12]. However, this drawback did not influence the identification of common QTLs in LPS and TC populations or the process of finding candidate genes by colocalization.
GP has been proven as a reliable method for predicting both LPS and TC performance [32,33]. Because GP relies on the genetic basis of the population [17,34], the distance between the molecular markers and the QTLs of the target traits could influence the PAs of GP models. In this study, we used GP models to prove that the significant SNPs are reliable because the PAs of GP models fitting significant SNPs were larger than those fitting random SNPs [32]. However, the PAs of GP models fitting significant SNPs were lower than those of GP models fitting genome-wide SNPs, indicating that some QTLs were not detected in each population, which might be related to the high threshold level used in MTA analysis. Furthermore, the realness of the detected QTLs was also supported by the results that both HKW and YPP increased with increased favorable genotypes.
RNA sequencing analysis has been used to dissect the genetic basis of crop traits in combination with genetic methods such as association mapping and linkage-based QTL mapping [15,16]. In this study, in order to find candidate genes underlying hybrid performance, we identified the DEGs between F1 and its parental lines and compared the locations of these DEGs with those of the significant SNPs. The DEGs between F1 and the parental lines might be candidate genes, especially the 14 candidate genes for yield traits (Figure 6b, Tables S3 and S4). Among the 14 candidate genes for kernel weight, GRMZM2G159456, GRMZM2G399072, GRMZM2G445634, GRMZM2G034876, GRMZM2G007288 and GRMZM5G875502 are orthologous genes of rice BU1 [35], SNB [36], TIFY [37], GL2 [38], GW2 [39] and OsiEZ1 [40] genes, respectively, which have been reported to regulate rice seed weight; GRMZM2G420357, GRMZM2G092749, GRMZM2G059939 and GRMZM2G034647 are orthologs of Arabidopsis IKU1 [41], FERONIA [42], DPA4 [43] and CYCB1:4 [44] genes, respectively, and these genes are associated with Arabidopsis seed weight. Furthermore, GRMZM2G328988, GRMZM2G463904, GRMZM2G095968 and GRMZM2G050305 are orthologous to UPL3 in oilseed rape [45], to RLK7 in maize [46], to IbEXP1 in sweet potato [47] and to GmMYB14 in soybean [48], respectively. These genes were also reported to be associated with seed weight or overall yield. Therefore, the 14 DEGs for kernel weight might be candidate genes because they were close to the positions of the significant SNPs identified in the TC populations.

Population Construction, Phenotype Evaluation and Phenotypic Data Analysis
Four elite inbred lines were used in this study, including Zheng58, Chang7-2, PH6WC and PH4CV. Zheng58 and Chang7-2 were the female and male parents of ZD958, respectively. PH6WC and PH4CV were the female and male parents of XY335, respectively. ZD958 and XY335 are popular hybrid varieties in China [49,50]. The 481 BC1F3:4 families were introduced in detail in [32]. Briefly, PH4CV was used as the recurrent parent, and Zheng58 was used as the donor parent to develop a BC1F3 population, which was self-pollinated to develop the BC1F3:4 families. The BC1F3:4 families were defined as an LPS population. The 481 BC1F3 plants testcrossed to Chang7-2 and PH6WC in the winter of 2015 in Sanya (Hainan province), producing Chang7-2 and PH6WC TC populations, respectively. These materials were frequently used in maize genetic improvement research (https://maizedata.cn/, accessed on 3 October 2021) [33,51,52]. The materials and populations were provided by the molecular genetic improvement group of the Institute of Crop Sciences, Chinese Academy of Agricultural Sciences.
The LPS and TC populations were sown in Shunyi (Beijing municipality) and Changji (Xinjiang Uygur Autonomous region) in the summer of 2016 and 2017. The two TC populations were also sown in Xinxiang (Henan province) in the summer of 2017. The five environments were identified as 16BJ, 17BJ, 16XJ, 17XJ and 17HN, where BJ, XJ and HN indicate Beijing, Xinjiang and Henan locations, respectively, and 16 and 17 indicate the years 2016 and 2017, respectively. The field experimental design was an incomplete block design, as explained in detail in in our previous publication [33]. The row length and row space were 5 m and 60 cm, respectively, and the planting density was 4444 plants per mu, where mu is a traditional Chinese unit for measuring field size. At the harvest stage, the yield of each plot was measured and adjusted to 14% water content. YPP was calculated by dividing the plot yield into the number of plants in the plot. The hundred kernel weight (HKW) of each plot was measured manually. For HKW, the Chang7-2 and PH6WC TC populations were evaluated in five environments (16BJ, 17BJ, 16XJ, 17XJ and 17HN), whereas the LPS population was evaluated in four environments (16BJ, 17BJ, 16XJ and 17XJ). For YPP, the Chang7-2 and PH6WC TC populations were evaluated in four environments (16BJ, 17BJ, 17XJ and 17HN), whereas the LPS population was evaluated in three environments (16BJ, 17BJ and 17XJ). All experimental research on plants, including collection of plant materials, complied with institutional, national or international guidelines. Field studies were conducted in accordance with local legislation.
The model for calculating BLUEs is as follows [53,54]: where y ikmb is the phenotypic data of the i th genotype in the b th block nested in the m th replication that is nested in the k th environment, µ is the overall mean, g i is the genotype effect, τ k is the environmental effect, gτ ik is the G × E effect, δ (k)m is the replication effect nested in each environment, β (m)b is the block effect nested in the replication effect, and ε ikmb is the residual error. When calculating the BLUEs, the other variables (except the genotype) are treated as random effects and assume to follow normal distributions. All factors are random effects when calculating broad-sense heritability; the formula is as follows [55]: where σ 2 g is the genotype variance; σ 2 gτ is the variance of G × E; σ 2 ε is the error variance; and n and r are the number of environments and replicates, respectively. The models for calculating BLUEs and broad-sense heritability are fitted by using the lme4 package [56]. The phenotypic data of the LPS and TC populations are available in Table S5.

Genotype Processing
Leaf samples of the 481 BC1F3 plants and two testers (Chang7-2 and PH6WC) were sampled, and DNA of these samples was extracted following the cetyltrimethyl ammonium bromide method [57]. DNA was sent to Beijing Capital Bio for genotyping using DNA chips containing 55,000 SNPs [58]. The genotypic data were filtered by following the steps: (1) SNPs with a calling rate lower than 97% were removed; (2) SNPs with no physical position information were removed; (3) SNPs with a missing rate greater than 5% were removed; (4) SNPs with minor allele frequencies lower than 0.05 were removed; and (5) the missing genotypes were input using the codeGen function of the R package "synbreed", the method "beagle" was used and the other settings were default [59,60]. The minor and major alleles were coded as 2 and 0, respectively, and the heterozygous genotypes were coded as 1 [60]. The genotypes of the testcross population were deduced from the testers (Chang7-2 and PH6WC) and 481 BC1F3 parents. Because some loci of the BC1F3 population were heterozygous ( Figure S2), we deduced the genotype of the testcross population as described by Cui et al. [61]. Assuming the genotype code of a loci is defined as aa = 0, Aa = 1, AA = 2, if the genotype of the tester is aa, the genotypes of the testcross progeny would be 0, 0.5 and 1; if the genotype of the tester is AA, the genotypes of the testcross progenies would be 1, 1.5 and 2. The genotypic data of the LPS, Chang7-2 TC and PH6WC TC populations are available in Table S6.

Marker-Trait Association Analysis and Calculation of PVE
In this study, we used PEPIS software to perform MTA analysis. PEPIS software is one of the few public user-friendly tools for performing genetic mapping of hybrid populations [12]. The PEPIS software package is based on a linear mixed model [62], and the statistical method of PEPIS is as follows: First, the genotype of individual j in marker k is encoded into two numerical variables: where Z jk and W jk are indicators of additive and dominant effects, respectively. A (the first homozygous genotype), H (heterozygous genotype) and B (the second homozygous genotype) indicate genotypes of each marker. Then, the following statistical model is used: where y is the n × 1 vector of the phenotypic data (BLUE); Xβ is a non-genetic effect; and a k and d k are the additive and dominance effects, respectively. For markers k and k , (aa) kk , (ad) kk , (da) kk , (dd) kk are additive × additive, additive × dominant, dominant × additive and dominant × dominant epistatic effects, respectively [63]. For each population, we first constructed the additive genotype matrix and the dominance genotype matrix. Then, we input the two matrices and BLUE data into PEPIS software to run marker-trait association analysis (http://bioinfo.noble.org/PolyGenic_QTL, accessed on 7 June 2019). The LRT threshold for declaring significance was −log10(0.05/the number of markers) according to the PEPIS pipeline [12]. The LRT threshold should be 5.49, given that the number of markers is 15,386. The SNPs with LRT values over 5.49 were identified as significant SNPs.
To calculate the phenotypic variance explained by SNPs (PVE), the significant SNPs were fitted in a multiple linear model [64], from which SSreg and SStol for each SNP were computed. SSreg is the sum of square of each SNP, whereas SStol is the sum of square of the linear model. The PVE of each SNP was calculated by dividing SSreg into SStol.

GP and MAS Analysis
We used the ridge regression best linear unbiased prediction (rrBLUP) model to run GP analysis. The rrBLUP model is [65]: where y is the BLUEs, β is a vector of the fixed effects including only the overall mean, u is the vector of random effects including only additive effect, ε is the residual error, X and Z are the design matrices. GP was implemented by running five-fold cross validation for 200 repeats. The effects of genome-wide markers were estimated, and the predicted phenotypic values were calculated by inputting the effects of genome-wide markers into the rrBLUP model. PA was calculated as the Pearson correlation coefficient between the observed and predicted phenotype. The R package rrBLUP was used to implement the GP model [66], and the code is available in File S1.
To calculate the PA of MAS model fitting the significant SNPs (defined as MAS.Sig model), a multiple regression model was fitted using the lm function in R. The phenotype was predicted using the predict function [53]. The PA was calculated by running five-fold cross validation for 200 repeats. In order to prove the effect of the MAS model fitting significant SNPs, we also calculated the PA of the MAS model fitting the same number of randomly selected SNPs. The model was defined as the MAS.Random model. We used Wilcox.test to compare the differences among the PAs of the three prediction models (GP, MAS.Sig and MAS.Random models).

Identification of Common QTLs among LPS and two TC Populations
Previously, all QTLs within a 20 cM interval were considered a single QTL [67]. According to a previous report, the average recombination rate was 1 cM/Mb [68], which is approximately 1 Mb. Therefore, we defined SNPs within 20 Mb as in linkage with one QTL.

RNA-seq Analysis and Identification of Differentially Expressed Genes around Significant SNPs
The six materials, including two hybrids (ZD958 and XY335), and their parents (Zheng58, Chang7-2, PH6WC and PH4CV) were sown in July 2018 in Haidian, Beijing. Decapitated shoot tips at the V7 stage were used for RNA extraction. For each material, three biological replicates were used, with each replicate containing three samples. We extracted RNA using an RNeasy plant mini kit (Qiagen, Germany) and checked RNA purity using a kaiaoK5500 spectrophotometer (Kaiao, Beijing, China). Then, we assessed RNA integrity and concentration using an RNA Nano 6000 assay kit of a Bioanalyzer 2100 system (Agilent Technologies, Santa Clara, CA, USA). After purifying mRNA from total RNA using poly-T oligo-attached magnetic beads, we generated the sequencing libraries using NEBNext ® Ultra™ Directional RNA Library Prep Kit for Illumina (NEB, Ispawich, MA, USA). The libraries were sequenced using the Illumina Novaseq system with a read length of 150 bp (pair end) at Annoroad Gene Technology (Beijing, China).
The RNA sequencing data were analyzed according to the procedure used in our laboratory [69]. Briefly, the raw data were filtered to remove low-quality reads, adaptor-polluted reads and reads with more than 5% N bases. The filtered clean reads were mapped to the B73 RefGen_V3 genome (www.maizegdb.org, accessed on 5 August 2019) using Hisat2 with default settings. The expression level of each sample was estimated using FPKM (fragments per kilobase of transcripts per million fragments mapped), which is calculated by normalizing raw reads. The threshold for identifying the DEGs was false discovery rate (FDR, p value < 0.05), which was computed by using Cufflinks. The code is available in File S1.
Some studies have shown that differentially expressed genes between hybrids and their parental lines contribute to hybrid performance [70][71][72][73]. Therefore, we first identified the DEGs between ZD958 and its parental lines (Zheng58 and Chang7-2) and between XY335 and its parental lines (PH6WC and PH4CV). For the Chang7-2 TC population, the physical positions of common DEGs between ZD958 and each of its parents were compared to those of the significant SNPs identified in the Chang7-2 TC population. Those DEGs located within 20 Mb of significant SNPs were identified as potential candidate genes. In the same way as stated above, common DEGs between XY335 and each of its parental lines were identified, and the physical positions of these DEGs were compared to those of the significant SNPs identified in the PH6WC TC population. In the same way, those DEGs located within 20 Mb of significant SNPs were identified as potential candidate genes. To prioritize the candidate genes for HKW, the DEGs in the surrounding regions of significant SNPs associated with HKW were compared to the candidate genes for kernel weight [18]. In the same way, in order to prioritize the candidate genes for YPP, the DEGs in the surrounding regions of significant SNPs associated with YPP were compared to the candidate genes for yield and yield-related traits [18].  Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: The phenotype and genotype data used in this study are available as supplementary files. The RNA-seq data were deposited in NCBI, (BioProject ID PRJNA766146). The code is available in File S1.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.