A Systemic Investigation of Genetic Architecture and Gene Resources Controlling Kernel Size-Related Traits in Maize

Grain yield is the most critical and complex quantitative trait in maize. Kernel length (KL), kernel width (KW), kernel thickness (KT) and hundred-kernel weight (HKW) associated with kernel size are essential components of yield-related traits in maize. With the extensive use of quantitative trait locus (QTL) mapping and genome-wide association study (GWAS) analyses, thousands of QTLs and quantitative trait nucleotides (QTNs) have been discovered for controlling these traits. However, only some of them have been cloned and successfully utilized in breeding programs. In this study, we exhaustively collected reported genes, QTLs and QTNs associated with the four traits, performed cluster identification of QTLs and QTNs, then combined QTL and QTN clusters to detect consensus hotspot regions. In total, 31 hotspots were identified for kernel size-related traits. Their candidate genes were predicted to be related to well-known pathways regulating the kernel developmental process. The identified hotspots can be further explored for fine mapping and candidate gene validation. Finally, we provided a strategy for high yield and quality maize. This study will not only facilitate causal genes cloning, but also guide the breeding practice for maize.


Introduction
The maize kernel, as in other crops, initially develops from the double fertilization event [1], which leads to the formation of the diploid embryo and triploid endosperm, and finally grows into a mature grain (Figure 1A,B). The developing maize kernel consists of three major compartments: embryo, endosperm, and pericarp (Figure 1C), wherein the embryo and endosperm are wrapped by the pericarp. The embryo, representing the generation of a new plant, is the most critical component of the seed. Plant embryogenesis proceeds through a sequence of partitioning events to produce a fully developed embryo with scutellum (cotyledon), coleoptile, leaf primordia, plumule, radicle, and coleorhiza (Figure 1C) [2][3][4]. Endosperm development starts with the fertilization of the central cell [5]. Following endosperm cellularization, the central cell differentiates into four cell types: aleurone layer (AL), basal endosperm transfer layer (BETL), starchy endosperm (SE), and embryo-surrounding region (ESR) (Figure 1C). At the later differentiation stage, the four main cell types further differentiate and form new cell types: sub-aleurone, conducting zone (CZ), and basal intermediate zone (BIZ) [6,7]. Each cell type has distinct characteristics in cellular morphology, gene expression pattern, and biological function [7,8]. Mostly, defects
As the most relevant yield factor, the maize kernel is the main target for breeding. Kernel morphology is crucial in determining kernel size and yield. Maize kernels show many types of phenotypic variation (Figure 1B). For example, defective kernel mutants (Dek) are caused by abnormal embryo development and impairments of starch and protein synthesis in the endosperm [9,10]. Compared to wild type, small kernel mutants (Smk) have smaller kernels and delayed kernel development [11]. In embryo specific (Emb) mutants, the endosperm develops normally and the embryo shows more or less severe aberrations [12], which is the opposite of what is seen in endosperm specific (End) mutants [13]. The empty pericarp (Emp) mutants exhibit empty pericarp or papery seeds in mature ears [14], opaque/floury mutants are those with a reduced zein content in the endosperm [15], and shrunken mutants are those with a starch-deficient phenotype [16]. The remarkable diversity of kernel morphology in maize provides excellent research systems to explore the underlying genetic basis and molecular mechanisms of kernel development.
Grain yield is one of the most significant and complex quantitative traits in maize. It has been demonstrated to be affected by multiple factors, including genetic, environmental, and nutritional factors, and also their interaction with each other [17,18]. Grain size-related

Bibliometric Analysis of Kernel Size-Related Traits in Maize
As kernel size-related traits are the traits most directly correlated with grain yield in maize, they have long been popular research topics, especially in the past two decades. As shown in Figure 2A, the total number of publications in this field has gradually increased since 2000, reflecting a high level of academic interest. Publications on QTL mapping reached a peak in 2016 and grew slowly thereafter, owing to the development of GWAS technology. Hot research direction analysis showed that "maize", "quantitative trait loci", and "mapping" were the most relevant topics, and that "grain yield", "meta-analysis", and "heterosis" might develop best in the future (Figure 2B).
Substantial progress has been achieved in research on QTLs for grain yield in maize, which is consistent with the high-frequency keyword and centrality analyses (Figure 3A). Related studies mainly fall into eight clusters, and genetic analysis of yield-related traits has been a focus of continuous attention. Nonstress environments and heterosis in maize were research mainstreams for a long time. After systematic evolution, the research hotspots now focus on GWAS analysis and the genetic architecture of agronomic traits (Figure 3B).
Taken together, genetic architecture and molecular improvement of kernel size-related traits in maize remain active research fields and will continue to attract researchers in the future.

Characterization of Cloned Genes Controlling Maize Kernel Size-Related Traits
To date, 132 genes have been reported to be involved in kernel development (Table  1), and a large portion of them belong to a pentatricopeptide repeat (PPR) protein family.

[Table 1, surviving row: Expb15; Zm00001d045861; 9; Expansin protein, miR164 pathway, participating in kernel expansion [134]. Next row begins: qKW9]
We first performed expression pattern and GO enrichment analyses with these cloned genes. Gene expression pattern analysis was carried out based on reported RNA sequencing (RNA-seq) data [156]. Expression data were available for 130 genes and unavailable for two genes. All of the 130 genes were expressed in kernels, suggesting a role in kernel development. Next, fragments per kilobase of exon model per million mapped fragments (FPKM) values were analyzed for all genes. The results showed that 39 genes were highly expressed in kernels, with FPKM values over 500, whereas 11, 15, 14, 24, 21, and six genes had FPKM values of 200-500, 100-200, 50-100, 20-50, 10-20, and below ten, respectively (Figure 4A). About 79% (114) of all genes expressed in kernels had FPKM values above 50, and few had FPKM values below 50 (Figure 4A). To better characterize the expression pattern, the ratio of maximal expression in all tissues to maximal expression in kernels (MaxExp/MaxExpKernel) was further analyzed. A total of 92 genes had MaxExp/MaxExpKernel values of 1, indicating that these genes were expressed at their highest level in kernels rather than in other tissues. MaxExp/MaxExpKernel values ranged from one to three for 22 genes, from three to five for six genes, and exceeded five for ten genes (Figure 4B). When a MaxExp/MaxExpKernel ratio ≤ 3 and a MaxExpKernel FPKM value ≥ 50 were used as the filter criteria, 123 of the reported cloned genes were retained (Figure 4C).
Furthermore, GO enrichment analysis was conducted to investigate the functions of the reported cloned genes. In the molecular function category, the most significantly enriched GO terms were "RNA binding", "nuclease activity", "endonuclease activity", "zinc ion binding", and "oxidoreductase activity, acting on paired donors" (Figure 4D). In the biological process category, reported genes were strongly enriched in the terms "RNA processing", "seed development", "embryo development", "RNA splicing", "protein complex biogenesis", and "hormone metabolic process" (Figure 4D).
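The expression-based filter described in this section (MaxExp/MaxExpKernel ≤ 3, MaxExpKernel ≥ 50) can be illustrated with a minimal sketch. The gene names and FPKM values below are hypothetical, and so is the way the two criteria are combined: because 114 genes (92 + 22) have a ratio of at most three while 123 genes are retained, the sketch assumes that a gene passes if it meets either criterion (a union), and it exposes an "intersection" mode for comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneExpression:
    gene: str                # gene identifier (hypothetical names below)
    max_fpkm_all: float      # MaxExp: highest FPKM across all tissues
    max_fpkm_kernel: float   # MaxExpKernel: highest FPKM across kernel samples


def passes_filter(g, max_ratio=3.0, min_kernel_fpkm=50.0, mode="union"):
    """Return True if a gene passes the expression filter.

    mode="union" keeps a gene meeting either criterion (assumed reading);
    mode="intersection" requires both criteria.
    """
    if g.max_fpkm_kernel <= 0:
        return False  # ratio undefined and expression threshold not met
    ratio_ok = g.max_fpkm_all / g.max_fpkm_kernel <= max_ratio
    level_ok = g.max_fpkm_kernel >= min_kernel_fpkm
    if mode == "union":
        return ratio_ok or level_ok
    if mode == "intersection":
        return ratio_ok and level_ok
    raise ValueError("mode must be 'union' or 'intersection'")


# Toy data only; these values are not taken from the RNA-seq resource.
toy_genes = [
    GeneExpression("geneA", 600.0, 600.0),   # kernel-peaked and highly expressed
    GeneExpression("geneB", 400.0, 40.0),    # fails both criteria
    GeneExpression("geneC", 90.0, 30.0),     # ratio passes, level fails
    GeneExpression("geneD", 1000.0, 100.0),  # level passes, ratio fails
]

kept_union = [g.gene for g in toy_genes if passes_filter(g, mode="union")]
kept_both = [g.gene for g in toy_genes if passes_filter(g, mode="intersection")]
print(kept_union)
print(kept_both)
```

Keeping the combination mode as a parameter makes it easy to see how strongly the size of the retained gene set depends on that choice.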

Characterization of QTL Clusters for Kernel Size-Related Traits in Maize
Forty-five QTL studies on the regulation of kernel size published from 2006 to 2022 were collected from the literature (Table S1). A total of 1456 independent QTLs for four kernel size-related traits (KL, KW, KT and HKW) were collected (Table S1). QTL projection was performed using the physical positions of the flanking markers of each QTL. A total of 374 QTLs could not be projected because of incomplete flanking marker information. Finally, 1082 QTLs, including 227 QTLs related to KL, 281 to KW, 206 to KT and 368 to HKW (Figure 5A,B), were successfully projected and used for further analysis. These QTLs were distributed across the ten maize chromosomes. The total number of QTLs per chromosome ranged from 72 on chromosome 10 to 179 on chromosome 1 (Figure 5A,B). More QTLs were located on chromosomes 1 (179), 2 (134), and 3 (127) and fewer on chromosomes 10 (72), 6 (77), and 9 (80) (Figure 5A,B).
Next, we identified QTL clusters for kernel size-related traits in maize. A densely populated QTL region containing at least three QTLs was defined as a QTL cluster in this study. A total of 187 QTL clusters with multiple co-localizing QTLs were identified for four kernel size-related traits (Figure 5C and Table S2). Among these QTL clusters, 38 were associated with KL, 51 with KW, 33 with KT, and 65 with HKW. QTL clusters related to each trait, except KT, were distributed on all ten maize chromosomes. Similar to the QTL distribution, more QTL clusters were located on chromosomes 1 (35), 2 (27), and 3 (21) and fewer on chromosomes 10 (eight) and 6 (six) (Figure 5C and Table S2). Forty clusters harbored 67 genes known to affect kernel size-related traits, while the rest contained no known genes. Over half of the QTL clusters (97 out of 187) harbored ten or more QTLs, 20 QTL clusters contained 20 or more QTLs, and three had 40 or more QTLs.
The highest enrichment of QTLs was identified in HKW-qCL2-13, spanning a physical length of 48.5 Mb (20,505,000-69,017,291) on chromosome 1. This QTL cluster harbored 59 QTLs and five known genes (Emp602, Urb2, Ppr22, Dek1, and Ppr27) associated with HKW. Two other enriched regions, HKW-qCL7-3 (49 QTLs) and KW-qCL1-2 (40 QTLs), were identified for HKW and KW, each harboring three known genes: O5, Dek41, and Dek47 in the former, and Emp602, Urb2, and Ppr22 in the latter (Figure 5C and Table S2). Thus, QTL clusters are highly informative and may harbor high-confidence genes controlling kernel size-related traits.

Characterization of QTN Clusters for Kernel Size-Related Traits in Maize
Recently, GWAS has become a powerful and routine approach for identifying causal genetic variants of diverse traits in maize, including agronomic, quality, biochemical, physiological, and stress tolerance traits [157][158][159][160][161][162]. From QTN data collected from previous studies, 2515 QTNs associated with four kernel size-related traits were extracted and successfully projected on a reference genome, among which 515 QTNs were detected for KL, 840 for KW, 556 for KT and 604 for HKW (Figure 6A,B). These QTNs were located on all ten maize chromosomes, with more QTNs on chromosomes 1 (534) and 10 (522), and fewer on chromosome 8 (89). The QTNs for each trait were likewise distributed on all ten maize chromosomes; however, the distribution density differed among traits. The highest numbers of QTNs were detected on chromosome 1 for KL (121) and KT (208) and on chromosome 10 for KW (203) and HKW (218) (Figure 6A,B).

Identification of Candidate Genes Controlling Kernel Development in Maize
Owing to the critical roles of PPR genes in the maize kernel developmental process, we first searched the 31 identified hotspots for PPR genes. A total of 85 new PPR genes were detected, whose roles in kernel development remain to be investigated in further studies (Table S6).
We also screened these hotspot regions for other regulatory factors. As many hotspots were identified in this study, we chose six hotspot regions of particular interest (OL02, OL06, OL09, OL11, OL26, and OL31), which harbored high numbers of QTL/QTN clusters or contained no reported genes, as examples for further candidate gene analysis. Based on the physical positions, a total of 2634 genes were collected for the six hotspot regions. After filtering by the union condition of MaxExp/MaxExpKernel ≤ 3 and MaxExpKernel ≥ 50, 1314 genes were extracted for further GO enrichment analysis (Table S4). We focused on four GO terms, "RNA processing", "hormone metabolic process", "starch metabolic process", and "mitochondrial RNA metabolic process", which are involved in kernel development pathways [6,33]. Finally, a total of 148 genes, none of them PPR genes, were hypothesized to be candidate genes controlling kernel size-related traits in maize (Table S5). The roles of these genes also require further experimental investigation.

Discussion
Kernel size-related traits are genetically complex quantitative traits. QTL mapping and GWAS analysis have provided a huge amount of information on these traits. However, progress in fine mapping of causal genes and their utilization in maize breeding programs is limited because there has been little systematic integration and validation of QTLs and QTNs. Hotspot analysis is an effective method for optimizing and validating published QTLs and QTNs, identifying true QTLs and QTNs via accurate consensus regions. Thus, a comprehensive study based on published information is required and was addressed in this study.
The complexity of maize kernel size-related traits reflects not only the multiple loci that control them but also the intricate regulatory networks involved. It has been well documented that cloned genes controlling maize kernel size largely encode PPR proteins, which belong to a large family of nucleic acid-binding proteins, mainly RNA-binding proteins. PPR proteins play multiple roles in many biological processes in organelles, including transcription, RNA stabilization, RNA cleavage, translation, RNA splicing, and RNA editing, thereby affecting the expression of organelle genes [163]. Mutations in maize PPR proteins are commonly associated with severe defects in kernel development, as summarized in Table 1. Starch is the major component of maize kernels; thus, genes participating in the starch metabolic process may affect the kernel filling process, such as Ae1 [126], Bt2 [127], Se1 [128], Sh2 [129], Su1 [130], SWEET4c [106], Dof3 [70], Incw1 [143], Mn1 [11], and Mn6 [13] (Table 1). Plant hormone-related genes have also been found to control kernel development in maize, including an auxin homeostasis regulatory gene, Ehd1 [37], and a brassinosteroid biosynthesis gene, Drg10 [133] (Table 1). Moreover, transcription factors also play critical roles in kernel development in maize. OPAQUE11 (O11) functions as a central hub of the endosperm regulatory network connecting storage reserve accumulation and metabolism, stress responses, and endosperm development [6].
Several previous studies have integrated QTL or QTN data to find more informative loci. For example, a QTL consistency and meta-analysis identified 16 meta-QTLs from 138 QTLs for eight grain yield components across three generations derived from the same two parents [164]. Wang et al. combined meta-QTL and GWAS raw signals to dissect candidate genes for maize yield [34]. In this study, we performed three integration analyses to characterize QTL clusters, QTN clusters, and the consensus hotspot regions of both. The final identified genomic regions were more informative and provided higher confidence for predicting candidate genes.
Candidate gene analysis can be carried out based on GO enrichment annotations [13], expression patterns [34], and homologous genes within and between species [165,166]. Here, we established a new approach, a combination of expression pattern and GO annotations, to quickly extract candidate genes for kernel development from large gene pools. This will be helpful for predicting candidate genes not only from hotspot regions but also from newly detected genomic regions. Even so, we cannot exclude the possibility that some genes function indirectly in controlling kernel size without being expressed in kernel tissues. It should also be noted that we only performed candidate gene analysis for six representative hotspots. Further analysis is required for the other hotspots and for QTL and QTN clusters not within any hotspot region.
The final goal of studies on kernel size-related traits in maize is to identify elite genes for improving maize yield. Here, we propose a strategy model for the generation of high-yield and high-quality maize, following the path below (Figure 7): Step 1: Collection of various plant materials, such as inbred lines, mutants and segregating populations in diverse genetic backgrounds; Step 2: Screening and identification of candidate genes via QTL mapping, GWAS analysis, and other gene cloning methods; Step 3: Elucidation of regulatory mechanisms controlling morphogenesis, storage, and nutrients to build genetic networks at different regulatory levels; and Step 4: Molecular breeding of high-yield and high-quality maize by molecular marker-assisted breeding, molecular design breeding, genomic selection, genetically modified breeding, and CRISPR/Cas-based genome editing technologies [167][168][169][170][171]. This strategy can also be adopted to develop maize cultivars with other qualities of interest, for instance, tolerance and resistance to biotic and abiotic stresses.

Bibliometric Analysis
Bibliometric analysis was conducted based on a series of search queries in Web of Science, after reviewing the keywords and abstracts of related publications. A total of 952 research articles and reviews were retrieved, with records of authors, affiliated institutions, publication journals, years, titles, and abstracts, spanning literature published between 2000 and 2022 (up to 25 November 2022). Finally, the valid papers were analyzed for specific bibliometric indicators including publication volume, keywords, and high-frequency words, then visualized with CiteSpace (Drexel University, Philadelphia, PA, USA) and Scimago Graphica (SCImago Lab, Granada, Spain).

Gene, QTL and QTN Data Collection
An exhaustive bibliographic review was performed on maize cloned genes, QTLs and QTNs related to four kernel size-related traits (KL, KW, KT, and HKW). A total of 132 cloned genes are summarized in Table 1. QTL data were extracted from 45 published QTL studies, and QTN data from 14 GWAS publications. The basic information of each study was collected, including "trait type", "population type", "population size", "number of environments", "mapping method", "chromosomal position", "markers", "proportion of variance explained (R²)", "confidence interval", and "logarithm of odds (LOD) value", parts of which are listed in Table S1.

Projection of QTL, QTNs, and Genes on Reference Genome
QTL projection was carried out using the flanking markers of the collected QTLs. QTLs were projected onto the B73 reference genome sequence V4 (B73_V4, http://maizeGDB.org, accessed on 30 November 2022). All collected QTNs and target genes were also projected onto B73_V4 based on their physical positions.
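As an illustration of the projection step, the following minimal sketch converts a QTL defined by two flanking markers into a physical interval and reports QTLs with incomplete marker information as unprojectable. The marker names, coordinates, and the `project_qtl` helper are hypothetical, and treating markers on different chromosomes as a projection failure is an assumption, not a rule stated in the text.

```python
from typing import Dict, Optional, Tuple

# Hypothetical marker table: marker name -> (chromosome, physical position in bp).
MarkerMap = Dict[str, Tuple[str, int]]


def project_qtl(
    name: str,
    left_marker: Optional[str],
    right_marker: Optional[str],
    markers: MarkerMap,
) -> Optional[Tuple[str, int, int, str]]:
    """Return (chrom, start, end, name), or None if the QTL cannot be projected."""
    left = markers.get(left_marker) if left_marker else None
    right = markers.get(right_marker) if right_marker else None
    if left is None or right is None:
        return None  # incomplete flanking marker information
    if left[0] != right[0]:
        return None  # assumed rule: flanking markers must share a chromosome
    start, end = sorted((left[1], right[1]))
    return (left[0], start, end, name)


# Toy coordinates only; not real marker positions.
toy_markers: MarkerMap = {
    "m1": ("1", 1_000_000),
    "m2": ("1", 3_500_000),
    "m3": ("2", 500_000),
}

print(project_qtl("qtlA", "m2", "m1", toy_markers))
print(project_qtl("qtlB", "m1", None, toy_markers))
print(project_qtl("qtlC", "m1", "m3", toy_markers))
```

Sorting the two marker positions makes the result independent of the order in which the flanking markers were reported.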

Identification of QTL and QTN Clusters
After projection, QTL cluster analysis was performed with the Bedtools toolset (https://bedtools.readthedocs.io/en/latest/index.html, accessed on 30 November 2022). A genomic region was defined as a QTL cluster if at least three QTLs were co-localized. QTN cluster analysis was done manually by searching with a sliding window of 5 Mb on each chromosome, and a region was accepted as a QTN cluster if it harbored at least five QTNs. QTL and QTN clusters for each trait were searched on all ten maize chromosomes and designated as "Trait-qCL-chromosome-number" in Table S2 and "Trait-gCL-chromosome-number" in Table S3, respectively.
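The QTN search was done manually in this study; the sliding-window rule (a 5 Mb window holding at least five QTNs) could be automated along the following lines. This is a sketch under stated assumptions: windows are anchored at each QTN position, dense windows that share QTNs are merged into one region (so a merged region may exceed 5 Mb), and the input positions are toy values. The function name `qtn_clusters` is hypothetical.

```python
from collections import defaultdict


def qtn_clusters(qtns, window=5_000_000, min_qtns=5):
    """Find dense QTN regions.

    qtns: iterable of (chrom, position_bp).
    Returns a list of (chrom, start, end, n_qtns), sorted by chromosome name.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in qtns:
        by_chrom[chrom].append(pos)

    clusters = []
    for chrom in sorted(by_chrom):
        positions = sorted(by_chrom[chrom])
        regions = []  # [first_index, last_index] of merged dense windows
        j = 0
        for i in range(len(positions)):
            j = max(j, i)
            # Extend the window anchored at positions[i] as far as it reaches.
            while j + 1 < len(positions) and positions[j + 1] - positions[i] <= window:
                j += 1
            if j - i + 1 >= min_qtns:
                if regions and i <= regions[-1][1]:
                    regions[-1][1] = max(regions[-1][1], j)  # shares QTNs: merge
                else:
                    regions.append([i, j])
        for first, last in regions:
            clusters.append((chrom, positions[first], positions[last], last - first + 1))
    return clusters


# Toy positions: five QTNs within 4 Mb on chromosome 1, sparse QTNs elsewhere.
toy_qtns = [("1", p * 1_000_000) for p in (1, 2, 3, 4, 5, 30, 31)]
toy_qtns += [("2", p * 1_000_000) for p in (10, 11, 12, 13)]

print(qtn_clusters(toy_qtns))
```

Anchoring windows at observed QTN positions, rather than stepping at a fixed interval, avoids missing a dense region that straddles two fixed window boundaries.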

Integration of QTL and QTN Hotspots
Based on physical positions, QTL and QTN clusters were integrated with the toolkit Bedtools to discover the consensus regions. A QTL/QTN hotspot was defined if at least three QTL or QTN clusters were co-localized. QTL/QTN hotspot was designated as "HSnumber". A total of 31 QTL/QTN hotspots were summarized and listed in Table 2. Gene models in hotspot regions were extracted from MaizeGDB based on the physical positions and submitted to further GO enrichment and gene expression analysis.
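The co-localization rule for hotspots (at least three QTL or QTN clusters in the same region) can be sketched as an interval-merging step. The exact Bedtools commands are not given in the text, so the code below is only one plausible reading: clusters on the same chromosome are chained whenever they overlap, and a chained region is reported if it contains at least three clusters. The `Cluster` type, the cluster names, and the coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str    # hypothetical label for a QTL or QTN cluster
    chrom: str
    start: int
    end: int


def find_hotspots(clusters, min_clusters=3):
    """Chain overlapping clusters and keep regions with >= min_clusters members.

    Returns a list of (chrom, start, end, [member names]).
    """
    ordered = sorted(clusters, key=lambda c: (c.chrom, c.start, c.end))
    groups = []  # each group is a list of clusters chained by overlap
    for c in ordered:
        if groups:
            last = groups[-1]
            last_end = max(m.end for m in last)
            # Closed intervals: touching coordinates count as overlapping.
            if c.chrom == last[0].chrom and c.start <= last_end:
                last.append(c)
                continue
        groups.append([c])

    hotspots = []
    for members in groups:
        if len(members) >= min_clusters:
            hotspots.append((
                members[0].chrom,
                min(m.start for m in members),
                max(m.end for m in members),
                [m.name for m in members],
            ))
    return hotspots


# Toy clusters only; names and coordinates are illustrative.
toy_clusters = [
    Cluster("qtl_cluster_a", "1", 100, 200),
    Cluster("qtn_cluster_b", "1", 150, 300),
    Cluster("qtl_cluster_c", "1", 290, 400),
    Cluster("qtl_cluster_d", "1", 1000, 1100),
    Cluster("qtl_cluster_e", "2", 100, 200),
    Cluster("qtn_cluster_f", "2", 150, 250),
]

print(find_hotspots(toy_clusters))
```

With projected QTL intervals as input and the same threshold of three, the same routine gives one reading of the QTL cluster definition. A stricter alternative would require all members to overlap at a common position rather than merely chain together.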

GO Enrichment Analysis
GO enrichment analysis was performed using a web-based server agriGO2.0 (http://systemsbiology.cau.edu.cn/agriGOv2/index.php#, accessed on 30 November 2022) [172]. The investigated genes were assigned to GO categories for Molecular Function and Biological Process.
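The text does not state which statistical test or multiple-testing correction was applied. As a hedged illustration of what an over-representation analysis computes, the sketch below evaluates a hypergeometric upper-tail probability, a common choice for this kind of analysis and an assumption here. The symbols are: N background genes, K of them annotated with a GO term, n query genes, and k query genes carrying the term. The function name and the numbers are illustrative.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """Probability of seeing at least k annotated genes among n query genes.

    N: background size; K: annotated genes in the background;
    n: query size; k: annotated genes observed in the query.
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= n):
        raise ValueError("inconsistent counts")
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probabilities of drawing i annotated genes for i = k..upper.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Toy example: 10 background genes, 4 annotated, 3 drawn, 2 annotated among them.
p = hypergeom_enrichment_p(k=2, n=3, K=4, N=10)
print(round(p, 4))
```

In practice the p-values of all tested GO terms would also need a multiple-testing adjustment, which this sketch leaves out.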

In Silico Gene Expression Analysis
In silico gene expression analysis was performed using a previously published RNA-seq resource [156]. Expression patterns of cloned and candidate genes associated with kernel size-related traits were investigated in this study.

Conclusions and Prospects
Nowadays, maize yield is challenged by population growth, various biotic and abiotic stresses, and climate change. Therefore, it is important to enhance our understanding of the genomic architecture of kernel size-related traits controlling maize yield. The cluster analysis revealed that a total of 187 QTL clusters were identified for KL, KW, KT, and HKW traits, while 84 QTN clusters were detected for the kernel size-related traits. Moreover, 31 consensus hotspot regions contained multiple QTL and QTN clusters for controlling kernel size-related traits. Candidate gene analysis revealed that 85 PPR genes were detected in QTL/QTN hotspots and 148 other candidates were predicted for six attractive hotspots. The characterization of cloned genes in expression patterns could significantly strengthen the exploitation of candidate genes in undeveloped genomic regions. The identified hotspot regions and candidate genes provided useful resources for molecular breeding to improve maize yield.