Rice (Oryza sativa L.) Grain Size, Shape, and Weight-Related QTLs Identified Using GWAS with Multiple GAPIT Models and High-Density SNP Chip DNA Markers

This study investigated novel quantitative traits loci (QTLs) associated with the control of grain shape and size as well as grain weight in rice. We employed a joint-strategy multiple GAPIT (Genome Association and Prediction Integrated Tool) models [(Bayesian-information and Linkage-disequilibrium Iteratively Nested Keyway (BLINK)), Fixed and random model Circulating Probability Uniform (FarmCPU), Settlement of MLM Under Progressive Exclusive Relationship (SUPER), and General Linear Model (GLM)]–High-Density SNP Chip DNA Markers (60,461) to conduct a Genome-Wide Association Study (GWAS). GWAS was performed using genotype and grain-related phenotypes of 143 recombinant inbred lines (RILs). Data show that parental lines (Ilpum and Tung Tin Wan Hein 1, TTWH1, Oryza sativa L., ssp. japonica and indica, respectively) exhibited divergent phenotypes for all analyzed grain traits), which was reflected in their derived population. GWAS results revealed the association between seven SNP Chip makers and QTLs for grain length, co-detected by all GAPIT models on chromosomes (Chr) 1–3, 5, 7, and 11, were qGL1-1BFSG (AX-95918134, Chr1: 3,820,526 bp) explains 65.2–72.5% of the phenotypic variance explained (PVE). In addition, qGW1-1BFSG (AX-273945773, Chr1: 5,623,288 bp) for grain width explains 15.5–18.9% of PVE. Furthermore, BLINK or FarmCPU identified three QTLs for grain thickness independently, and explain 74.9% (qGT1Blink, AX-279261704, Chr1: 18,023,142 bp) and 54.9% (qGT2-1Farm, AX-154787777, Chr2: 2,118,477 bp) of the observed PVE. For the grain length-to-width ratio (LWR), the qLWR2BFSG (AX-274833045, Chr2: 10,000,097 bp) explains nearly 15.2–32% of the observed PVE. Likewise, the major QTL for thousand-grain weight (TGW) was detected on Chr6 (qTGW6BFSG, AX-115737727, 28,484,619 bp) and explains 32.8–54% of PVE. The qTGW6BFSG QTL coincides with qGW6-1Blink for grain width and explained 32.8–54% of PVE. Putative candidate genes pooled from major QTLs for each grain trait have interesting annotated functions that require functional studies to elucidate their function in the control of grain size, shape, or weight in rice. Genome selection analysis proposed makers useful for downstream marker-assisted selection based on genetic merit of RILs.


Introduction
Rice (Oryza sativa L.) remains a staple food crop for more than half of the world's population [1,2], and serves as an important source of calories for human health and fitness [3].Rice consumption is increasing faster than any other cereals [4].Considering the Plants 2023, 12, 4044 3 of 24

Diverging Grain Phenotypes between Parental Lines and RILs
Considering that they belong to the most cultivated subspecies of rice (Oryza sativa L. ssp.indica and japonica), parental lines (Ilpum and Tung Tin Wan Hein 1, TTWH1) exhibited differential phenotypes for all analyzed rice grain size, shape, and weight-related traits, as expected (Figure 1A-D).The japonica parent (Ilpum) showed relatively shorter grains (Figure 1A) and lower LWR (Figure 1B) while having larger grains (grain width, GW) (Figure 1A) and higher thousand-grain weight (TGW, Figure 1C).In contrast, the indica TTHW1 had longer grains (Figure 1A) and higher LWR (Figure 1B), but thinner grains and lower TGW (Figure 1C).However, although we recorded an arithmetic or numerical difference between the grain thickness of Ilpum (thinner grains) compared to that of TTWH1 (thicker grains), a non-significant statistical difference was observed.
Plants 2023, 12, x FOR PEER REVIEW 3 of 23 multiple Genome Association and Prediction Tool (GAPIT) models and high-density SNP Chip DNA markers was used.

Diverging Grain Phenotypes between Parental Lines and RILs
Considering that they belong to the most cultivated subspecies of rice (Oryza sativa L. ssp.indica and japonica), parental lines (Ilpum and Tung Tin Wan Hein 1, TTWH1) exhibited differential phenotypes for all analyzed rice grain size, shape, and weight-related traits, as expected (Figure 1A-D).The japonica parent (Ilpum) showed relatively shorter grains (Figure 1A) and lower LWR (Figure 1B) while having larger grains (grain width, GW) (Figure 1A) and higher thousand-grain weight (TGW, Figure 1C).In contrast, the indica TTHW1 had longer grains (Figure 1A) and higher LWR (Figure 1B), but thinner grains and lower TGW (Figure 1C).However, although we recorded an arithmetic or numerical difference between the grain thickness of Ilpum (thinner grains) compared to that of TTWH1 (thicker grains), a non-significant statistical difference was observed.In addition, we observed a positive (right) skewness for GL (Figure 2A, p < 0.001) and LWR (Figure 2G, p < 0.001), as supported by the Shapiro-Wilk and d'Agostino and Pearson normality tests (Table S1).However, GW (Figure 2D, p < 0.001), GT (Figure 2J, p < 0.001), and TGW (Figure 2M, p < 0.01) exhibited a negative (left) skewness.As displayed in Figure 2B,E,H,K,N, a total shift (Ilpum-like pattern) in GL, GW, LWR, GT, and TGW of the RIL population was observed.Meanwhile, nearly 82.5% and 97.9% of the RIL population exhibited relatively short grains (Ilpum-like) and LWR value, respectively, against In addition, we observed a positive (right) skewness for GL (Figure 2A, p < 0.001) and LWR (Figure 2G, p < 0.001), as supported by the Shapiro-Wilk and d'Agostino and Pearson normality tests (Table S1).However, GW (Figure 2D, p < 0.001), GT (Figure 2J, p < 0.001), and TGW (Figure 2M, p < 0.01) exhibited a negative (left) skewness.As displayed in Figure 2B,E,H,K,N, a total shift (Ilpum-like pattern) in GL, GW, LWR, GT, and TGW of the RIL population was observed.Meanwhile, nearly 82.5% and 97.9% of the RIL population exhibited relatively short grains (Ilpum-like) and LWR value, respectively, against 17.5% and 2.1% having long grains and LWR (TTWH1-like phenotype).Likewise, the assessment of symmetric grain traits phenotypic distribution using the quantile-quantile (Q-Q) plot supports the observed negative or positive skewness of analyzed traits (Figure 2C,F,I,L,O).17.5% and 2.1% having long grains and LWR (TTWH1-like phenotype).Likewise, the assessment of symmetric grain traits phenotypic distribution using the quantile-quantile (Q-Q) plot supports the observed negative or positive skewness of analyzed traits (Figure 2C,F,I,L,O).

Relatedness, Correlation, Heritability, and Genomic Selection
We constructed a kinship matrix to assess the relatedness of the RIL population.As indicated in Figure 3A, based on the color pattern in the heat map, the majority of genotypes of RILs used in this study are diverse and not closely related.The density of SNP Chip DNA markers across all 12 chromosomes of rice is provided in Figure 3B. Figure 3C shows that RILs were grouped into three clusters based on their recorded grain size, shape, and weight phenotypes.Principal Component Analysis (PCA) results suggest that PC1 (55.6%),PC2 (30.8%), and PC3 (9.8%) explain 96.2% of the proportion of phenotypic variance of the RILs population.We constructed a kinship matrix to assess the relatedness of the RIL population.As indicated in Figure 3A, based on the color pattern in the heat map, the majority of genotypes of RILs used in this study are diverse and not closely related.The density of SNP Chip DNA markers across all 12 chromosomes of rice is provided in Figure 3B. Figure 3C shows that RILs were grouped into three clusters based on their recorded grain size, shape, and weight phenotypes.Principal Component Analysis (PCA) results suggest that PC1 (55.6%),PC2 (30.8%), and PC3 (9.8%) explain 96.2% of the proportion of phenotypic variance of the RILs population.Furthermore, to understand the proportion of variance explained by the individuals' breeding values for the target traits, we estimated the narrow sense heritability (h 2 ) of traits.Data in Figure 3D indicate that GL had an h 2 of 0.915, while GW (Figure 3E), GT (Figure 3F), grain LWR (Figure 3G), and TGW (Figure 3H) showed an h 2 of 0.885, 0.852, 0.454, and 0.831, respectively.
To further gain insights and assess the genetic merits of individuals in the RIL population for the target traits, we performed a genomic selection analysis based on the MLM (gBLUP) method, which is known to have a high prediction accuracy for genomic estimated breeding value (GEBV) for traits controlled by a large number of genes.The resulting output of the genomic selection analysis shows the predicted and observed GEBV of individuals in the RIL population for TGW in the reference (Figure 3I) and inference (Figure 3J) groups.
Correlation analysis is useful for understanding the relationship between two variables or identifying possible inputs for testing changes in a dependent variable while holding other variables constant.To explore the relationship between traits, we conducted a correlation analysis using the Pearson correlation method.Results in panels A and D in Figure 4 reveal a weak positive correlation between GL (R 2 = 0.153 ***) or GT (R 2 = 0.205 ***) and TGW.In contrast, panel B in Figure 4 suggests the existence of a strong positive correlation between GW and TGW (R 2 = 0.392 ***).Furthermore, to understand the proportion of variance explained by the individuals' breeding values for the target traits, we estimated the narrow sense heritability (h 2 ) of traits.Data in Figure 3D indicate that GL had an h 2 of 0.915, while GW (Figure 3E), GT (Figure 3F), grain LWR (Figure 3G), and TGW (Figure 3H) showed an h 2 of 0.885, 0.852, 0.454, and 0.831, respectively.
To further gain insights and assess the genetic merits of individuals in the RIL population for the target traits, we performed a genomic selection analysis based on the MLM (gBLUP) method, which is known to have a high prediction accuracy for genomic estimated breeding value (GEBV) for traits controlled by a large number of genes.The resulting output of the genomic selection analysis shows the predicted and observed GEBV of individuals in the RIL population for TGW in the reference (Figure 3I) and inference (Figure 3J) groups.
Correlation analysis is useful for understanding the relationship between two variables or identifying possible inputs for testing changes in a dependent variable while holding other variables constant.To explore the relationship between traits, we conducted a correlation analysis using the Pearson correlation method.Results in panels A and D in Figure 4 reveal a weak positive correlation between GL (R 2 = 0.153 ***) or GT (R 2 = 0.205 ***) and TGW.In contrast, panel B in Figure 4 suggests the existence of a strong positive correlation between GW and TGW (R 2 = 0.392 ***).

Identified QTLs for Grain Size and Shape in Rice through GWAS with Multiple GAPIT Models
We conducted a GWAS employing multiple GAPIT models, with enhanced power and accuracy for genome association, to investigate novel genetic loci for grain size and

Identified QTLs for Grain Size and Shape in Rice through GWAS with Multiple GAPIT Models
We conducted a GWAS employing multiple GAPIT models, with enhanced power and accuracy for genome association, to investigate novel genetic loci for grain size and shape, and TGW, which are essential components of rice yield.GWAS results identified 43 QTLs (all grain traits considered and GAPIT models cumulated), of which 10 QTLs were associated with GL, 14 with GW, 3 with GT, 8 with LWR, and 8 with TGW (Tables 1 and S2).From these results, we were interested to see co-detected QTLs with the highest contribution to the trait values.In this regard, we found that among the detected GL-related QTLs, seven Plants 2023, 12, 4044 7 of 24 out of ten were co-detected by both BLINK and FarmCPU GAPIT models, and mapped on Chr 1-3, 5, 7, and 11 (Figure 5A-E).Among them, four QTLs were co-detected by all GAPIT models, and two QTLs by three GAPIT models, of which AX-95918134 (qGL1-1 BFSG , Chr1, 3,820,526 bp, allelic effect: TTWH1) explains 65.5-72.5% of the observed phenotypic variance (PVE; Figure 5A, Tables 1 and S2).shape, and TGW, which are essential components of rice yield.GWAS results identified 43 QTLs (all grain traits considered and GAPIT models cumulated), of which 10 QTLs were associated with GL, 14 with GW, 3 with GT, 8 with LWR, and 8 with TGW (Tables 1  and S2).From these results, we were interested to see co-detected QTLs with the highest contribution to the trait values.In this regard, we found that among the detected GL-related QTLs, seven out of ten were co-detected by both BLINK and FarmCPU GAPIT models, and mapped on Chr 1-3, 5, 7, and 11 (Figure 5A-E).Among them, four QTLs were codetected by all GAPIT models, and two QTLs by three GAPIT models, of which AX-95918134 (qGL1-1 BFSG , Chr1, 3,820,526 bp, allelic effect: TTWH1) explains 65.5-72.5% of the observed phenotypic variance (PVE; Figure 5A, Tables 1 and S2).Likewise, among the fourteen QTLs associated with grain width (GW) identified here, six QTLs were co-detected by 2-4 GAPIT models, where qGW1-1 BFSG (AX-273945773, Chr1:5,623,288 bp) explains a PVE of about 15.5-18.9%,and the allele from Ilpum contributed to the trait value.In addition, the GW-related QTL qGW6-1 Blink coincides with the qTGW6 BFG locus for TGW, which can be noted as qGW6-1 Blink /TGW6 BFSG (AX-115737727, Chr6: 28,484,619 bp) (Figure 5B, Table 1).Other QTLs for GW are located on Chr1-3, 6, 8, and 12.The qGW6-2 FSG , qGW8 FSG , and qGW12 FSG were co-detected by FarmCPU, SUPER, and GLM.
Thousand-grain weight (TGW) is an important component of rice yield, and is determined by several factors, including GL, GW, and GT, among others.Our data in Table 1 shows that two out of eight QTLs associated with the control of TGW were co-detected by all four GAPIT models; meanwhile, BLINK and FarmCPU co-detected two others.The SNP Chip marker AX-115737727, linked to the qTGW6 BFSG QTL (Chr6: 28,484,619 bp), which coincides with GW QTL qGW6-1 Blink as indicated earlier, is here regarded as the major QTL for TGW identified by this study, considering its co-detection by all GAPIT models used and its high PVE value.The latter is followed by qTGW2-1 BF (AX-279699609, Chr2 (10,805,604 bp, PVE 18.7-27.9%)and qTGW3-2 Farm (AX-123153600, Chr3: 7,887,961 bp, PVE 13.9% (Figure 5E, Tables 1 and S2).Furthermore, Figure 6A,D,G display the plots of LOD values, while Figure 6B,E,H show the estimated effect of detected QTLs associated with the control of GL, GW, and TGW in rice.Likewise, the contribution of the identified QTLs to the trait values is displayed in Figure 6C,F,I.

Putative Candidate Genes Harbored by Grain Traits-Related QTLs
Following the detection of major QTLs associated with the control of target grain size, shape, or weight-related traits, we were interested in unraveling the identity of genes harbored by these QTLs.To achieve that, we used the known physical positions of associated SNP Chip DNA markers co-detected by both BLINK and FarmCPU, in the rice genome database (http://rice.uga.edu/cgi-bin/gbrowse/rice/#search,accessed on 1 September 2023).From data in Table 2, we can see that genes harbored by qGL1-1 BFSG are proposed to be involved in post-embryonic development, reproduction, and/or signal transduction, secondary metabolic (Os01g07880 and Os01g07930, encoding a HY5 and Zinc finger transcription factors, respectively).In the same region, genes associated with transport events (Os01g07870, encoding an ATP binding cassette (ABC) transporter), protein modification process (Os01g07920) or cellular homeostasis (Os01g07950, encoding a glutaredoxin subunit II), protein-binding activities (Os01g07980, encoding an Ankyrin repeat domain), or response to abiotic stimuli (Os01g07910, encoding NADH-cytochrome b5 reductase) are found.
The qGW1-1 BFSG region (associated with the control of grain width in rice) harbors genes with similar annotated functions to those found in qGL1-1 BFSG .The latter includes the Os01g10580 (encoding a B-box (BBX) zinc finger transcription factor protein) proposed to be involved in post-embryonic development, cellular component organization, or secondary metabolic process; the Os01g10590 (OsFTL8, encoding an FT-like 8 homologous to flowering locus T gene) involved in flower development and reproduction; the Os01g10550 (OsDEFL35, encoding Defensin-like DEFL protein), Os01g10600 (Aquaporin), or Os01g10610 (encoding a Brassinosteroids-regulated transcription factor BES1/BZR1 protein) involved in protein binding and transport activity, respectively.
Like in the case of other genetic loci, putative candidate genes were pooled from QTLs co-detected by at least two GAPIT models.Otherwise, independent QTLs, detected by either BLINK, FarmCPU, or GLM, with the highest PVE value were considered.Thus, in the case of LWR, qLWR2-1 BFSG (AX-274833045, Chr2: 10,000,097 bp (PVE 15.2% (BLINK) or 32.9% (FarmCPU)) was retained to uncover the identity of putative candidate genes.In this region (qLWR2-1 BFSG ), a set of genes encoding interesting annotated predicted functions are found.We could mention the VHS (VPS-27, Hrs, and STAM) and GAT (GGA and Tom1, Os02g17350 responsible for ubiquitin binding and ubiquitination); the Os02g17380, encoding pentatricopeptide repeat (PPR) domain-containing protein associated with the restoration of fertility (Cytoplasmic male sterility, CMS); the restoration of fertility 2 (Rf2, Os02g17380, encoding a mitochondrial glycine-rich protein) in LD-CMS; the Os02g17390 (encoding 3-hydroxyacyl-CoA dehydrogenase [45]), involved in flower development or multicellular organismal development; and the Tesmin/TSO1-like transcription factor (Os02g17460).
Considering the genetic variability between the japonica and indica rice subspecies, we were interested to see the degree of similarity of genes found in major QTLs for grain traits identified in Table 2. To achieve that, the coding sequence (CDS) of each gene in the japonica group were aligned with their orthologues in the indica group.Results in Table 2 (Column 7, CDS japonica vs. indica) reveal mutation sites (deletion or substitution) in a set of genes, while others showed a 100% similarity between the two subspecies.Understanding the correlation between factors helps quantify the strength of the direct relationship between them and figure out their affiliation [61].Thousand-grain weight (TGW) is a determinant component of rice yield and is influenced by several factors, including grain length (GL), width (GW), and thickness (GT) [62][63][64].In addition to grainfilling [65], it has been established that grain weight is determined by factors such as GL and GW, which contribute to enhancing the yield of rice [66,67].Hence, the observed strong positive correlation between GW and TGW (R 2 = 0.392 ***), and that between GL (R 2 = 0.153 ***) or GT (R 2 = 0.205 ***) and TGW would partially explain the shift in the TGW of the RIL population as shown in Figure 2M,N.

Genomic Estimated Breeding Value of RILs Population and Traits Heritability
The use of genomic selection (GS) in plant breeding has proven essential to increase the genetic gain of complex traits per unit of time and cost by enhancing the genomic estimated breeding value (GEBV) accuracies, through employing dense markers, and traits heritability [68].GS also estimates the genetic merit of individuals (in this case the RILs) based on a large set of dense markers (here SNP Chip DNA makers) across the whole genome.GS then derives the GEBVs of all individuals in the breeding population based on their genotype and phenotype profiles and predicts those that are suitable for downstream breeding programs, relying on their actual performance [69].Here, data obtained from GS analysis revealed the GEBV profile of RILs for thousand-grain weight, which is useful for downstream breeding using the best-performing RILs and associated SNP Chip DNA markers.It was interesting to see that GL (h 2 = 0.915) and GW (h 2 = 0.885), which were earlier shown to be closely related to TGW (h 2 = 0.852), recorded high heritability scores.A study by Chen et al. [70] observed a high heritability for grain shape and weight, but environmental factors, including temperature, largely influence the phenotypic values of these traits.
In the same region (qGL1-1 BFSG ), a gene encoding a Zinc finger (CCCH) encoding a TF and two others encoding Ankyrin repeat domain-containing protein.Genes encoding the CCCH Zinc-finger protein have been proposed to regulate the adaptation of plants to abiotic stress [83][84][85].Likewise, Ankyrin repeat domain-containing protein-encoding genes are thought to exclusively function to meditate protein-protein interactions and disease response [86].
3.4.The Grain Width, Thickness, and LWR-Associated QTLs qGW1-1 BSFG , qGT1 Blink , qGT2-1 Farm and qLWR2 BSFG Carry Genes Involved in Flower Development, Post-Embryonic Development and Reproduction Several loci controlling GT, GW, and LWR have been reported under various growth conditions, and mapped to almost all chromosomes of rice (Chr1, 2, 3, 6-9, 11, 12) [2,70,71].In the qGW1 BF region, we noticed the presence of a gene encoding the flowering time-like 8 locus (OsFTL8, Os01g10590), associated with flower development and reproduction.A previous report proposed that a member of the FTL family, OsFTL4 (Os09g33850), regulates flowering time in rice in response to changing environmental conditions [87].Likewise, a set of genes encoding a B-box (BBX) zinc finger protein (Os01g40580) or OsNIP1 (Os01g10600, Aquaporin) are located within the qGW1-1 BFSG region.Members of the BBXs family are a class of zinc finger proteins that encode transcription factors, and are mapped across the rice genome [88][89][90].Among them, the OsBBX14 (Os05g11510) was proposed to promote photomorphogenesis in rice [88].In the same way, aquaporin is mainly associated with water movement in-and outside the cell.A study conducted by He et al. [91] revealed that OsPIP1 encoding aquaporin interacts with other proteins to promote water uptake and seed germination.Furthermore, BES1/BZR1, a family of Brassinosteroids transcriptional regulators, were recently proposed to regulate plant development [92], kernel size in rice [50] and maize [51] through interaction with several proteins [93].
As for the qLWR2 BFSG , this QTL harbors genes with interesting annotated functions, including two genes (Os02g17350 and Os02g17380, OsRf2) described as being involved in the restoration of fertility (cytoplasmic male sterility, CMS).The Rf2 gene was earlier suggested to be involved in the mechanism for the restoration of fertility in CMS lines in rice [56].
3.5.The Grain Weight-Related QTL qTGW6 BFSG Harbors Genes Associated with Anatomical Structure Morphogenesis, Cell Differentiation, and Carbohydrate Metabolism Thousand-grain weight (TGW) is controlled by several genetic loci.To date, many quantitative trait loci (QTLs) proposed to control TGW in rice have been identified, and mapped on all 12 chromosomes of rice, and a few genes have been functionally characterized.Multiple genetic and molecular aspects of plants affect grain weight, leading to dynamic changes in cell division, expansion, and differentiation [95].
The marker AX-115737727 is linked to the major QTL for TGW (qTGW6 BFSG , Chr6: 28,484,619 bp) that coincides with the qGW6-1 Blink QTL identified by the present study.We could mention the Os06g46950 encoding the carotenoid cleavage dioxygenase 1 (CCD1) protein, the ZOS6-07 C2H2 Zinc finger TF (Os06g46920) or the Os6bglu25 (Os06g46940, encoding the β-glucosidase homologue).A study by Ren et al. [58] suggested that a member of the β-glucosidase protein family, Os06gGlu24 plays a role in seed germination and root elongation, while interacting with indole-3-acetic acid (IAA) and abscisic acid (ABA) signaling.Likewise, Ilg et al. [59] proposed the CCD1 gene as being involved in the control of endosperm color in rice.
Although grain length, width, thickness, or thousand-grain weight are known to be controlled by multiple loci, genes harbored by qGL1-1 BFSG , qGT2-1 Farm , qGW1-1 BFSG , or qTGW6 BFSG share commonalities such as being involved in multicellular organismal development, flower development or reproduction, cell division or differentiation, among other annotations.It has been evidenced that TGW largely depends on GL, GW, and GT [96], in addition to grain filling ratio.

Plant Materials, Growth Conditions, and Phenotypic Measurements
A hundred and forty-three recombinant inbred lines (RILs), derived from a cross between Ilpum (Oryza sativa L. ssp.japonica) and Tung Tin Wan Hein1 (TTWH1, Oryza sativa L. ssp.indica) were used to conduct the experiments.Initially, pre-germinated seeds of RILs were sown and grown in 50-well trays until transplanting time.Then, healthy and vigorous four-week-old seedlings were transplanted (cropping season May to October 2022) in the experimental field (altitude: 11 m, 35 • 29 31.4N, and 128 • 44 30.0 E), located at the National Institute of Crop Science (NICS), Department of Southern Area Crop Science, Paddy Crop Division, Rural Development Administration, Miryang, Republic of Korea.In the field, a total of 100 seedlings per RIL and parental lines were transplanted in four rows, with 25 plants per row and the spacing between and within the lines of 30 cm × 15 cm, respectively.The panicles for phenotypic observations of parental lines and RILs were harvested from the inside rows, excluding the border rows to avoid the border effects on the traits studied, and the competition between lines.
Soon after harvesting and postharvest processing, the grain size and shape-related phenotypes, including grain length (GL), grain width (GW), grain thickness (GT), grain length-to-width ratio (LWR, calculated as the GL divided by GW), and thousand-grain weight (TGW) were measured.The GL, GT, GW, and LWR were measured or calculated using the SmartGrain v.1.2(copyright© 2010-2012, Takanari TANABATA, Tsukuba, Japan; http://phenotyping.image.coocan.jp,accessed on 12 April 2023).Before analysis, 100 rice seeds, with a label that helps identify the RIL under analysis, were placed on the Canon scanner 5600F model using a typical rectangular rice seed dispenser (Figure S1A-C), and the dispenser was removed thereafter.Seeds were scanned and the image saved in an appropriate folder, for further processing (Figure S1D,E).Prior to analyzing the phenotype of grains, basic settings were performed, such as the selection of seed detection sensitivity strength, picking seed and background colors by right-clicking inside the imported image, determining the scale bar, etc.To analyze, we clicked on "Analyze" in the title bar, selected "Analyze area" in the drop window, and selected the target region on the open image to analyze.Final quality control was performed to ensure the accuracy of the measurement as follows: Set [Disable/Enable] (right click on the mouse) to unselect or select seeds on the image, followed by exporting as Excel "csv."Format (Figure S1F).
The GT was measured manually using a digital Vernier Caliper (CD-20CP, Mitutoyo Corp, Tokyo, Japan).However, the TGW was calculated as the [(average grain weight of 100 dehulled seeds/the number of samples (n)) × 10].

Frequency Distribution, Correlation Analysis, Quantile-Quantile Plots, Kinship Matrix
To assess the frequency distribution of traits, generate the box plots, and investigate the Pearson correlation between the target traits, GraphPad Prism 7.0 (© 1992-2016 GraphPad Software, Inc., Odesa, Ukraine) was used.The pairwise kinship matrix, also known as the co-ancestry or half relatedness, the principal component analysis (PCA) plot, and the Quantile-Quantile (Q-Q) plots were generated from the GAPIT package using R software.The SNP density plot was generated using filtered SNP Chip DNA markers with their relative p-values (GWAS results in .csvfile) using the below script: install.packages('CMplot')library(CMplot) head(my_data) CMplot(my_data,type="p",plot.type="d",bin.size=1e6,chr.den.col=c("darkgreen","yellow", "red"),file="jpg",file.name="",dpi=300, main="SNP Chip Markers Density",file.outpu t=TRUE,verbose=TRUE,width=9,height=6) Genomic Selection or Prediction Analysis To investigate the genetic merit of the RILs for specific target traits, a genomic prediction or selection analysis was conducted as described by Zhang et al. [97].The genomic best linear unbiased prediction (gBLUP), commonly used for the genomic selection based on mixed model (MLM), and having a higher prediction accuracy for traits controlled by a large number of genes was used perform the genomic selection Yu et al. [98].The genotype data were converted from the Haplotype Map (HapMap) format to numerical (see R script below) prior to performing the analysis.
To convert HapMap to numerical format: myG <-fread("file:///D:genotype data location.txt",head = FALSE) myGAPIT <-GAPIT(G=myG, output.numerical=TRUE)myGD= myGAPIT$GD myGM= myGAPIT$GM To conduct a genomic prediction: myY<-read.csv("phenotypefile location pathway.csv",sep = ",") myGD=read.csv("numericalgenotype file location pathway.csv",sep = ",") myGM=read.csv("markersfile location pathway.csv",sep = ",") The GS/GP of the inference groups (based on the ties with corresponding groups in the reference panel) was derived from Henderson's formula as follows: where K RR is the variance-covariance matrix for all groups in the reference panel, K RI is the covariance matrix between the groups in the reference and inference panels, K IR is the covariance matrix between the groups inference and reference panels, and u R is the predicted genomic values of the individuals in the inference group.To assess the reliability of the genomic prediction, the following formula is used: where PEV is the prediction error variance, representing the diagonal element in the inverse left-hand side of the mixed model equation, and σ 2 a is the genetic variance.

Genome-Wide Association Study (GWAS) Analysis
To assess the association between potential genetic loci and the traits of interest at the whole genome level, we performed a Genome-Association Study (GWAS) employing the Genome Association and Prediction Integrated Tool (GAPIT) version 3 [99] with multiple models with enhanced power and accuracy for genome association.The GAPIT models used in this study include the Bayesian-information and Linkage-disequilibrium Iteratively Nested Keyway (BLINK) [100], the Fixed and random model Circulating Probability Uniform (FarmCPU) [101], Settlement of MLM Under Progressively Exclusive Relationship (SUPER), and the General Linear Model (GLM) [102].FarmCPU and SUPER supports genomic selection, while BLINK and GLM are commonly used for breeding through marker-assisted selection (MAS).

In Silico Analysis for Gene Ontology Search
GWAS results provided useful information on novel genetic loci for grain size and shape in rice.The physical positions of associated significant SNP Chip markers were utilized to uncover the identity of genes harbored by the target genetic loci for more insights.To achieve that, we conducted a search using the browser of the Rice Genome Annotation Project database (http://rice.uga.edu/cgi-bin/gbrowse/rice/#search,accessed on 7 September 2023) and PlantPAN 3.0 (http://plantpan.itps.ncku.edu.tw/plantpan3/search.php?#results, accessed on 7 September 2023) for each specific gene locus ID.Genes encoding similar domain-containing proteins were searched in the literature (https:// funricegenes.github.io/geneKeyword.table.txt,accessed on 7 September 2023).
To assess the degree of sequence similarity of genes found in major QTLs for grain traits, the coding sequence (CDS) of each gene in the japonica group was aligned with that of its orthologous gene in the indica group.The respective CDS of target gene locus IDs (i.e., LOC_Os01g10580: Nipponbare database (http://rice.uga.edu/analyses_search_locus.shtml,accessed on 11 September 2023), and BGIOSGA002958: indica database (https://plants.ensembl.org/Oryza_indica/Info/Index,accessed on 11 September 2023) were obtained, and aligned using the ClustalW multiple alignment feature in Bioedit sequence Alignment Editor Software (Copyright © 1997-2013 Tom Hall, USA) [103].

Conclusions
Rice grain-related traits are controlled by multiple genetic loci in plants.Grain length, width, and thickness determine the thousand-grain weight, thus influencing rice yield.A total of 43 QTLs associated with grain size, shape, or weight in rice, are distributed across almost all rice chromosomes.GWAS results show seven SNP Chip makers (co-detected by both BLINK and FarmCPU) with strong association with grain length on Chr1-3, 5, 7, and 11, with qGL1-1 BFSG explaining 65.2-72.5% of the observed phenotypic variance for grain length.In addition, one (qGW1-1 BFSG ) out of fourteen QTLs for grain width was codetected by all four GAPIT models on Chr1.The qGW1-1 BFSG explains 15.5-18.9% of PVE.Likewise, either BLINK or FarmCPU identified three QTLs for grain thickness.Two of them explain 74.9% (qGT1 Blink ) and 54.9% (qGT2-1 Farm ) of the observed PVE.Regarding lengthto-width ratio, the qLWR2 BFSG , detected by all GAPIT models, explains about 15.2-32%) for LWR.As for thousand-grain weight, the qTGW6 BFSG QTL coincided with qGW6-1 Blink for grain width and explained 32.8-54% of PVE.Putative candidate genes pooled from co-detected regions by all four GAPIT models have interesting annotated functions, and either associated with flower development, reproduction, post-embryonic development, carbohydrate metabolisms, or transcription regulation.Downstream functional studies, through the use of genetic engineering approaches or mutagenesis, would help elucidate the molecular functions of the candidate genes.The major QTLs for each grain trait can serve for downstream marker-assisted selection based on genome selection results.

Figure 1 .
Figure 1.Differential phenotypic difference between parental lines.(A) Comparison of rice grain trait values (grain length, width, and grain thickness) of Ilpum (japonica) and Tung Tin Wan 1 (indica), (B) grain length-to-width ratio, (C) thousand-grain weight of parent lines, and (D) visualization of grain phenotypes of parental lines (Ilpum, upper side and TTWH1 downside).Bars (with hatches and those filled in black) are mean values (n = 10) ± SE. *** p < 0.001, ns non-significant.

Figure 1 .
Figure 1.Differential phenotypic difference between parental lines.(A) Comparison of rice grain trait values (grain length, width, and grain thickness) of Ilpum (japonica) and Tung Tin Wan 1 (indica), (B) grain length-to-width ratio, (C) thousand-grain weight of parent lines, and (D) visualization of grain phenotypes of parental lines (Ilpum, upper side and TTWH1 downside).Bars (with hatches and those filled in black) are mean values (n = 10) ± SE. *** p < 0.001, ns non-significant.

Figure 2 .
Figure 2. Frequency distribution of traits, box plots, and Quantile-Quantile (Q-Q) plot.(A) Frequency distribution for grain length, (D) grain width, (G) length-to-width ratio, (J) grain thickness, (M) thousand-grain weight.(B) box plots showing the shift in grain length of the recombinant inbred lines relative to their parental lines, (E) grain width, (H) grain length-to-width ratio, (K) grain thickness, (N) thousand-grain weight, (C) Quantile-Quantile (Q-Q) plot for grain length, (F) grain weight, (I) grain length-to-width ratio, (L) grain thickness, and (O) thousand-grain weight, with the y-axis representing the quantile in the sample (RIL population) and the x-axis denoting data falling in the probability distribution.The blue thick lines in the Q-Q plots represent the observed quantile (set of values of a variate that divides a frequency distribution into equal groups, each containing the same fraction of the total population) plotted against the expected quantile (red line).

Figure 2 .
Figure 2. Frequency distribution of traits, box plots, and Quantile-Quantile (Q-Q) plot.(A) Frequency distribution for grain length, (D) grain width, (G) length-to-width ratio, (J) grain thickness, (M) thousand-grain weight.(B) box plots showing the shift in grain length of the recombinant inbred lines relative to their parental lines, (E) grain width, (H) grain length-to-width ratio, (K) grain thickness, (N) thousand-grain weight, (C) Quantile-Quantile (Q-Q) plot for grain length, (F) grain weight, (I) grain length-to-width ratio, (L) grain thickness, and (O) thousand-grain weight, with the y-axis representing the quantile in the sample (RIL population) and the x-axis denoting data falling in the probability distribution.The blue thick lines in the Q-Q plots represent the observed quantile (set of values of a variate that divides a frequency distribution into equal groups, each containing the same fraction of the total population) plotted against the expected quantile (red line).

Figure 3 .
Figure 3. Kinship matrix, marker density, PCA, heritability, and genome selection results.(A) heat map showing the relatedness or the level of co-ancestry of the population, (B) density map of SNP Chip DNA markers, (C) principal component analysis (PCA), (D) narrow sense heritability of grain

Figure 3 .
Figure 3. Kinship matrix, marker density, PCA, heritability, and genome selection results.(A) heat map showing the relatedness or the level of co-ancestry of the population, (B) density map of SNP Chip DNA markers, (C) principal component analysis (PCA), (D) narrow sense heritability of grain length, (E) grain width, (F) grain thickness, (G) grain length-to-width ratio, (H) thousand-grain weight, and (I,J) results of the genome selection analysis that predict the genomic estimated breeding value (GEBV) of individuals in the RIL population in the reference group (Ref) and the inference group (Inf).

Figure 4 .
Figure 4. Pearson correlation analysis results between traits.(A) Correlation results between grain length and thousand-grain weight, (B) grain width and thousand-grain weight, (C) length-to-width and thousand-grain weight, and (D) grain thickness and thousand-grain weight.The closer the data points come to forming a straight line (around the red line), the higher the correlation between the two variables, or the stronger the relationship.

Figure 4 .
Figure 4. Pearson correlation analysis results between traits.(A) Correlation results between grain length and thousand-grain weight, (B) grain width and thousand-grain weight, (C) length-to-width and thousand-grain weight, and (D) grain thickness and thousand-grain weight.The closer the data points come to forming a straight line (around the red line), the higher the correlation between the two variables, or the stronger the relationship.

Figure 5 .
Figure 5. Detected significant SNP Chip markers by GWAS.Genome-wise Manhattan plots of showing significant SNP Chip DNA markers detected by BLINK, FarmCPU, SUPER, and/or GLM GAPIT models, with their associated with (A) grain length, (B), grain width, (C) grain thickness, (D) grain length-to-width ratio, and (E) thousand-grain weight.

Figure 5 .
Figure 5. Detected significant SNP Chip markers by GWAS.Genome-wise Manhattan plots of showing significant SNP Chip DNA markers detected by BLINK, FarmCPU, SUPER, and/or GLM GAPIT models, with their associated with (A) grain length, (B), grain width, (C) grain thickness, (D) grain length-to-width ratio, and (E) thousand-grain weight.

Figure 6 .
Figure 6.QTL logarithm of the odds, estimated effects and phenotypic variance explained.Logarithm of the odds (LOD) scores for significant SNP Chip DNA markers linked to (A) grain length, (D) width, and (G) thousand-grain weight.(B,E,H) Estimated effects of QTLs, and (C,F,I) phenotypic variance explained (PVE) values of QTLs for grain length, width, and thousand-grain weight, respectively.−log10(p) is the negative logarithm base 10 of the p-value denoting the logarithm of the odds (LOD) of SNP Chip markers linked to detected genetic loci for each trait.MAF plotted in the x-axis of all panels indicates the minor allelic frequency referring to the lower allelic frequency at a given genetic locus in the RIL population.