Fine Mapping and Candidate Gene Analysis of Rice Grain Length QTL qGL9.1

Grain length (GL) is a key determinant of rice yield and quality. However, knowledge of the major genes controlling GL inheritance in japonica rice remains limited, which constrains the improvement of japonica rice yields. Here, we systematically measured the GL of 667 F2 and 1570 BC3F3 individuals derived from two japonica cultivars, Pin20 and Songjing15, to identify the major genomic regions associated with GL. Using whole-genome re-sequencing combined with bulked segregant analysis, we mapped a novel major QTL for GL, qGL9.1, on chromosome 9. Local QTL linkage analysis in the F2 population and fine mapping with recombinant plants delimited qGL9.1 to a 93-kb core region containing 15 protein-coding genes. Of these, only LOC_Os09g26970 showed significantly different expression between the two parents at different stages of grain development. Moreover, haplotype analysis revealed that the Pin20 allele is associated with the optimal GL (9.36 mm) and GL/W (3.31), suggesting that Pin20 is a cultivar carrying the optimal GL variant of LOC_Os09g26970. Furthermore, a functional mutation (G>A at position 16,398,989 on chromosome 9) in an exon of LOC_Os09g26970 could be used as a molecular marker to distinguish between long and short grains. Our experiments identified LOC_Os09g26970 as a novel gene associated with GL in japonica rice. This result is expected to advance exploration of the genetic mechanism of rice GL and to support GL improvement in japonica rice varieties by marker-assisted selection.


Introduction
Grain size, a complex quantitative trait comprising grain length (GL), grain width (GW), grain thickness, and the grain length/width ratio (GL/W), is a determinant of grain weight and therefore affects both the yield and the appearance quality of rice [1,2]. Because grain shape strongly influences rice yield and quality, mining grain-shape-related genes is an important means of understanding its molecular mechanism and genetic basis. As of 2023, at least 201 rice grain shape genes had been identified. They are located on all rice chromosomes, with most distributed on chromosomes 1, 2, 5, 6, and 7 (https://pubmed.ncbi.nlm.nih.gov/) (accessed on 7 May 2023). Most of these 201 genes directly regulate grain shape, whereas the remainder regulate grain size indirectly through gene interaction networks.
Previous studies have found that the major factors affecting grain size include the ubiquitin-proteasome pathway, G-protein signaling, mitogen-activated protein kinase signaling, phytohormone regulation, and various transcriptional regulators [3]. GRAIN WIDTH 2 (GW2), the first QTL cloned in rice, encodes a RING-type E3 ubiquitin ligase with ubiquitination and autoubiquitination activity that localizes to the cytoplasm and nucleus [4]. G proteins that regulate grain size in rice include the α-subunit encoded by RGA1/D1 [5] and the γ-subunit encoded by DEP1 [6]. In addition, hormone-synthesis-related genes such as OsTAR1 [7] and TGW6 [8] are also involved in the regulation of seed size. Moreover, several other transcriptional regulatory modules, such as OsmiR396-OsGRFs [9] and AP2/ERF modules [10], also play key roles in grain shape determination. In sum, rice grain shape is regulated by multiple factors. Even so, the current molecular network of rice grain shape is still insufficient to explain all of the observed genetic variation. Exploring new major genes and allelic variants is therefore of great importance for high-yield rice breeding.
It is well known that indica rice has longer grains than japonica rice. Consequently, the favorable alleles of several important GL genes cloned to date come from indica rice. For example, the loss-of-function variants of GS3 [11] and TGW6 [8] are mostly found in indica rice. The large-grain allele of GS5 was retained during the domestication of indica rice [12]. Overexpression of LG3 [13] and GLW7 [14] in indica rice increases grain length. These alleles can be used to improve the grain shape of japonica rice either by breeding the offspring of indica-japonica crosses or by directly generating mutants through gene editing. However, because the two subspecies differ in geographical adaptability, it is difficult to recover lines with an elite japonica background from indica-japonica crosses, so such materials are hard to deploy in production without extensive molecular breeding. Materials obtained by gene editing also cannot be used as cultivars because of a degree of growth defects [15]. Taken together, cloning new grain shape genes from japonica rice and applying them directly to japonica rice breeding is a wise and efficient strategy.
Heilongjiang Province, the main production area of early-maturing japonica rice in China, had a planting area of about 6.43 million hectares in 2022, accounting for more than 15.5% of the national rice area. However, japonica rice in this region has shorter grains than indica rice in southern China, which hinders yield improvement. Therefore, identifying new alleles controlling grain length from local sources is important for increasing rice yield. Recent efforts combining QTL-seq and linkage analysis have led to the localization of several candidate genes in rice [16][17][18][19]. In this study, we used two japonica rice varieties with significantly different grain shapes, Pin20 and Songjing 15 (SJ15), as parents to develop an F2:3 population and a BC3F3 population. Through QTL-seq, linkage analysis, and fine mapping, we identified the long-grain gene in the large-grain variety Pin20. Furthermore, we developed KASP markers to identify plants with different grain types. This will facilitate in-depth study of grain type improvement and its regulatory mechanisms in japonica rice varieties.

Screening and Evaluation of Grain Shape
Phenotypic evaluation showed significant differences in grain length (GL) and length-width ratio (GL/W) between SJ15 and Pin20 (Table 1, Figure 1A,B). In the F2:3 population, GL, grain width (GW), and GL/W ranged from 0.68 to 1.09 cm, 0.33 to 0.43 cm, and 1.88 to 3.09, respectively (Table 1, Figure 1C-E). The absolute values of the skewness and kurtosis of GL and GL/W, but not of GW, were less than 1 (Table 1), indicating continuous, approximately normal distributions; these two traits therefore conform to the genetic model of quantitative traits.
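The skewness and kurtosis criterion above can be reproduced with a short script. The sketch below is a minimal illustration, assuming moment-based definitions (skewness g1 and excess kurtosis g2, so a normal distribution gives 0 for both); the statistics package used for Table 1 may apply small-sample corrections, and the function names are hypothetical.

```python
from statistics import fmean


def skewness_kurtosis(values):
    """Return (skewness g1, excess kurtosis g2) from population central moments."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    if m2 == 0:
        raise ValueError("all values are identical")
    # excess kurtosis subtracts 3 so that a normal distribution scores 0
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def within_normal_rule(values, limit=1.0):
    """True when |skewness| and |excess kurtosis| are both below `limit`."""
    skew, kurt = skewness_kurtosis(values)
    return abs(skew) < limit and abs(kurt) < limit
```

Applied to the per-line means of GL or GL/W, `within_normal_rule` returns True when both statistics fall inside the |value| < 1 rule used in the text.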

Phenotypic Analysis of Extreme DNA Pools of Grain Type
Differences in non-target traits among individuals can bias the genome-wide genotype frequency analysis of bulked DNA pools. To clarify the differences in genetic background between the two DNA pools, 30 long-grain and 30 short-grain lines were analyzed for GL, GW, GL/W, spike number (PN), number of grains per spike (NGS), and spike weight per plant (SW). GL and GL/W differed highly significantly between the long-grain and short-grain pools, whereas GW, PN, NGS, and SW did not differ significantly (Figure 2). Because the phenotypic differences between the two pools were confined to GL and GL/W, we used the 30 representative long-grain and 30 short-grain individuals to prepare the GL-pool and GS-pool, respectively, and mapped the candidate genomic loci using bulked segregant analysis (BSA) with whole-genome re-sequencing.


Figure 2 (caption fragment): ** indicates a significant difference at p < 0.01. Each black dot on the box plot represents the phenotypic value of a single independent plant.

Identification of a Major QTL Controlling GL in Rice Using QTL-Seq
A total of 315,277,920 clean reads (47,291,688,000 bases) were obtained after re-sequencing and quality control of the two DNA pools and both parents (Supplementary Table S1). The reads were aligned to the Nipponbare reference genome, and the Euclidean distance (ED), G statistic, and two-tailed Fisher's exact test were calculated for each SNP between the two bulks. Using a significance threshold of p < 0.01 between the two extreme phenotypic bulks, we identified a 4.21-Mb genomic region (14,240,001-18,445,701 bp) on chromosome 9 by overlapping the results of the three algorithms (Table 2, Figure 3). We designated this QTL qGL9.1.
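For readers unfamiliar with the ED metric, the sketch below shows one common way to compute it for a biallelic SNP from allele read depths in two bulks, plus a simple sliding-window average. It is an illustration only: the window size, step, and optional exponent are hypothetical parameters, not the settings of the pipeline used in this study.

```python
import math


def euclidean_distance(ref_a, alt_a, ref_b, alt_b):
    """ED between bulks A and B at one biallelic SNP, from allele read depths."""
    depth_a, depth_b = ref_a + alt_a, ref_b + alt_b
    if depth_a == 0 or depth_b == 0:
        return float("nan")  # no coverage in one bulk: ED is undefined
    d_ref = ref_a / depth_a - ref_b / depth_b
    d_alt = alt_a / depth_a - alt_b / depth_b
    # for a biallelic SNP this equals sqrt(2) * |difference in allele frequency|
    return math.sqrt(d_ref ** 2 + d_alt ** 2)


def sliding_window_ed(snps, window, step, power=1):
    """Mean ED**power per window along one chromosome.

    snps: list of (position, ref_a, alt_a, ref_b, alt_b).
    Returns a list of (window_start, window_end, mean_value, n_snps).
    """
    if not snps:
        return []
    snps = sorted(snps)
    last_pos = snps[-1][0]
    results = []
    start = 1
    while start <= last_pos:
        end = start + window - 1
        values = [euclidean_distance(*s[1:]) ** power
                  for s in snps if start <= s[0] <= end]
        values = [v for v in values if not math.isnan(v)]  # drop uncovered SNPs
        if values:
            results.append((start, end, sum(values) / len(values), len(values)))
        start += step
    return results
```

Raising ED to a power greater than 1 (the `power` argument) suppresses small, noisy differences and sharpens peaks such as the one on chromosome 9.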

Narrowing of qGL9.1 to a Fine Region
To narrow down qGL9.1, eight KASP (kompetitive allele-specific PCR) markers were developed for linkage analysis based on SNPs identified in the re-sequencing data of Pin20, SJ15, and the two pools. At a LOD threshold of 3.0, a significant peak interval was detected in a 448.7-kb region between SNP5 and SNP6 on chromosome 9 (Figure 4). qGL9.1 explained 20.09% of the phenotypic variation in GL (Table 3), and its positive-effect allele was derived from Pin20.
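As a rough illustration of how a LOD score and the proportion of phenotypic variance explained (PVE) relate at a single marker, the sketch below regresses phenotype on an additive genotype code (0, 1, or 2 copies of the Pin20 allele). This single-marker regression is a simplified stand-in: the reported values come from ICIM in QTL IciMapping, which models markers jointly and will generally give different numbers. The function name and coding are hypothetical.

```python
import math


def single_marker_scan(genotypes, phenotypes):
    """Least-squares regression of phenotype on additive genotype code.

    genotypes: 0, 1 or 2 copies of the Pin20 allele per plant (hypothetical coding).
    Returns (lod, pve, additive_effect).
    """
    n = len(phenotypes)
    if n != len(genotypes) or n < 3:
        raise ValueError("need matching genotype/phenotype lists of length >= 3")
    mean_x = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, phenotypes))
    rss0 = sum((y - mean_y) ** 2 for y in phenotypes)  # null model: overall mean only
    if sxx == 0 or rss0 == 0:
        raise ValueError("marker or phenotype is monomorphic")
    effect = sxy / sxx                 # change in trait per additional Pin20 allele
    rss1 = rss0 - effect * sxy         # residual sum of squares of the marker model
    pve = 1.0 - rss1 / rss0            # R^2: proportion of variance explained
    lod = math.inf if rss1 <= 0 else (n / 2) * math.log10(rss0 / rss1)
    return lod, pve, effect
```

A positive `effect` corresponds to the situation described above, where the allele increasing GL comes from Pin20.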

To fine-map qGL9.1, we constructed a BC3F2 population and genotyped it using the qGL9.1 linkage markers and six KASP markers to select plants whose genetic background was consistent with SJ15 (Figure 5). Two recombinant plants were finally selected, and their selfed progeny formed a BC3F3 population of 1570 lines. Three additional KASP markers between SNP5 and SNP6 were developed from the re-sequencing data (Supplementary Table S2, Figure 6A). Scanning the genotypes of the 1570 BC3F3 individuals identified 21 recombinants, which were classified into seven groups (Figure 6B). In progeny tests, the grain lengths of recombinant groups one and two were biased toward the short-grain parent SJ15, whereas the remaining groups were close to the long-grain parent Pin20. qGL9.1 was thus delimited to a 93.0-kb interval between markers SNP10 and SNP11. According to the MSU Rice Genome Annotation Project Release 7 [20], the qGL9.1 locus contains 15 protein-coding genes (Figure 6C); SNP/InDel information for the qGL9.1 region (upstream, 3′ UTR, downstream, and exonic variants) is listed in Supplementary Table S3. Ten of the 15 genes had sequence differences in their promoter, exon, or downstream regions.
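The logic used to delimit a QTL from progeny-tested recombinants can be sketched as follows: a marker can lie at the QTL only if, for every recombinant, its genotype matches the phenotype class (Pin20 allele for long-grain groups, SJ15 homozygous for short-grain groups). The marker names, positions, and the one-letter genotype coding below are hypothetical, and the sketch ignores complications such as misclassified phenotypes.

```python
def delimit_qtl(markers, recombinants):
    """Return the flanking markers of the region consistent with all recombinants.

    markers: list of (name, position), ordered along the chromosome.
    recombinants: list of (genotype_string, phenotype_class), where the string has one
      character per marker: 'P' = carries the Pin20 allele, 'S' = SJ15 homozygous;
      phenotype_class is 'long' or 'short' from the progeny test.
    """
    expected = {"long": "P", "short": "S"}
    for genotype, _ in recombinants:
        if len(genotype) != len(markers):
            raise ValueError("genotype string length must equal number of markers")
    consistent = [
        all(genotype[i] == expected[phenotype] for genotype, phenotype in recombinants)
        for i in range(len(markers))
    ]
    if not any(consistent):
        raise ValueError("no marker agrees with every recombinant")
    first = consistent.index(True)
    last = len(consistent) - 1 - consistent[::-1].index(True)
    # The QTL lies between the nearest inconsistent markers on each side (None = open end).
    left = markers[first - 1] if first > 0 else None
    right = markers[last + 1] if last + 1 < len(markers) else None
    return left, right
```

Each added recombinant can only shrink the set of consistent markers, which is why screening more BC3F3 individuals narrows the interval.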

Candidate Gene Analysis
qRT-PCR analysis of the 10 genes with sequence variation (Figure 7) showed that the relative expression of LOC_Os09g26970 differed significantly between Pin20 and SJ15 in the 2-cm, 5-cm, and 7-cm panicles, but not in the 13-cm panicles. Thus, LOC_Os09g26970 was expressed more highly in Pin20 than in SJ15 at the early stage of panicle development. The expression levels of the other nine genes did not differ significantly during grain development. We further analyzed the structural domains of the 15 genes using the Pfam database, annotated them using the Ensembl database, and found that LOC_Os09g26970 encodes the cytochrome P450 family protein CYP92A8 (Supplementary Table S4). In addition, pathway enrichment analysis of the genes annotated from the re-sequencing data showed that 616 genes in the 4.2-Mb interval were significantly enriched in arginine and proline metabolism (ko00330), nitrogen metabolism (ko00910), cysteine and methionine metabolism (ko00270), the pentose phosphate pathway (ko00030), and glycolysis/gluconeogenesis (ko00010) (Figure 8). Among them, five genes in the candidate interval (LOC_Os09g26940, LOC_Os09g26950, LOC_Os09g26960, LOC_Os09g26970, and LOC_Os09g26980) were significantly enriched in brassinosteroid (BR) biosynthesis (ko00905) (Supplementary Table S5). Cytochrome P450 family proteins play important roles in regulating rice grain shape; in particular, D11 [21] and GW10 [22] control grain size through the BR pathway. Therefore, because LOC_Os09g26970 encodes a P450 family protein in the significantly enriched BR biosynthesis pathway, we consider it the candidate gene for qGL9.1.
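The text does not state how relative expression was calculated. The sketch below assumes the widely used 2^-ΔΔCt (Livak) method, normalizing the target gene to a reference gene and then to a calibrator sample; the reference gene, the calibrator choice, and the replicate structure are assumptions for illustration.

```python
from statistics import fmean


def delta_ct(target_cts, reference_cts):
    """Mean Ct of the target gene minus mean Ct of the reference gene."""
    return fmean(target_cts) - fmean(reference_cts)


def relative_expression(sample_target, sample_ref, calibrator_target, calibrator_ref):
    """Fold change of the sample relative to the calibrator by 2^-ddCt.

    Assumes roughly 100% amplification efficiency for target and reference genes.
    """
    ddct = delta_ct(sample_target, sample_ref) - delta_ct(calibrator_target, calibrator_ref)
    return 2.0 ** (-ddct)
```

For example, using an SJ15 panicle sample as the calibrator, a value above 1 for the matching Pin20 sample would indicate higher LOC_Os09g26970 expression in Pin20.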
Figure 8 (caption fragment): pathway enrichment of the genes (Supplementary Table S5) in the qGL9.1 interval. Horizontal axis: enrichment factor (the number of target genes in the pathway divided by the total number of genes in the pathway); vertical axis: pathway name; bubble area: number of genes in the target gene set that belong to the pathway; bubble color: enrichment significance, with redder colors indicating smaller P/Q values.

Sanger sequencing analysis identified 13 nSNPs in LOC_Os09g26970 (Supplementary Table S6). These 13 SNPs had previously been identified in the 3010 Rice Genome Project and the Rice Functional and Genomic Breeding (RFGB) v2.0 database [23,24]. Among them, 10 SNPs (Chr9-16397163, Chr9-16397736, Chr9-16397760, Chr9-16397792, Chr9-16398197, Chr9-16398200, Chr9-16398479, Chr9-16398989, Chr9-16399274, and Chr9-16399673) constituted nine haplotypes; Hap1, Hap5, Hap6, Hap8, and Hap9 were mainly distributed in indica rice, whereas Hap2, Hap3, and Hap4 were mainly distributed in japonica rice (Supplementary Table S7). Germplasm carrying Hap9, which matches the Pin20 genotype, and Hap2, which matches the SJ15 genotype, differed significantly in GL and GW, and the other haplotypes showed significant phenotypic differences in GL, GW, and GL/W (Supplementary Table S8). Notably, Hap9 is associated with the optimal GL (9.36 mm) and GL/W (3.31), suggesting that Pin20 is a cultivar carrying the optimal grain length variant of LOC_Os09g26970.
To obtain a molecular marker that could distinguish grain length phenotypes, we designed a KASP marker for an nSNP in LOC_Os09g26970. SNP10 clearly assigned 92 of the 94 BC3F3 lines to the Pin20 or SJ15 genotype (Figure 9). Because these clustering results clearly distinguished the two alleles, the KASP8 marker was used to genotype the rice plants. Among plants carrying the Pin20 allele, SNP10 identified 89.8% as showing the long-grain phenotype; conversely, SNP10 identified 86.0% of the plants carrying the SJ15 genotype as showing the short-grain phenotype (Supplementary Table S9). These results indicate that SNP10 can effectively distinguish rice grain length and can be used as an important molecular marker for breeding improvement.
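KASP genotypes such as those in Figure 9 are typically called from the relative FAM and HEX fluorescence of each sample. The sketch below assigns calls from the angle of the (FAM, HEX) signal vector and then computes the share of a genotype class showing a given phenotype, analogous to the 89.8% and 86.0% figures above. The thresholds, the no-call cutoff, and which fluorophore marks the Pin20 allele are hypothetical; commercial calling software uses its own clustering.

```python
import math


def call_kasp(fam, hex_signal, min_signal=0.2, fam_max_deg=30.0, hex_min_deg=60.0):
    """Call one sample from normalized FAM/HEX fluorescence (hypothetical thresholds)."""
    if math.hypot(fam, hex_signal) < min_signal:
        return "no_call"                                # too little signal to cluster
    angle = math.degrees(math.atan2(hex_signal, fam))  # 0 deg = pure FAM, 90 deg = pure HEX
    if angle <= fam_max_deg:
        return "FAM"                                    # homozygous, FAM-tailed allele
    if angle >= hex_min_deg:
        return "HEX"                                    # homozygous, HEX-tailed allele
    return "HET"                                        # both signals: heterozygous


def phenotype_share(calls, phenotypes, genotype, phenotype):
    """Fraction of plants with a given genotype call that show a given phenotype class."""
    selected = [p for c, p in zip(calls, phenotypes) if c == genotype]
    if not selected:
        return float("nan")
    return sum(p == phenotype for p in selected) / len(selected)
```

Using the angle rather than raw intensities makes the calls less sensitive to differences in DNA amount between wells.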

QTL-Seq Analysis Combined with a Screening of Recombinant Plants Can Efficiently Fine-Map Candidate Genes
Grain length is a significant factor limiting rice yield, and improving and utilizing the large-effect genomic loci associated with GL is essential to increase yield. Previous studies have carried out extensive QTL analyses and localized a group of genes associated with GL in rice. For example, PGL1 [25] and BG1 [26] positively regulate GL by increasing cell size, whereas SG1 [27], SDF5 [28], OsGDI1 [29], and TGW6 [8] negatively regulate GL by reducing cell size. However, isolating genes by map-based cloning is time-consuming and labor-intensive. In recent years, advances in high-throughput sequencing and bioinformatics have significantly improved the efficiency of QTL mining, and combining traditional QTL mapping with QTL-seq can quickly and effectively identify major GL QTL intervals. For example, the GL locus qTGW5.3 was mapped to a 5-Mb physical interval by QTL-seq, and recombinant screening and progeny tests further delimited its candidate region to 1.13 Mb [30]. However, owing to the lack of further mapping populations and recombinant plants, the candidate genes of qTGW5.3 have not been identified. In this study, qGL9.1 was identified from Pin20 by QTL-seq using the ED, Fisher's exact test, and G statistic methods, and it corresponded to a single strong peak in all three models (Figure 3), indicating a significant difference in allele frequency between the two pools. Several approaches were then used to narrow down the genomic region associated with qGL9.1. First, recombinant plants were used to fine-map qGL9.1 to a 93-kb interval containing 15 annotated genes (Figure 6).
LOC_Os09g26970 was then identified as the most reliable candidate for qGL9.1 by expression analysis and functional annotation of the candidate genes. qGL9.1 can therefore be considered a key target for exploring GL candidate genes. Our study illustrates how QTL-seq combined with fine mapping can efficiently narrow a major QTL interval down to candidate genes.

LOC_Os09g26970 on qGL9.1 Links to Grain Length in Rice
LOC_Os09g26970 contains a cytochrome P450 domain (PF00067), and the cytochrome P450 family is one of the largest gene superfamilies in plants [31]. The rice genome contains 356 P450 genes, which play important roles in various biochemical pathways that produce primary and secondary metabolites [32], some of which are essential for controlling plant cell proliferation and expansion. Cytochrome P450 proteins such as D11 [21], GW10 [22], BSR2 [33], GL3.2 [34], and GE [35] play important roles in regulating rice grain shape. In particular, the P450 proteins encoded by D11 and GW10 control grain size through the brassinolide biosynthetic pathway. Brassinosteroids (BRs) are essential steroid hormones that regulate many processes during plant development, including stem elongation and vascular differentiation [36], and in particular grain size. KEGG enrichment analysis based on the re-sequencing data showed that LOC_Os09g26970 belongs to the significantly enriched brassinosteroid biosynthesis pathway (ko00905). We therefore speculate that qGL9.1 regulates grain shape through a pathway similar to those of D11 and GW10. Sequencing of the CDS region of LOC_Os09g26970 revealed 13 SNPs, and some of the resulting haplotypes showed significant differences in grain length and grain width. We speculate that this locus affects both grain length and grain width, but only differences in grain length were observed in the genetic population of this study, possibly because of limited genetic variation.
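KEGG pathway enrichment of this kind is commonly tested with a one-sided hypergeometric test, with Benjamini-Hochberg adjustment giving Q values. The sketch below assumes that approach and takes the enrichment factor as the ratio of target genes to all genes annotated to a pathway; the exact statistics of the analysis service used here are not stated, and the function names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k): chance of drawing at least k pathway genes among n target genes
    when K of the N background genes belong to the pathway."""
    upper = min(n, K)
    if k > upper:
        return 0.0
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1)) / total


def enrichment_factor(k, K):
    """Target genes in the pathway divided by all genes annotated to the pathway."""
    return k / K


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted values (Q values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):        # walk from the largest p-value downwards
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues
```

Because many pathways are tested at once, the adjusted Q values rather than the raw P values are usually used to call a pathway such as ko00905 significantly enriched.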
Next, we will construct knockout, overexpression, and complementation lines of LOC_Os09g26970 to verify its biological function in regulating rice grain length, and will apply exogenous BR to test whether its effect on grain length depends on the BR pathway.

Breeding Value and Potential of qGL9.1
In general, japonica rice has better cooking and eating quality than indica rice. Although indica rice has longer grains and better appearance quality than japonica rice, the eating quality of long-grain indica rice is often inferior to that of japonica rice [2,37]. In recent years, molecular breeding with grain shape genes in indica and japonica rice has achieved several important results. New indica hybrid rice varieties, Taifengyou 55 and Taifengyou 208, with improved grain yield and quality were developed by pyramiding semi-dominant GS3 and GW7 TFA alleles from tropical japonica rice varieties [38]. The GW8 and GS3 alleles were pyramided into HJX74 to produce short, wide grains, resulting in the breeding of Huabiao 1 [12]. Based on a deletion in the functional region of TGW6, the functional marker CAPs6-1 was developed to rapidly screen rice varieties carrying TGW6 alleles [39]. In this study, allelic variant A from japonica Pin20 was present in a small number of indica samples but was not identified in other japonica samples, indicating that LOC_Os09g26970 may be a rare grain shape regulator in japonica germplasm. In addition, 10 SNPs in the coding region of LOC_Os09g26970 formed nine haplotypes among the 3010 rice accessions. The grain length of germplasm carrying the Pin20 genotype was 9.36 mm, whereas the average grain length of the other haplotypes was 8.60 mm. Hap9 is therefore the optimal haplotype for grain length and also has the largest GL/W, making it an ideal allelic variant for grain length improvement. Moreover, the germplasm of all nine haplotypes showed long-grain characteristics (minimum GL/W of 2.52). The nine haplotypes of LOC_Os09g26970 may therefore help predict the grain length of rice germplasm.
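A haplotype comparison like the one above (9.36 mm for the Pin20-type haplotype versus 8.60 mm for the others) can be sketched by grouping accessions by their allele string across the SNP positions. The input format and the choice to average over accessions rather than over haplotype means are assumptions; accessions with missing calls (written as N) are skipped.

```python
from collections import defaultdict
from statistics import fmean


def haplotype_summary(accessions):
    """Group accessions by allele string; return {haplotype: (count, mean GL)}.

    accessions: list of (allele_string, grain_length_mm); one character per SNP,
    with 'N' marking a missing call (such accessions are skipped).
    """
    groups = defaultdict(list)
    for alleles, grain_length in accessions:
        if "N" in alleles:
            continue
        groups[alleles].append(grain_length)
    return {hap: (len(gls), fmean(gls)) for hap, gls in groups.items()}


def focal_vs_others(accessions, focal):
    """Mean GL of the focal haplotype and mean GL over all other complete accessions."""
    complete = [(a, gl) for a, gl in accessions if "N" not in a]
    focal_gl = [gl for a, gl in complete if a == focal]
    other_gl = [gl for a, gl in complete if a != focal]
    if not focal_gl or not other_gl:
        raise ValueError("need accessions both inside and outside the focal haplotype")
    return fmean(focal_gl), fmean(other_gl)
```

Averaging over accessions weights common haplotypes more heavily than averaging the nine haplotype means would, so the two choices can give different "other haplotypes" values.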
Moreover, we selected one SNP from Hap9 as a molecular marker to analyze individuals with significant differences in grain length in the BC3F3 population, and found that SNP10 could serve as a marker for grain length. In indica-japonica hybrid breeding, selecting Hap9 could not only retain the excellent quality traits of japonica rice but also help improve its grain length. The KASP marker used in this study is closely linked to GL and can be used directly for marker-assisted selection. In addition, this locus can be transmitted to offspring through crosses among japonica cultivars, so long-grain varieties can be selected directly by conventional breeding methods.

Plant Materials
In this study, two japonica varieties, the short-grain female parent SJ15 and the long-grain male parent Pin20, were used as parental lines to develop 667 F2 individuals and the corresponding F2:3 population. The F2 population was planted during the normal growing season (April to October), and the mature seeds were harvested. All 667 F2:3 lines were used for grain type identification after maturity. To fine-map the target gene, one long-grain F2 plant was backcrossed with SJ15 to obtain BC3F1 seeds, and a long-grain BC3F1 plant was self-pollinated to generate the BC3F2 (725 individuals) and BC3F3 (1570 individuals) populations. The 667 F2:3 lines were used for QTL-seq and QTL mapping, and the BC3F2 and BC3F3 populations were used to fine-map the qGL9.1 candidate gene. All lines and their parents were planted at the Northeast Agricultural University experimental station (Heilongjiang Province, China; 47°98′ N, 128°08′ E; 128 m above sea level).

Evaluation of Grain Type for Rice
The grain size of the F2:3 and BC3F3 populations was investigated when the rice was fully mature. All spikes of each line were collected in envelopes, air-dried under natural light, and then dried in an oven at 37 °C for one week. Three main spikes of similar appearance were randomly selected from each line to measure spike length and the number of grains per spike. The GL and GW of 10 seeds per line were measured with vernier calipers, and GL/W was calculated. The phenotypic data for each line were measured in three replicates, and the mean was used for statistical analysis.

Construction of Segregating Pools and Whole-Genome Re-Sequencing
Young leaves from the 667 F2 individuals were collected separately, and total genomic DNA was extracted using a modified cetyltrimethylammonium bromide (CTAB) method [40]. The genomic DNA of 30 extreme GL-type and 30 extreme GW-type individuals was then selected to form two bulked pools, abbreviated hereafter as the GL-pool and GW-pool. DNA isolated for the GL-pool, the GW-pool, and the two parents was quantified using a Nanodrop 2000 spectrophotometer (Thermo Scientific, Fremont, CA, USA), and DNA for the two pools was further quantified precisely with a Qubit® 2.0 Fluorometer (Life Technologies, Carlsbad, CA, USA). Equal amounts of DNA from each plant were mixed within the GL-pool and within the GW-pool. The four DNA libraries were sequenced on the Illumina MiSeq platform using the MiSeq Reagent Kit v2 (500 cycles) (Illumina Inc., San Diego, CA, USA).
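Pooling equal amounts of DNA means taking a different volume from each plant depending on its measured concentration. A minimal sketch, assuming concentrations in ng/µL from the Qubit readings and a hypothetical per-plant DNA mass chosen by the experimenter:

```python
def pooling_volumes(concentrations, ng_per_plant):
    """Volume (uL) to take from each plant so that every plant adds the same DNA mass.

    concentrations: {plant_id: ng/uL}; ng_per_plant: target DNA mass per plant (ng).
    """
    volumes = {}
    for plant, conc in concentrations.items():
        if conc <= 0:
            raise ValueError(f"non-positive concentration for {plant}")
        volumes[plant] = ng_per_plant / conc  # volume = mass / concentration
    return volumes
```

With 30 plants per pool, the pooled mass is 30 times `ng_per_plant`, so the target should be set so that the most dilute sample still fits within a practical pipetting volume.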

QTL-Seq Analysis
The raw sequencing data were filtered using an in-house Perl script provided by Biomarker Technology Co. Ltd. (Beijing, China). The resulting high-quality reads were mapped to the Nipponbare reference genome (IRGSP-1.0) [41] using the Burrows-Wheeler Aligner [42]. Duplicate reads among the clean reads mapped to the reference genome were removed with the Picard tool (https://sourceforge.net/projects/picard/) (accessed on 8 June 2021). SNPs and InDels (1–5 bp) were called with GATK [43] using the default settings, and a series of filters was applied to obtain high-confidence SNP and InDel sets [44]. Association analysis was performed on the SNPs with three methods: the Euclidean distance (ED) [45], the G statistic [46,47], and a two-tailed Fisher's exact test [48]. The interval supported by all three methods was taken as the QTL interval.
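The three per-SNP association statistics and the final interval intersection can be sketched as below. This is a minimal illustration, not the Biomarker Technology pipeline: it computes raw per-SNP values from read counts only, whereas published BSA workflows typically also transform ED (e.g., raising it to a power) and smooth statistics over sliding windows. The input layout (per-nucleotide read counts for ED; a 2×2 table of reference/alternate read counts per pool for the G statistic and Fisher's test) and all function names are assumptions.

```python
import math

NUCLEOTIDES = "ACGT"


def euclidean_distance(pool1_counts, pool2_counts):
    """ED between two bulks from per-nucleotide read counts at one SNP."""
    n1 = sum(pool1_counts.get(b, 0) for b in NUCLEOTIDES)
    n2 = sum(pool2_counts.get(b, 0) for b in NUCLEOTIDES)
    return math.sqrt(sum(
        (pool1_counts.get(b, 0) / n1 - pool2_counts.get(b, 0) / n2) ** 2
        for b in NUCLEOTIDES))


def g_statistic(table):
    """G = 2 * sum(O * ln(O / E)) for a 2x2 table [[ref1, alt1], [ref2, alt2]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    g = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            if obs > 0:  # cells with O = 0 contribute nothing
                expected = rows[i] * cols[j] / n
                g += obs * math.log(obs / expected)
    return 2.0 * g


def fisher_exact_two_tailed(table):
    """Two-tailed Fisher's exact p: sum of hypergeometric probabilities of all
    tables with the observed margins that are no more likely than the observed one."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    support = range(max(0, c1 - r2), min(r1, c1) + 1)
    return min(1.0, sum(p for p in map(prob, support) if p <= p_obs * (1 + 1e-7)))


def intersect_intervals(*methods):
    """Regions supported by every method; each method is a list of (chrom, start, end)."""
    result = methods[0]
    for other in methods[1:]:
        result = [(c1, max(s1, s2), min(e1, e2))
                  for c1, s1, e1 in result
                  for c2, s2, e2 in other
                  if c1 == c2 and max(s1, s2) <= min(e1, e2)]
    return result


print(fisher_exact_two_tailed([[3, 1], [1, 3]]))  # about 0.4857
```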

Further Mapping of the qGL9.1
To further delimit the position of qGL9.1, we developed kompetitive allele-specific PCR (KASP) markers linked to the qGL9.1 interval based on the re-sequencing data of the two parents, and designed the marker primers with Primer Premier 5 software (Premier Biosoft International, Corina Way, Palo Alto, CA, USA). Each KASP marker contained two allele-specific forward primers and one common reverse primer; the 5′ ends of the forward primers were tailed with the FAM (5′-GAAGGTGACCAAGTTCATGCT-3′) and HEX (5′-GAAGGTCGGAGTCAACGGATT-3′) adapter sequences. Markers polymorphic between the parents were used to genotype the 667 F2 individuals, construct linkage maps, and narrow the candidate region using the inclusive composite interval mapping (ICIM) module of QTL IciMapping 4.2 (http://www.isbreeding.net) (accessed on 19 June 2023). The LOD threshold for declaring a significant QTL was determined by a permutation test with 1000 repetitions at p < 0.001. Then, 725 BC3F2 and 1570 BC3F3 individuals were screened for recombinants using KASP markers within the target region. The reaction mixture was prepared according to the protocol described by KBiosciences (http://www.ksre.ksu.edu/igenomics) (accessed on 19 June 2023). All KASP primers are listed in Supplementary Table S2.

Fine Mapping and Candidate Gene Screening of qGL9.1
To fine-map qGL9.1, plants heterozygous across the qGL9.1 interval in the BC3F2 population were identified with KASP markers and selfed to obtain the BC3F3 secondary population. The BC3F3 population was genotyped, and recombinant plants were screened to fine-map qGL9.1.
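The permutation-based LOD threshold and the recombinant screening described above can be sketched as follows. This is a minimal illustration and not the ICIM algorithm in QTL IciMapping: the LOD here comes from a simple single-marker regression, swapped in for brevity. Genotypes are coded 0/1/2 for the additive test and as hypothetical letters A (SJ15 homozygous), B (Pin20 homozygous), and H (heterozygous) for recombinant screening; all function names are hypothetical.

```python
import math
import random


def single_marker_lod(genotypes, phenotypes):
    """LOD of a linear regression of phenotype on additive genotype code (0/1/2)."""
    n = len(phenotypes)
    mean_y = sum(phenotypes) / n
    mean_x = sum(genotypes) / n
    rss0 = sum((y - mean_y) ** 2 for y in phenotypes)  # null model: mean only
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    if rss0 == 0 or sxx == 0:
        return 0.0  # monomorphic marker or constant trait: no signal
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, phenotypes))
    rss1 = rss0 - sxy ** 2 / sxx  # residual sum of squares after regression
    if rss1 <= 0:
        return math.inf
    return (n / 2) * math.log10(rss0 / rss1)


def permutation_threshold(marker_genotypes, phenotypes, n_perm=1000, alpha=0.05, seed=1):
    """(1 - alpha) quantile of the maximum LOD over all markers when phenotypes
    are shuffled, which breaks any real genotype-phenotype association."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    max_lods = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        max_lods.append(max(single_marker_lod(g, shuffled) for g in marker_genotypes))
    max_lods.sort()
    index = min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)
    return max_lods[index]


def find_recombinants(plants, markers):
    """Map each plant to the adjacent-marker intervals where its genotype changes.

    plants: {plant_id: genotype string with one A/B/H call per marker}
    markers: marker names ordered by physical position
    """
    result = {}
    for plant, calls in plants.items():
        breaks = [(markers[i], markers[i + 1])
                  for i in range(len(markers) - 1) if calls[i] != calls[i + 1]]
        if breaks:
            result[plant] = breaks
    return result


def consistent_markers(plants, phenotype_class, markers):
    """Markers whose call matches the phenotype class ('A' short, 'B' long) in every
    homozygous recombinant; under this model the QTL lies among these markers."""
    return [m for i, m in enumerate(markers)
            if all(plants[p][i] == phenotype_class[p] for p in plants)]


markers = ["M1", "M2", "M3", "M4"]
recs = {"r1": "AABB", "r2": "BBBA"}
print(find_recombinants(recs, markers))                           # breakpoints per plant
print(consistent_markers(recs, {"r1": "B", "r2": "B"}, markers))  # ['M3']
```

In the toy example, two long-grain recombinants with breakpoints on opposite sides of M3 confine the QTL to the M2–M4 interval around M3, which is the logic used when recombinants progressively narrow a candidate region.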
The main methods of mining candidate genes were as follows: (1) Ensembl (http://ensemblgenomes.org/) (accessed on 19 June 2023) was used to annotate the candidate genes, and the possible domains of candidate genes were detected by the Pfam database (http://pfam.xfam.org/) (accessed on 19 June 2023).
(2) Genes carrying sequence variants between the parents were identified from the re-sequencing data. (3) The expression of the genes with sequence variation was compared between the parents by qRT-PCR. After young panicle differentiation began, young panicles of Pin20 and SJ15 were sampled at lengths of 2 cm, 5 cm, 7 cm, and 13 cm.
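Step (2) above amounts to intersecting gene coordinates with parental variant calls. The sketch below is a minimal illustration under assumed inputs: gene intervals as (gene_id, start, end) tuples and variants as (position, Pin20 allele, SJ15 allele) tuples on the same chromosome. The gene names and coordinates are hypothetical toy values, not the 15 genes of the qGL9.1 region.

```python
def genes_with_parental_variants(genes, variants):
    """Return IDs of genes overlapping at least one site where the parents differ.

    genes: list of (gene_id, start, end), 1-based inclusive coordinates
    variants: list of (pos, allele_parent1, allele_parent2)
    """
    hits = []
    for gene_id, start, end in genes:
        if any(start <= pos <= end and a1 != a2 for pos, a1, a2 in variants):
            hits.append(gene_id)
    return hits


toy_genes = [("gene1", 100, 200), ("gene2", 300, 400), ("gene3", 500, 600)]
toy_variants = [(150, "G", "G"), (350, "G", "A"), (550, "C", "T")]
print(genes_with_parental_variants(toy_genes, toy_variants))  # ['gene2', 'gene3']
```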
Total RNA was extracted following the protocol of the GeneCopoeia BlazeTaq™ SYBR® Green qPCR Mix 2.0 extraction kit, and RNA purification and reverse transcription were performed following the protocols of SIMGEN (Hangzhou Xinjing Biological Reagent Co., Ltd., No. 8, Xiyuan 1st Road, Xihu District, Hangzhou, Zhejiang, China). Amplification was performed on a Roche LightCycler 96 real-time PCR instrument at Northeast Agricultural University. Gene-specific primers for the candidate gene were designed from its transcript sequence with Primer Premier 5.0 software and are listed in Supplementary Table S10. The rice Actin1 gene was used as the internal reference [49], and primer specificity was confirmed by melting-curve analysis. Each sample was analyzed in three replicates, and the relative expression of genes in tissues was calculated using the 2^−ΔΔCt method. qRT-PCR analysis was performed as previously described [50].
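The 2^−ΔΔCt calculation can be sketched as follows: ΔCt is the target Ct minus the reference (Actin1) Ct, ΔΔCt is the sample ΔCt minus the calibrator ΔCt, and the fold change is 2^−ΔΔCt. Which sample serves as the calibrator (for example, SJ15 at one panicle stage) is an assumption, as are the Ct values and the function name below.

```python
from statistics import mean


def relative_expression(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Fold change of a target gene by the 2^-ddCt method.

    Each argument is a list of replicate Ct values:
    target and reference gene in the sample, then in the calibrator.
    """
    delta_sample = mean(ct_target) - mean(ct_ref)              # dCt of the sample
    delta_calibrator = mean(ct_target_cal) - mean(ct_ref_cal)  # dCt of the calibrator
    ddct = delta_sample - delta_calibrator
    return 2 ** (-ddct)


# Hypothetical Ct values: the target amplifies 2 cycles earlier in the sample
# after normalization, i.e. about 4-fold higher expression.
print(relative_expression([24.0, 24.0, 24.0], [18.0, 18.0, 18.0],
                          [26.0, 26.0, 26.0], [18.0, 18.0, 18.0]))  # 4.0
```

The method assumes near-100% amplification efficiency for both target and reference primers, which is one reason primer specificity and performance are checked before quantification.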

Haplotype Analysis of Candidate Genes
Using the haplotype analysis module of the RFGB database (https://www.rmbreeding.cn/index.php) (accessed on 19 June 2023), we identified bases in the coding regions of the candidate genes that differed between the parents and analyzed the haplotypes formed by these sites in 3010 rice varieties, as well as the distribution of each haplotype across rice germplasm groups. Grain length, grain width, and length/width ratio data from RFGB, together with the corresponding genomic information, were used to compare phenotypes among the haplotypes of the candidate genes.
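Grouping accessions by their alleles at the differential sites and comparing trait means per haplotype can be sketched as below. This is a minimal illustration with hypothetical toy accessions; it does not reproduce the RFGB module, and real comparisons would also test whether differences between haplotypes are statistically significant.

```python
from collections import defaultdict
from statistics import mean


def haplotype_means(accessions):
    """accessions: list of (alleles_at_sites, grain_length) pairs, where
    alleles_at_sites is a tuple of bases at the differential sites.
    Returns {haplotype_string: (number_of_accessions, mean_grain_length)}."""
    groups = defaultdict(list)
    for alleles, gl in accessions:
        groups["".join(alleles)].append(gl)
    return {hap: (len(vals), mean(vals)) for hap, vals in sorted(groups.items())}


def best_haplotype(summary):
    """Haplotype with the highest mean grain length."""
    return max(summary, key=lambda hap: summary[hap][1])


toy = [(("G", "C"), 9.0), (("G", "C"), 9.4), (("A", "C"), 7.0)]
summary = haplotype_means(toy)
print(summary, best_haplotype(summary))
```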

Development of KASP Markers and Validation of GL
To validate the association of LOC_Os09g26970 with GL, two non-synonymous SNPs (nSNPs) in the exons of LOC_Os09g26970 were selected, and the corresponding KASP markers were developed. The 100-bp sequences upstream and downstream of each target nSNP were extracted from the Nipponbare genome sequence. Each KASP marker contained two allele-specific forward primers and a common reverse primer. The reaction mixture was prepared according to the instructions of KBiosciences (http://www.ksre.ksu.edu/igenomics (accessed on 19 June 2023)), and the KASP primers are listed in Supplementary Table S2.
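Extracting the ±100-bp flanks of an nSNP and drafting the two tailed allele-specific forward primers can be sketched as follows. This is a minimal illustration: the toy sequence, the 20-nt primer length, and the assignment of the reference allele to the FAM tail are assumptions, and real KASP design also checks melting temperature, GC content, and off-target binding and designs the common reverse primer. The FAM and HEX tails are the sequences given in the methods.

```python
FAM_TAIL = "GAAGGTGACCAAGTTCATGCT"
HEX_TAIL = "GAAGGTCGGAGTCAACGGATT"


def snp_flank(genome_seq, pos, alt, flank=100):
    """Return (left, ref, right, 'left[REF/ALT]right') for a 1-based SNP position."""
    i = pos - 1
    ref = genome_seq[i]
    left = genome_seq[max(0, i - flank):i]
    right = genome_seq[i + 1:i + 1 + flank]
    return left, ref, right, f"{left}[{ref}/{alt}]{right}"


def allele_specific_forward_primers(left, ref, alt, primer_len=20):
    """Two forward primers whose 3' ends sit on the SNP base, one per allele,
    tailed with FAM (reference allele, assumed) and HEX (alternate allele)."""
    core = left[-(primer_len - 1):]  # shared sequence immediately 5' of the SNP
    return FAM_TAIL + core + ref, HEX_TAIL + core + alt


toy_seq = "ACGT" * 60  # hypothetical 240-bp sequence
left, ref, right, flank_str = snp_flank(toy_seq, 121, "G")
fam_primer, hex_primer = allele_specific_forward_primers(left, ref, "G")
print(ref, len(left), len(right), fam_primer, hex_primer)
```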

Conclusions
In this study, we used F2 and BC3F3 populations to identify a major QTL, qGL9.1, controlling rice grain length from the long-grain variety Pin20 by re-sequencing and fine mapping. Combining functional annotation, variation detection, and qRT-PCR analysis, we identified LOC_Os09g26970, which encodes a P450 protein, as the candidate gene for qGL9.1. LOC_Os09g26970 was divided into nine haplotypes across 3010 rice germplasm accessions, and Hap9, which matched the genotype of Pin20, contributed most to grain length among all haplotypes. In summary, we identified a new grain length gene in early-maturing japonica rice from the northernmost region of China, and its application in molecular breeding is expected to help overcome the difficulty of improving the yield of early-maturing japonica rice.