WB1, a Regulator of Endosperm Development in Rice, Is Identified by a Modified MutMap Method

Abnormally developed endosperm strongly affects rice (Oryza sativa) appearance quality and grain weight. Endosperm formation is a complex process, and although many enzymes and related regulators have been identified, many other related factors remain largely unknown. Here, we report the isolation and characterization of a recessive mutation of White Belly 1 (WB1), which regulates rice endosperm development, using a modified MutMap method in the rice mutant wb1. The wb1 mutant develops a white-belly endosperm and abnormal starch granules in the inner portion of white grains. Representative of the white-belly phenotype, grains of wb1 showed a higher grain chalkiness rate and degree and a lower 1000-grain weight (decreased by ~34%), in comparison with that of Wild Type (WT). The contents of amylose and amylopectin in wb1 significantly decreased, and its physical properties were also altered. We adopted the modified MutMap method to identify 2.52 Mb candidate regions with a high specificity, where we detected 275 SNPs in chromosome 4. Finally, we identified 19 SNPs at 12 candidate genes. Transcript levels analysis of all candidate genes showed that WB1 (Os04t0413500), encoding a cell-wall invertase, was the most probable cause of white-belly endosperm phenotype. Switching off WB1 with the CRISPR/cas9 system in Japonica cv. Nipponbare demonstrates that WB1 regulates endosperm development and that different mutations of WB1 disrupt its biological function. All of these results taken together suggest that the wb1 mutant is controlled by the mutation of WB1, and that the modified MutMap method is feasible to identify mutant genes, and could promote genetic improvement in rice.

Several methods are currently used for gene isolation. The most commonly used one is positional cloning (map-based cloning), by which many rice genes were isolated. However, map-based cloning is more time-and labor-intensive for isolating genes, especially QTLs. Therefore, many researchers have been exploring new methods for genes isolation. MutMap (Figure 1a) [33], based on next-generation sequencing (NGS) [34], is a recently developed method of rapid gene isolation. The MutMap method has been used to isolate some rice genes, including OsRR22, a gene responsible for the salinity-tolerant phenotype of hst1 [35] and Pii, a gene enhancing rice blast resistance [36]. Several methods are currently used for gene isolation. The most commonly used one is positional cloning (map-based cloning), by which many rice genes were isolated. However, map-based cloning is more time-and labor-intensive for isolating genes, especially QTLs. Therefore, many researchers have been exploring new methods for genes isolation. MutMap (Figure 1a) [33], based on next-generation sequencing (NGS) [34], is a recently developed method of rapid gene isolation. The MutMap method has been used to isolate some rice genes, including OsRR22, a gene responsible for the salinity-tolerant phenotype of hst1 [35] and Pii, a gene enhancing rice blast resistance [36]. method applied to rice following the protocol described as previously reported [33]; (b) The scheme of gene mapping used in this study. The BC1F2:3 progeny formed mapping population. DNA of 50 recessive and 50 dominant plants from mapping population are mixed separately in an equal ratio to form the DNA Pool (A) and Pool (B) followed by the construction of DNA library and Illumina sequencing with 30×coverage, and then the treated sequencing data were aligned with the reference sequence followed by single nucleotide polymorphisms (SNP) calling. The reference sequence is the publicly available Nipponbare rice genome sequence [37]. For each identified SNP, SNP index (A) was obtained from Pool (A) and SNP index (B) corresponds with Pool (B). SNP index (A) minus SNP index (B) is ∆ (SNP index) which is used for Manhattan plot, and we can obtain candidate region followed by SNP annotation. The pink color represents the different steps compared to the common scheme of MutMap. EMS: Ethane methyl sulfonate.
Although the understanding of the molecular mechanisms of the formation of rice endosperm has made great progress, rice endosperm is a very complex agronomic trait. Hence, it is still necessary to identify more functional genes and to describe their molecular mechanisms in order to enable systematic and comprehensive understanding of the inheritance of rice endosperm formation. In this study, we isolated WB1, which controlled rice endosperm development, via the modified MutMap method, and found that WB1 played an important role in the regulation of starch synthesis during rice endosperm development. We also verified the target gene by CRISPR/Cas9 Figure 1. The steps of MutMap method applied to rice. (a) Common scheme of MutMap method applied to rice following the protocol described as previously reported [33]; (b) The scheme of gene mapping used in this study. The BC 1 F 2:3 progeny formed mapping population. DNA of 50 recessive and 50 dominant plants from mapping population are mixed separately in an equal ratio to form the DNA Pool (A) and Pool (B) followed by the construction of DNA library and Illumina sequencing with 30×coverage, and then the treated sequencing data were aligned with the reference sequence followed by single nucleotide polymorphisms (SNP) calling. The reference sequence is the publicly available Nipponbare rice genome sequence [37]. For each identified SNP, SNP index (A) was obtained from Pool (A) and SNP index (B) corresponds with Pool (B). SNP index (A) minus SNP index (B) is ∆ (SNP index) which is used for Manhattan plot, and we can obtain candidate region followed by SNP annotation. The pink color represents the different steps compared to the common scheme of MutMap. EMS: Ethane methyl sulfonate.
Although the understanding of the molecular mechanisms of the formation of rice endosperm has made great progress, rice endosperm is a very complex agronomic trait. Hence, it is still necessary to identify more functional genes and to describe their molecular mechanisms in order to enable systematic and comprehensive understanding of the inheritance of rice endosperm formation. In this study, we isolated WB1, which controlled rice endosperm development, via the modified MutMap method, and found that WB1 played an important role in the regulation of starch synthesis during rice endosperm development. We also verified the target gene by CRISPR/Cas9 system. Our study also played a crucial role in explaining the molecular mechanisms of the formation of rice endosperm and the exploration of new methods for gene mapping.

Phenotypic Characterization and Genetic Analysis of the wb1 Mutant
To identify new regulators of endosperm development, we recovered an endosperm development defective mutant named wb1 from a mutant pool (in the Japonica variety ChangLiGeng background). The wb1 mutant showed no apparent differences from WT throughout the vegetative stage. Plant height and the number of panicles per plant of wb1 plants were similar to those of the WT at the mature stage. The number of spikelets per panicle, number of grains per panicle, seed-setting rate and 1000-grain weight of wb1 were all showed a marked decreased compared with those of the WT (Table S1). Unlike WT, the glume of wb1 grains showed brown color (Figure 2a-c), and wb1 displayed markedly more grain chalkiness in the grain belly (Figure 2d,e). Grain chalkiness rate and grain chalkiness degree were 94.8% and 47.6% in wb1 grains, while those of WT grains were 2.8% and 0.6% (Figure 2i,j). Scanning electron microscope images clearly indicated that the endosperm from grains of wb1 developed abnormally as a result of loosely packed, spherical starch granules, in contrast to the densely packed, irregularly polyhedral starch granules of the normal endosperm from the grains of WT (Figure 2f,g). Grain size measurements showed that the seed length, width, and thickness were all significantly reduced in wb1 grains (Figure 2h), resulting in a smaller grain size than that of WT, even occasionally in a shriveled phenotype. We also measured amylose and amylopectin content of the mature grains of wb1 and WT. Amylose and amylopectin content were remarkably decreased in wb1 grains (Figure 2k,l), suggesting that the starch accumulation in wb1 grains was severely disrupted. All these results collectively reveal that mutation of WB1 caused a defect in the endosperm development, which led to the higher grain chalkiness degree and a significant reduction of 1000-grain weight in wb1 grains. We also analyzed physical properties of wb1 and WT grains, including gel consistency (Figure 2n), brown rice rate (Figure 2o), milled rice rate ( Figure 2p) and head rice rate (Figure 2q). Each of these was significantly reduced in the mutant, suggesting that dramatic physical changes have occurred in the wb1 grains, which will further affect rice processing and eating quality.
system. Our study also played a crucial role in explaining the molecular mechanisms of the formation of rice endosperm and the exploration of new methods for gene mapping.

Phenotypic Characterization and Genetic Analysis of the wb1 Mutant
To identify new regulators of endosperm development, we recovered an endosperm development defective mutant named wb1 from a mutant pool (in the Japonica variety ChangLiGeng background). The wb1 mutant showed no apparent differences from WT throughout the vegetative stage. Plant height and the number of panicles per plant of wb1 plants were similar to those of the WT at the mature stage. The number of spikelets per panicle, number of grains per panicle, seed-setting rate and 1000-grain weight of wb1 were all showed a marked decreased compared with those of the WT (Table S1). Unlike WT, the glume of wb1 grains showed brown color (Figure 2a-c), and wb1 displayed markedly more grain chalkiness in the grain belly (Figure 2d,e). Grain chalkiness rate and grain chalkiness degree were 94.8% and 47.6% in wb1 grains, while those of WT grains were 2.8% and 0.6% (Figure 2i,j). Scanning electron microscope images clearly indicated that the endosperm from grains of wb1 developed abnormally as a result of loosely packed, spherical starch granules, in contrast to the densely packed, irregularly polyhedral starch granules of the normal endosperm from the grains of WT (Figure 2f,g). Grain size measurements showed that the seed length, width, and thickness were all significantly reduced in wb1 grains (Figure 2h), resulting in a smaller grain size than that of WT, even occasionally in a shriveled phenotype. We also measured amylose and amylopectin content of the mature grains of wb1 and WT. Amylose and amylopectin content were remarkably decreased in wb1 grains (Figure 2k,l), suggesting that the starch accumulation in wb1 grains was severely disrupted. All these results collectively reveal that mutation of WB1 caused a defect in the endosperm development, which led to the higher grain chalkiness degree and a significant reduction of 1000-grain weight in wb1 grains. We also analyzed physical properties of wb1 and WT grains, including gel consistency (Figure 2n), brown rice rate (Figure 2o), milled rice rate ( Figure 2p) and head rice rate (Figure 2q). Each of these was significantly reduced in the mutant, suggesting that dramatic physical changes have occurred in the wb1 grains, which will further affect rice processing and eating quality.
To verify that this locus associated with the wb1 phenotype was controlled by a single recessive gene, genetic analysis was conducted to examine the phenotype of all plants from BC1F1 and F1 progeny, and of 1087 and 1000 plants from BC1F2 and F2 progeny, respectively. The results showed that BC1F1 and F1 seeds exhibited the wild-type phenotype, while the segregation model of normal to chalky grains fitted well to the expected ratio of a single inheritance, 3:1 (820:267, 745:255), in the BC1F2 and F2 progeny (Table S2). The asterisks represent statistical significance between WT and wb1, determined by a student's t-test (** p ≤ 0.01). Scale bars: (a-e) 10 mm; (f,g) 30 μm.

Candidate Region of the WB1 Gene Obtained through the Modified MutMap Method
A modified MutMap method (Figure 1b) was applied to isolate the WB1 gene. After re-sequencing for Pool A and Pool B, we obtained 125,252,285 (SRA accession SRP135580) and 120,484,878 (SRA accession SRP135578) cleaned reads for Pool A and Pool B, respectively, corresponding to >20 Gb of total read length with 30× coverage of the rice genome (370 Mb; Table  S3). After these cleaned reads were aligned separately to the Nipponbare reference sequence by the BWA software, we obtained 110,119,455 and 105,488,179 unique mapped reads for Pool A and Pool B, respectively, corresponding to 87.55% and 87.92% coverage of the rice genome (Table S4). Then we calculated ∆ (SNP indices) or Fst value based on the sliding window of the whole genome scan following by plotting the ∆ (SNP indices) for all 12 chromosomes of rice ( Figure 3b). As we expected, ∆ (SNP indices) were distributed randomly around 0 for most parts of the genome (Figure 3b). Finally, we obtained the candidate region of 2.52 Mb (Figure 3b). (f,g) Scanning electron microscope images of WT (f) and wb1 (g) seed endosperm. Magnification, ×2000; (h) Measurements of seed length, width, and thickness of WT and wb1 grains (n = 20); (i) Grain chalkiness rate comparison of WT and wb1 grains (n = 6), and grain chalkiness rate is the rate of chalky grains in total grains; (j) Grain chalkiness degree comparison of WT and wb1 grains (n = 6), and grain chalkiness degree is the grain chalkiness rate multiplied by grain chalkiness area (the percent area of chalk in a grain); (k) Amylose content comparison of WT and wb1 grains (0.05 g grain powder each, n = 3); (l) Amylopectin content comparison of WT and wb1 grains (0.01 g grain powder each, n = 3); (m) Comparison of 1000-grain weight of WT and wb1 (n = 10); (n) Gel consistency comparison of WT and wb1 grains (n = 4); (o,p,q) Comparisons of brown rice, milled rice and head rice rates of WT and wb1 (25 g paddy each, n = 3). Data are given as means ± SD (standard deviation). The asterisks represent statistical significance between WT and wb1, determined by a student's t-test (** p ≤ 0.01). Scale bars: (a-e) 10 mm; (f,g) 30 µm.
To verify that this locus associated with the wb1 phenotype was controlled by a single recessive gene, genetic analysis was conducted to examine the phenotype of all plants from BC 1 F 1 and F 1 progeny, and of 1087 and 1000 plants from BC 1 F 2 and F 2 progeny, respectively. The results showed that BC 1 F 1 and F 1 seeds exhibited the wild-type phenotype, while the segregation model of normal to chalky grains fitted well to the expected ratio of a single inheritance, 3:1 (820:267, 745:255), in the BC 1 F 2 and F 2 progeny (Table S2).

Candidate Region of the WB1 Gene Obtained through the Modified MutMap Method
A modified MutMap method (Figure 1b) was applied to isolate the WB1 gene. After re-sequencing for Pool A and Pool B, we obtained 125,252,285 (SRA accession SRP135580) and 120,484,878 (SRA accession SRP135578) cleaned reads for Pool A and Pool B, respectively, corresponding to >20 Gb of total read length with 30× coverage of the rice genome (370 Mb; Table S3). After these cleaned reads were aligned separately to the Nipponbare reference sequence by the BWA software, we obtained 110,119,455 and 105,488,179 unique mapped reads for Pool A and Pool B, respectively, corresponding to 87.55% and 87.92% coverage of the rice genome (Table S4). Then we calculated ∆ (SNP indices) or Fst value based on the sliding window of the whole genome scan following by plotting the ∆ (SNP indices) for all 12 chromosomes of rice ( Figure 3b). As we expected, ∆ (SNP indices) were distributed randomly around 0 for most parts of the genome (Figure 3b). Finally, we obtained the candidate region of 2.52 Mb (Figure 3b)  Fst value, defined as the proportion of genetic diversity due to allele frequency differences among populations described by the previous report [38]. ∆ (SNP indices) and Fst values have the same meaning in this study.

Screening the SNPs Detected in the Candidate Region
From the ∆ (SNP indices) plot (Figure 3b), we obtained the candidate region of 2.52 Mb where we detected 275 SNPs in chromosome 4 followed by gene annotation (Table S5). To identify the true Fst value, defined as the proportion of genetic diversity due to allele frequency differences among populations described by the previous report [38]. ∆ (SNP indices) and Fst values have the same meaning in this study.

Screening the SNPs Detected in the Candidate Region
From the ∆ (SNP indices) plot (Figure 3b), we obtained the candidate region of 2.52 Mb where we detected 275 SNPs in chromosome 4 followed by gene annotation (Table S5). To identify the true causal SNP, we screened these SNPs with three steps: (i) retaining the SNPs in which ∆ (SNP indices) ranged from 0.6 to 0.8; (ii) removing SNPs located in the intergenic region and SNPs which resulted in synonymous substitutions; and (iii) detecting the SNPs between WT and wb1 by sequencing. As a result, we obtained nineteen SNPs, which were located in twelve candidate genes (Table 1).

Identification of the Casual SNP
We detected the expression levels of twelve candidate genes in endosperm tissues at various development stages (5, 10, 15 and 20 DAF) ( Figure 4). We successfully detected all genes transcript levels except for that of ORF8. ORF6 maintained relatively high expression level in comparison with other genes and its transcript level changed significantly during the four stages of endosperm development between WT and wb1 ( Figure 4). Although some genes demonstrated higher transcript levels at the DAF15 and DAF20 stages (Figure 4c,d) compared with the DAF5 and DAF10 stages (Figure 4a,b), and the transcript levels of several genes were also significantly altered between WT and wb1 (Figure 4), all other genes showed low expression levels on the whole in contrast to the transcript levels of ORF6. Therefore, we may conclude that the mutation of ORF6 (Os04t0413500 or Os04g33740) played a major role in the defect of wb1.
SNP-20423829 were G to A transitions, presumably caused by EMS mutagenesis [39], and it was located at the site 1659 bp of the third exon of ORF6 encoding a glycosyl hydrolase. This SNP led to an A159T mutation (codon GCG to ACG; Figure 5a,b). Moreover, results of digestion of restriction endonuclease Hae II confirmed this mutant site (Figure 5c). Accordingly, we hypothesized that wb1 was caused by a missense substitution in ORF6. We also found that ORF6 was a novel allele of GIF1 (Os04g33740) which controlled rice grain filling and yield [14]. development stages (5, 10, 15 and 20 DAF) (Figure 4). We successfully detected all genes transcript levels except for that of ORF8. ORF6 maintained relatively high expression level in comparison with other genes and its transcript level changed significantly during the four stages of endosperm development between WT and wb1 ( Figure 4). Although some genes demonstrated higher transcript levels at the DAF15 and DAF20 stages (Figure 4c,d) compared with the DAF5 and DAF10 stages (Figure 4a,b), and the transcript levels of several genes were also significantly altered between WT and wb1 (Figure 4), all other genes showed low expression levels on the whole in contrast to the transcript levels of ORF6. Therefore, we may conclude that the mutation of ORF6 (Os04t0413500 or Os04g33740) played a major role in the defect of wb1.  SNP-20423829 were G to A transitions, presumably caused by EMS mutagenesis [39], and it was located at the site 1659 bp of the third exon of ORF6 encoding a glycosyl hydrolase. This SNP led to an A159T mutation (codon GCG to ACG; Figure 5a,b). Moreover, results of digestion of restriction endonuclease Hae II confirmed this mutant site (Figure 5c). Accordingly, we hypothesized that wb1 was caused by a missense substitution in ORF6. We also found that ORF6 was a novel allele of GIF1 (Os04g33740) which controlled rice grain filling and yield [14]. (c) Digestion of restriction endonuclease HaeII. "Before" represents the non-treated PCR product and "After" represents the HaeII-treated PCR product.

Function Verification of the WB1 Gene (ORF6) through the CRISPR/Cas9 System in Reverse
To verify our hypothesis, we created six novel alleles of WB1 through the CRISPR/Cas9 system in Japonica cultivar Nipponbare (NPB). We found two target sequences in the third exon of WB1 corresponding to CRISPR/Cas9 system and obtained six different mutants in T1 lines (Figure 6a). Grains of six mutants displayed brown glumes and grain chalkiness in the grain belly compared (c) Digestion of restriction endonuclease HaeII. "Before" represents the non-treated PCR product and "After" represents the HaeII-treated PCR product.

Function Verification of the WB1 Gene (ORF6) through the CRISPR/Cas9 System in Reverse
To verify our hypothesis, we created six novel alleles of WB1 through the CRISPR/Cas9 system in Japonica cultivar Nipponbare (NPB). We found two target sequences in the third exon of WB1 corresponding to CRISPR/Cas9 system and obtained six different mutants in T1 lines (Figure 6a). Grains of six mutants displayed brown glumes and grain chalkiness in the grain belly compared with the common grains of NPB (Figure 6b). SEM images distinctly showed that endosperm of six mutants developed abnormally compared to the normal endosperm of NPB (Figure 6b). The phenotypes of six mutants were similar to that of wb1 (Figure 2b-g). Those results further proved that WB1 was the target gene responsible for the wb1 phenotype.
In NPB and six mutant lines, we measured 1000-grain weight (Figure 6h) and the main factors affecting 1000-grain weight, including grain length (Figure 6c), width (Figure 6d), and thickness (Figure 6e), grain chalkiness rate ( Figure 6f) and degree (Figure 6g). Duncan's test indicated that the grain chalkiness rate degree were the major factors causing the significant reduction of 1000-grain weight of six mutant lines. The differences in grain length, width, and thickness between six mutant lines and NPB were not similar to the differences between WT and wb1 (Figure 2h). This discrepancy was probably caused by the longer grain length of WT (~9.3 mm, Figure 2h) compared with that of NPB (~6.6 mm, Figure 6c) and the different mutations of WB1 between wb1 (Figure 5a) and the six mutant lines (Figure 6a).
Interestingly, some differences were also found among the six mutant lines (Figure 6c-h). Those results suggested that the grain chalkiness rate and grain chalkiness degree collectively determined the 1000-grain weight (Figure 6f-h), especially in the mutant line nc-3, where grain chalkiness rate and grain chalkiness degree showed significant decreases as compared to the other mutant lines, corresponding to its higher 1000-grain weight. To test whether the differences in the grain chalkiness rate and grain chalkiness degree among the six mutant lines were caused by different mutations of WB1, we performed multiple comparison of WB1 sequences by MEGA 5.0 software (Figure 7). The findings indicated that the six mutant lines showed different mutations which disrupted the substrate binding site and the active site of WB1, except in the mutant line nc-5 (Figure 7).

Expression Analysis of Starch Metabolism-Related Genes in Endosperm
We performed qPCR analysis of total RNA extracted from the seed endosperm of WT and wb1 at various stages (DAF5, DAF10, DAF15, and DAF20) and detected the transcript levels of some genes involved in starch synthesis. As shown in Figure 8, transcript levels of those genes were all altered during development of the rice endosperm. During the critical stages (DAF10 and DAF15) of grain filling, transcript levels of all genes showed a striking contrast between WT and wb1. This suggests that altered transcript levels of starch synthesis-related genes are probably involved in the abnormal development of rice endosperm. The higher transcript levels of WB1, OsAPS1, OsAPL1, OsPPDKB, and FLO6 at the mature stage (DAF20) in the wb1 mutant were probably caused by different maturity of seeds between WT and wb1. Interestingly, some differences were also found among the six mutant lines (Figure 6c-h). Those results suggested that the grain chalkiness rate and grain chalkiness degree collectively determined the 1000-grain weight (Figure 6f-h), especially in the mutant line nc-3, where grain chalkiness rate and grain chalkiness degree showed significant decreases as compared to the other mutant lines, corresponding to its higher 1000-grain weight. To test whether the differences in the grain chalkiness rate and grain chalkiness degree among the six mutant lines were caused by different mutations of WB1, we performed multiple comparison of WB1 sequences by MEGA 5.0 software (Figure 7). The

Expression Analysis of Starch Metabolism-Related Genes in Endosperm
We performed qPCR analysis of total RNA extracted from the seed endosperm of WT and wb1 at various stages (DAF5, DAF10, DAF15, and DAF20) and detected the transcript levels of some genes involved in starch synthesis. As shown in Figure 8, transcript levels of those genes were all altered during development of the rice endosperm. During the critical stages (DAF10 and DAF15) of grain filling, transcript levels of all genes showed a striking contrast between WT and wb1. This suggests that altered transcript levels of starch synthesis-related genes are probably involved in the abnormal development of rice endosperm. The higher transcript levels of WB1, OsAPS1, OsAPL1, OsPPDKB, and FLO6 at the mature stage (DAF20) in the wb1 mutant were probably caused by different maturity of seeds between WT and wb1.

WB1 Controls Rice Endosperm Development
Grain chalkiness is one of the most important factors leading to low grain weight [15] and affecting rice appearance and milling, cooking, and eating quality [40,41]. Grain chalkiness is controlled by complex quantitative trait loci and by climatic conditions during rice grain filling, especially high temperature [42]. Chalk5, a major quantitative trait locus controlling grain chalkiness, and affecting head rice rate, is the only one that has been identified and characterized up to now. Chalk5 is involved in the biogenesis of protein bodies in the endosperm cells [15]. Grain chalkiness can be an indicator of abnormally developed endosperm [2][3][4][5][6][7][8]. The major component of rice endosperm is a starch that is mainly composed of amylose and amylopectin. Many genes directly involved in the biosynthesis of amylose and amylopectin in rice endosperm cells have been identified and characterized, such as Waxy, SSI, SSIIa, OsSSIIIa, ISA1, OsBEIIb and OsPPDKB [23][24][25][26][27][28]30]. Loss of function of these genes can result in an abnormally developed endosperm, displaying more grain chalkiness and low grain weight.
Sucrose is produced by the source organ or photosynthetic organ and used as a carbon source for starch biosynthesis in endosperm cells. Sucrose must be transported from source organs into sink organs, which occurs via apoplast and/or sympast. Accordingly, in addition to the genes directly involved in starch biosynthesis of endosperm cells, other genes involved in this process can also affect endosperm development. In the apoplastic pathway, sucrose can be converted by cell-wall invertases into glucose and fructose, which are transported into cells by hexose transporters. Sucrose

WB1 Controls Rice Endosperm Development
Grain chalkiness is one of the most important factors leading to low grain weight [15] and affecting rice appearance and milling, cooking, and eating quality [40,41]. Grain chalkiness is controlled by complex quantitative trait loci and by climatic conditions during rice grain filling, especially high temperature [42]. Chalk5, a major quantitative trait locus controlling grain chalkiness, and affecting head rice rate, is the only one that has been identified and characterized up to now. Chalk5 is involved in the biogenesis of protein bodies in the endosperm cells [15]. Grain chalkiness can be an indicator of abnormally developed endosperm [2][3][4][5][6][7][8]. The major component of rice endosperm is a starch that is mainly composed of amylose and amylopectin. Many genes directly involved in the biosynthesis of amylose and amylopectin in rice endosperm cells have been identified and characterized, such as Waxy, SSI, SSIIa, OsSSIIIa, ISA1, OsBEIIb and OsPPDKB [23][24][25][26][27][28]30]. Loss of function of these genes can result in an abnormally developed endosperm, displaying more grain chalkiness and low grain weight.
Sucrose is produced by the source organ or photosynthetic organ and used as a carbon source for starch biosynthesis in endosperm cells. Sucrose must be transported from source organs into sink organs, which occurs via apoplast and/or sympast. Accordingly, in addition to the genes directly involved in starch biosynthesis of endosperm cells, other genes involved in this process can also affect endosperm development. In the apoplastic pathway, sucrose can be converted by cell-wall invertases into glucose and fructose, which are transported into cells by hexose transporters. Sucrose can also be directly taken by sucrose transporters into sink cells where it is hydrolyzed into glucose and fructose by sucrose hydrolases, including SUS2, SUS3, and SUS4 [21]. GIF1 encodes a cell-wall invertase that mainly functions in the hydrolysis and uploading of sucrose during early grain-filling. The gif1 mutant shows slower grain filling,~24% lower final grain weight, and lower contents of amylose and amylopectin, and markedly more grain chalkiness as a result of abnormally developed and loosely packed starch granules [14]. OsSWEET4 encodes a hexose transporter that is responsible for transferring hexoses across the BETL (basal endosperm transfer layer) to sustain of rice endosperm, downstream of a cell-wall invertase. The ossweet4-1 mutant shows incomplete grain filling and significantly decreased grain weight [20]. OsSUT2 encodes a sucrose transporter that functions in sucrose uptake from the vacuole. The ossut2 mutant has significantly decreased sugar export ability and 1000-grain weight [19]. In our study, WB1 encoded the cell-wall invertase 2 (OsCIN2) and was a novel allele of the GIF1 (Os04t0413500 or Os04g33740) gene which controlled rice grain-filling and thus affected rice endosperm development [14]. Due to the WB1 gene mutation ( Figure 5) which led to grain incomplete filling [14], the physical and chemical properties of grain endosperm of the wb1 mutant have been altered, like higher grain chalkiness rate (Figure 2i), higher grain chalkiness degree (Figure 2j), markedly more grain chalkiness as a result of loosely packed, spherical granules (Figure 2e,g), lower contents of amylose and amylopectin (Figure 2k,l), and~30.0% lower 1000-grain weight (Figure 2m). In addition, transcript levels of the starch synthesis-related genes in our study varied during rice endosperm development ( Figure 8). All of these observations suggest that the wb1 mutant exhibits a defect in endosperm development, thus leading to the white-belly endosperm with altered phy-chemical property.

Different Mutations of WB1 Can Disrupt Its Biological Function
Many genes make up a large regulatory framework that regulates life activity in higher plants. These genes encode active proteins that are responsible for the major functions in this global regulatory framework. The function of each of active protein is determined by its own primary, secondary, and tertiary structure. In rice, mutations of a gene can result in its encoding protein with structural alterations which can affect its biological function followed by phenotypic changes. The sd1 gene, well known as the genetic basis for the first "green revolution" in rice, encodes a GA 20-oxidase involved in the GA biosynthesis pathway; the sd1 gene controls the plant height of rice, and mutations (sd1-d, sd1-r, sd1-c, sd1-j) in this locus cause the dwarfism of rice to different degrees [43][44][45].
In our study, six mutants of novel alleles of WB1 displayed the same phenotype as the wb1 mutant (Figures 2b-e and 6b), primarily showing higher chalkiness rates (Figure 6f), higher chalkiness degrees (Figure 6g), and lower 1000-grain weight (Figure 6h). The sequence analysis showed that six different mutations have occurred in the WB1 locus (Figure 6a), leading to alterations of the amino acid sequence of the WB1 protein in different types (Figure 7). In the previous research of the gif1 mutant, the GIF1 gene revealed a 1-nt deletion in the coding region, causing the premature GIF1 protein which disrupt its biological function (incomplete grain-filling) [14]. The WB1 gene of nc-1, nc-3 and nc-6 also caused different premature WB1 proteins with altered substrate binding site and active site (Figure 7). The frame shift mutation of the WB1 gene of nc-2 and nc-4 also disrupted its biological function (Figures 6b and 7). Interestingly, the single amino acid substitution (A159T) of WB1 of wb1, and Proline-161 and Arginine-162 deletion of WB1 of nc-5 led to the dysfunction of WB1 without altered substrate binding site and active site (Figures 2, 6 and 7), suggesting that Alanine-159, Proline-161 and Arginine-162 are required for activity of WB1. Moreover, grain chalkiness degrees and 1000-grain weight showed significant differences among some mutants. However, several mutants showed no differences in grain chalkiness degree and 1000-grain weight (Figure 6g,h); these results suggest that different mutations probably affect the formation of grain chalkiness in different degrees, and further research still needs to be conducted to explain the molecular mechanism. In summary, seven novel alleles (including the wb1 mutant) had different mutations which disrupted their biological functions.

The Modified MutMap Method Applied to Isolate WB1 Is Feasible for Gene Mapping
MutMap is a new method used for gene identification [33]. Using MutMap method, researchers can isolate mutant genes and QTLs rapidly, accurately, and conveniently compared to conventional map-based cloning [33,35,36]. Through the MutMap method, researchers only sequence the DNA pool from recessive individuals of F 2 population based on second-generation sequencing, followed by aligning to the assembled whole-genome sequence of wild-type. The population used for the MutMap method is BC 1 F 2 population, which can show unequivocal segregation between the mutant and wild-type phenotype. Notably, the MutMap method requires assembling the whole-genome sequence of wild type accurately used as the reference sequence.
Previously, many genes have been identified by the MutMap method in rice. OsRR22, responsible for the salinity-tolerance phenotype for the hst1 mutant, has been identified by a MutMap method: the sequence depth and the average coverage of wild-type are 28.7× and 59.9%; the number of BC 1 F 2 individuals is 20 and the sequence depth is 18.4× [35]. Two mutant genes regarding pale green leaf have been identified: the sequence depth and the average coverage of wild-type are >12× and~95.5%; the number of BC 1 F 2 individuals is 20 and the sequence depths are 12.5× and 24.1× [33]. Four mutant genes regarding semi-dwarf phenotype also have been identified: the sequence depth and the average coverage of wild-type are >12× and 82.4%~84.2%; the number of BC 1 F 2 individuals is 20 and the sequence depths are 14.2×~16.6× [33].
The MutMap method is subject to high error rates because of multiple factors, including difficulty in determining the number of F 2 progeny showing the mutant phenotype, the average coverage (depth) of genome sequencing, and classification of phenotypes between wild and mutant phenotype [33,46]. Therefore, the greater the number of F 2 progeny showing the mutant phenotype to be bulked, the deeper the average depth of genome to be sequenced, and more accurate classification of phenotypes between wild and mutant type, the lower the rate of false positives [33].
In our study, a modified MutMap method (Figure 1b) was applied to successfully isolate the WB1 gene related to endosperm development in rice. Compared with the MutMap method, our modified MutMap method has some advantages. Firstly, the individuals of bulked DNA Pools used for sequencing were from BC 1 F 2:3 , which has a more stable genetic background than the BC 1 F 2 population; because of the reduced effect from other gene mutations on the target phenotype, it was easier to distinguish plants between mutant type and wild type; Secondly, the appropriately elevated number of BC 1 F 2:3 individuals (50, Figure 1b) and sequence depth (30×, Figure 1b) ensured relatively high coverage (87.92% and 87.55%; Table S4); Lastly, we sequenced the DNA pools not only from recessive individuals but also from dominant individuals followed by aligning to the reference sequence Nipponbare, respectively; therefore, it was not necessary to sequence and assemble the whole-genome sequence of wild type used as reference sequence. We directly used the Nipponbare genomic sequence as our reference sequence, so that we greatly reduced the costs required for sequencing and assembling reference sequence. Therefore, the WB1 gene mapping result showed higher specificity with the single peak in the chromosome 4 ( Figure 3b) compared to that by the MutMap method [33]. Besides, the modified MutMap method also has a deficiency that is the significant difference in the whole genome sequence between Nipponbare and Wild-type (reference sequence). Delightedly, several rice genome sequences have been published, like indica cultivar 9311 [47], Zhenshan97, Minghui63 [48] and Shuhui498 [49] which can be used as reference sequence directly, expanding the application scope of the modified MutMap method. Overall, the modified MutMap method showed a low error rate, a relatively low cost and a high specificity, and could promote the development of rice genetics.

Plant Materials and Growth Condition
The wb1 (mutant) was initially identified from the ethyl methanesulfonate (EMS)-treated Japonica rice variety ChangLiGeng (CLG, Wild Type) M 2 population. The wb1, as the pollen acceptor, was crossed with WT and ZhongHui8015 (ZH8015, Indica), respectively. The resulting first filial generation (BC 1 F 1 , F 1 ) plant was self-pollinated, and the second generation (BC 1 F 2 , F 2 ) was used as the genetic analysis population. We collected seeds of 100 individuals from BC 1 F 2 population, and then cropped the seeds to obtain 100 pedigrees (BC 1 F 2:3 ) which were used as the mapping population for the modified MutMap method (Figure 1b). All plants were grown in an experimental paddy field at China National Rice Research Institute (Hangzhou, Zhejiang province and Lingshui, Hainan province, in China) under natural open-air condition.

Grain Quality Analysis
Scanning electron microscopy was performed as described previously [50]. Measurements of amylose content of mature grains (0.05 g powder) were conducted by HPSEC-MALLS-RI following the method of Fujita et al. (2003) [51]. Quantitative amylopectin content was determined by processing 0.01 g powder of the mature grain, according to a method from a previous report [52]. Each measurement was repeated three times (n = 3).
The paddy rice of WT and wb1 were dried to moisture content of 12-14% and were maintained at room temperature at least three months before measuring the brown rice rate, milled rice rate, and head rice rate by grain polisher AH001151 (KETT, Tokyo, Japan), performed as Zhou et al. (2015) [53]. Each measurement of 25 g paddy rice was performed with three replicates (n = 3, total 75 g paddy rice). Grain chalkiness rate and grain chalkiness degree were determined using SC-E Rice Quality Inspection and Analysis System (WanShen, Hangzhou, China). The white grains from 12 plants (6 plants from WT and wb1, respectively, n = 6) were used for grain chalkiness rate and degree measurements. Gel consistency was measured (n = 4) following the protocol described in Li et al. (2014) [15].

PCR, RNA Isolation and Real-time Quantitative PCR (qPCR)
PCR amplifications of candidate genes were performed using KOD FX DNA Polymerase (TOYOBO). The PCR product of the reaction of restriction enzyme HaeII digestion was amplified by KOD-Plus-Neo (TOYOBO). The primer pairs designed for this study are listed in Table S6.
Total RNA was prepared from grains of WT and wb1 at 5, 10, 15, and 20 DAF (days after flowering) using the TIANGEN RNAprep Pure Plant Kit (Tiangen Biotech, Beijing, China). The first cDNA strand was synthesized from DNase I-treated RNA using Oligo-dT (18) primers in a 20 µL reaction system based on a SuperScriptIII Reverse Transcriptase Kit (TOYOBO). qPCR was performed on a Roche Light Cycle 480 device using THUNDERBIRD SYBR qPCR Mix (TOYOBO). Each reaction was performed with three replicates (n = 3). The primers used in this analysis are listed in Table S6.

DNA Template Preparation, DNA Library Construction, and Re-Sequencing
Genomic DNA was extracted (large scale) from young leaf tissues following the modified hexadecyl trimethylammonium bromide (CTAB) method [54]. Young leaves (total 5 g, 0.1 g per plant) were obtained from 50 plants displaying the mutant or wild phenotype in the BC 1 F 2:3 population and were used to prepare the pooled genomic DNA (Pool A and Pool B, respectively) which was used for illumina sequencing. The DNA concentration was measured by Nanodrop 2000 spectrophotometer. 1 µg, each for both Pool A and Pool B, of total high-quality pooled DNA samples (1.8 ≤ OD260:OD280 ≤ 2.0) was used for re-sequencing library construction. Two libraries with the target insert size of 300 bp were generated by the Illumina Gnomic DNA sample kit according to the manufacturer's instruction. The quality of two libraries was controlled by qPCR. Two libraries were re-sequenced through the Illumina HiSequation 2500 at the BeiJing Berry Genomics Biotechnology Co., Ltd. (Beijing, China) to generate 125 nt paired-end short sequence reads (raw reads) for each pools.

Re-Sequencing Analysis
The FastQC program was used to evaluate the quality of raw reads (http://www.bioinformatics. babraham.ac.uk/projects/fastqc/).The Illumina paired-end adapters' sequence of raw reads was removed using the FASTX toolkit program (http://hannonlab.cshl.edu/fastx_toolkit/index.html). Removal of low-quality bases (Illumina phred quality score Q < 20) [55] and ≤40 bp of reads was completed using SolexaQA software [56]. The cleaned reads from Pool (A) and Pool (B) have been submitted to the SRA database of NCBI (SRA accessions are SRP135580 and SRP135578, respectively).
The cleaned reads were aligned separately with BWA software (Burrows-Wheeler aligner) [57] to the Nipponbare reference sequence. Alignments were filtered based on the Illumina phred quality score of ≥30, corresponding to 0.1% of the error rate, to obtain the unique mapped reads. Alignment files were converted to SAM files through SAMtools [58], and applied to GATK Pipeline [59] to identify reliable SNPs based on the reference genomic sequence.

Calculation of ∆ (SNP Indices)
Average SNP indices of Pool (A) and Pool (B) were estimated via the sliding window method (sliding window 50 Kb; walking 10 Kb) and the ∆ (SNP indices) Manhattan plot was obtained using a custom script written in R version 3.1.1 (https://www.r-project.org/). According to the MutMap method, SNP index (A) would be 1 for the causal SNP or for closely linked SNPs and 0.5 for unlinked loci for each identified SNP in the whole genome sequence, while the SNP index (B) would be 0.333 (1/3) for the causal SNP or closely linked SNPs and 0.5 for unlinked loci. Therefore, ∆ (SNP index) would be 0.667 (2/3) for the causal or closely-linked SNPs and 0 for unlinked SNPs.

Restriction Endonuclease Digestion Analysis
The restriction endonuclease HaeII site of the target gene was identified using the primer premier 5.0 software (Premier, Ottawa, ON, Canada). Two pairs of primers were designed (W-H for WT and M-H for wb1, see Table S6) to generate 337 bp of DNA fragments by polymerase chain reaction (PCR). These PCR products were used for restriction endonuclease HaeII digestion. A total of 150 µL of this reaction system contained 3 µL HaeII (20 units per 1 µL), 15 µL NEBuffer (1×), 50 µL DNA template (2.5 µg), and 82 µL ddH 2 O. Two reaction systems were incubated at 37 • C for 15 min followed by a 2.0% agarose gel electrophoresis.

Vector Construction for CRISPR/Cas9-Mediated Mutation
Six novel allelic mutants were created in Japonica cv Nipponbare by a CRISPR/Cas-targeted genome editing tool.
To generate pBWA(V)H_cas9i2-CRISPR/Cas9 targeting vector, we used the pBWA(V)H_cas9i2 vector containing codon-optimized Cas9 driven by the 35S promoter, the OsU3 promoter and sgRNA scaffolds, as well as the Cas9 expression backbone vector. The targeting sequence primer pair was ACGTGACCTCATCAACTGGGTGG and AACGTGGCGCTGCCGAGGAACGG. The OsU3 promoter was used to drive the sgRNA expression, and the 35S promoter was used to drive the Cas9 expression. Both the OsU3::gRNA and 35S::Cas9 fragments were cloned into pBWA(V)H_cas9i2 binary vector which was introduced into Agrobacterium strain EHA105. Transformed calli were induced from Nipponbare seeds for Agrobacterium-mediated transformation as previously described [61]. The T 0 transgenic mutant plants regenerated from hygromycin-resistant calli were examined for the presence of transgene using primer pair Cas-seq (Table S6).

Conclusions
Breeding of rice with high quality of appearance and high yield is important for rice cultivation. In this study, we isolate and characterize a candidate recessive gene WB1 that regulates rice endosperm development using a modified MutMap method. The candidate gene WB1 is further verified by CRISPR/Cas9 system. The wb1 mutant, as well as six mutants mediated by CRISPR-Cas9 system, all cause a defect in the endosperm development, which lead to the higher grain chalkiness rate and degree and a significant reduction of 1000-grain weight in comparison with that of wild-type plants.
Relative expression analysis of genes associated with starch synthesis by qPCR also suggests that loss of function of WB1 leads to disorder of starch metabolism-related genes expression, resulting in the abnormal endosperm development. In particular, the modified MutMap method used in this study shows a low error rate, a relatively low cost and a high specificity, and could promote the development of rice genetics. Overall, the gene WB1 involved in rice endosperm development affects rice quality of appearance and yield, and therefore, it can be used by rice breeders through molecular breeding to improve rice quality of appearance and yield in Green Super Rice.