Deciphering Haplotypic Variation and Gene Expression Dynamics Associated with Nutritional and Cooking Quality in Rice

Nutritional quality improvement of rice is the key to ensure global food security. Consequently, enormous efforts have been made to develop genomics and transcriptomics resources for rice. The available omics resources along with the molecular understanding of trait development can be utilized for efficient exploration of genetic resources for breeding programs. In the present study, 80 genes known to regulate the nutritional and cooking quality of rice were extensively studied to understand the haplotypic variability and gene expression dynamics. The haplotypic variability of selected genes were defined using whole-genome re-sequencing data of ~4700 diverse genotypes. The analytical workflow identified 133 deleterious single-nucleotide polymorphisms, which are predicted to affect the gene function. Furthermore, 788 haplotype groups were defined for 80 genes, and the distribution and evolution of these haplotype groups in rice were described. The nucleotide diversity for the selected genes was significantly reduced in cultivated rice as compared with that in wild rice. The utility of the approach was successfully demonstrated by revealing the haplotypic association of chalk5 gene with the varying degree of grain chalkiness. The gene expression atlas was developed for these genes by analyzing RNA-Seq transcriptome profiling data from 102 independent sequence libraries. Subsequently, weighted gene co-expression meta-analyses of 11,726 publicly available RNAseq libraries identified 19 genes as the hub of interactions. The comprehensive analyses of genetic polymorphisms, allelic distribution, and gene expression profiling of key quality traits will help in exploring the most desired haplotype for grain quality improvement. Similarly, the information provided here will be helpful to understand the molecular mechanism involved in the development of nutritional and cooking quality traits in rice.


Introduction
Rice, a staple food in many countries, contributes around 21% to the total per capita calorie intake across the globe [1,2]. Rice is consumed in several forms such as processed and fermented products. Many rice preparation methods depend on the grain size and stickiness of the cooked grains. In addition to being rich in carbohydrates, rice grains contain a low amount of essential micronutrients like calcium, zinc, iron, and phosphorus [3]. Hence, tremendous efforts have been made to improve the palatability and nutrient content breeding pave the way for combining superior haplotypes with high-yielding rice accessions to meet consumer-centric nutritional demands. One of the important outcomes of allele mining and genome sequencing is the development and utilization of sequence-based markers for breeding programs. Several marker assays based on such SNPs have been developed which include KASP (Kompetitive Allele-Specific PCR), TaqMan, SNPlex, Illumina Infinium BeadChip, Affymetrix Axiom [23].
In the present study, we have selected functionally characterized genes that are important for the nutritional and cooking quality-related traits in rice. Utilizing the available re-sequencing data for approximately 4700 rice accessions, we have studied the allelic variations in the key genes and deciphered the haplotype combinations. The utility of this approach and the generated information were demonstrated by showing the association of various haplotypes of genes governing the grain length and chalkiness traits, with different patterns and degrees of grain phenotypes. In addition, we have developed a transcriptomic atlas to explore the gene expression and co-expression networks for the selected traits. We have identified several functionally important SNPs that can be further utilized for the development of marker assays associated with cooking and nutritional quality traits in rice.

Selection of Genes Governing Cooking Quality and Nutritional Value Related Traits
A total of 80 functionally characterized genes governing cooking quality and nutrient content were selected from the literature search and by using tools viz. FunRiceGenes [24] and OGRO (Overview of Functionally Characterized Genes in Rice Online database) on Q-TARO [25] (Supplementary Table S1). The SNP data for selected genes was retrieved for the set of 4726 rice accessions including 595 Indica I (IndI), 465 Indica II (IndII), 913 IndicaIII SNPs from wild Asian rice and cultivated rice accessions were used for diversity analysis. The nucleotide diversity (π), expected nucleotide diversity (θ), and Tajima's D were estimated with a sliding window approach by using TASSEL v5.0 (http://www. maizegenetics.net) (accessed on 1 September 2021) [26]. Diversity analysis of the selected genes in case of wild and cultivated rice was studied using an R-based Pegas package (Population and Evolutionary Genetics Analysis System) (https://cran.r-project.org/web/ packages/pegas/index.html) (accessed on 1 September 2021) and 'APE' (Analyses of Phylogenetics and Evolution) implemented in Ecogems online tool (http://150.109.59.144: 3838/ECOGEMS) (accessed on 1 September 2021).

SNP Variation and Effect Prediction
Variants Effect Predictor tool [27] at Ensembl Plants was utilized to check the impact of variations (SNPs and InDels) on gene function. The distribution of variants was obtained in terms of intron, exon, upstream, downstream, splice region, 3 prime and 5 prime UTR variants. Protein Variation Effect Analyzer (PROVEAN) tool was used to gauge the deleterious effect of SNPs [28]. Prediction of the SNP impact on the biological function was obtained for the amino acid changes and PROVEAN score at threshold −2.5. If SNP scores ≤−2.5, it is predicted to be 'deleterious' to the protein function, whereas values > 2.5 predict 'neutral' effects of sequence variations.

Haplotype Variation in Nutritional Quality-Related Genes
Haplotypes were deduced for sequence variations in the selected genes using the RiceVarMap v2.0 tool [29]. Sequence variations were retrieved within the gene along with 1 kb upstream and downstream of the gene's open reading frame (ORF). The SNPs that caused non-synonymous, missense, splice region, frameshift, start lost, stop gained, or stop gained variations were used for haplotype analysis using RiceVarMap2.0. The 'haplotype' module from RMBreeding, Rice Functional Genomics and Breeding v2.0 was employed for mining associations between SNPs and the phenotypic data [30]. The analysis was conducted against the coding sequence of genes and maximum allele frequency of ≥0.01.

Haplotype Network Analysis
Haplotype network analysis was performed using re-sequencing data available for~4000 rice accessions. The haplotypic networks were developed using the online RiceVarMap v2.0 database (RiceVarMap2 (ncpgr.cn)) (accessed on 4 September 2021). The haplotypes found in more than 10 rice accessions were used to construct the haplotype network and the plots were build using the pegas functions in R package (https://cran.r-project.org/web/packages/ pegas/index.html) (accessed on 4 September 2021). The classification system based on rice isozyme classification was used to group the accessions into nine categories including Indica I, Indica II, Indica III, Indica Intermediate, Aus, Temperate Japonica, Tropical Japonica, Japonica Intermediate, and Intermediate (including aromatic and rest of the accessions).

Gene Expression Dynamics and Co-Expression Network for Grain Quality-Related Traits in Rice
The raw sequencing reads were retrieved in the form of fastq format from the NCBI Sequence Read Archive (SRA) database (https://www.ncbi.nlm.nih.gov/sra) (accessed on 10 September 2021). The retrieved dataset possess RNAseq transcriptomic data from different studies representing 102 independent libraries. The dataset includes transcriptome of different genotypes, seed development stages and different tissues like leaves, inflorescence, seeds, embryo, endosperm [31,32]. The details of the SRA datasets used in this study are provided in Supplementary Table S2. The raw reads were processed based on quality and other parameters and then mapped to the reference genome of rice assembly build 4.0 using CLC genomics Workbench [33]. The expression data in the form of read count for all the rice genes were normalized in the form of Reads Per Kilobase of transcript per Million mapped reads (RPKM). The RPKM values for selected genes were further analyzed and visualized using the tool Multiple Experiment Viewer (MeV-4.9.0) [34]. The data was adjusted with log2 transformation, followed by hierarchical clustering analysis. Euclidean distance metric was further used to cluster genes. Gene-specific spatio-temporal expression across tissues such as leaf blade/sheath, root, stem, inflorescence, anther, pistil, lemma, palea, ovary, embryo, endosperm was obtained at Rice Expression Profile Database (RiceXPro v3.0) [35].
In addition to the 102 RNA-Seq libraries which were processed in-house, recently published 11,726 transcriptome libraries were queried for all the selected 80 genes, which are available at the Plant Public RNA-seq Database (PPRD, http://ipf.sustech.edu.cn/ pub/plantrna/) (accessed on 15 February 2022) [36]. Subsequently, the gene co-expression network for data derived from these libraries was generated. The co-expression network was developed by CoExpNetViz plugin of Cytoscape [37] and visualized using Cytoscape software v3.7.2 [38]. The Pearson product-moment correlation coefficient was used as the correlation method at thresholds 5 and 95 for lower and upper percentile ranks, respectively. Furthermore, Weighted Correlation Network Analysis (WGCNA) v1.68 [39] was utilized to identify functional modules and groups of 80 query genes. RPKM values of these genes calculated from RNAseq experimental data were used as an input for the package. Module size varied for 3 genes, 4 genes, and 5 genes were considered to obtain an optimum number of modules. Module size 3 was selected for further analysis.

Quantitative Real-Time PCR Analysis
To study the expression profile of selected candidate genes, three different tissues including root, stem, and leaves were considered. Different tissues were harvested and immediately flash-frozen in liquid nitrogen and subsequently used for total RNA extraction. Spectrum™ Plant Total RNA Kit was used to extract RNA. The quantity and quality of the total RNA were evaluated using Nanodrop and agarose gel electrophoresis. Subsequently, high-quality RNA samples were used for cDNA synthesis using the RevertAid First Strand cDNA Synthesis Kit (Thermo Scientific™, Waltham, MA, USA). SYBR ® Green Master Mix (Bio-Rad, Hercules, CA, USA) 2× was used for the quantitative real-time PCR (qPCR) reaction. Each reaction comprises of 5 µL SYBR ® Green master mix, 1 µL template cDNA, 2 µL water, and 1 µL of each primer (10 mM). The qPCR thermal profile an initial denaturation at 95 • C for 1 min, followed by 40 cycles of denaturation at 95 • C for 10 s, and annealing at 55 • C for 20 s, and extension at 72 • C for 20 s. The qPCR was performed in 96-well plates. After 40 cycles, a melting curve analysis was performed with a stepwise increase in temperature over 65 • C to 95 • C by an increment of 0.5 • C every 5 s. The mean CT and standard deviations were calculated.

Transcription Factors and Their Interaction with Nutritional Quality-Related Genes
Interactions between transcription factors and 80 genes governing grain quality and nutritional traits were predicted using PlantRegMap (Plant Transcriptional Regulatory Map) server [40]. Rice promoter sequences were first retrieved from EnsemblPlants through the Biomart tool [41]. The promoter sequences were used as an input for the 'Regulation Prediction' tool of PlantRegMap. A p-value threshold of 1.00 × 10 −5 was set for the prediction of significant binding sites.

Phenotyping of Rice Grains for Chalkiness Trait
Grain chalkiness and translucence were visualized for rice accessions representing different haplotypic groups. Whole rice grains were dried, de-husked, polished, and multiple transverse sections were obtained using a handheld microtome cutter with a changeable razor blade. The cross-sections were mounted on the specimen holder and coated with 5 NM Chromium Gold for visualization with a field emission scanning electron microscope (FESEM). The images were captured at 10,000× magnification and 10 µm resolution.

Haplotypic Diversity in Nutritional and Cooking Quality-Related Genes in Rice
Significant haplotypic diversity was observed in 80 grain quality related genes (Supplementary Table S1). A total of 6,572,189 SNPs were retained after filtering for missing calls per sample (<31%), missing calls per variant (<20%), and minor allele frequency per variant (>1%). The maximum numbers of SNPs were obtained for gene Kala4|OsS1, which is responsible for pericarp color in rice (Supplementary Table S1). A maximum of 56 missense mutations were observed in the qCdT7|OsHMA3 gene, which is responsible for the grain zinc content in rice. No missense mutations were observed in nine genes (Supplementary Table S1). The functional amino acid change prediction revealed a total of 133 deleterious mutations in 39 genes (Table 1). A maximum of 30 deleterious mutations were observed in the FLO2 gene, which is known to regulate grain size (Tables 1 and S1).    SLG7 Os07g0603300 Gene names are italicized. * indicates deleterious mutations.
The highest numbers of InDel (607) were observed in gene SUG1, which is involved in the process of seed starch biosynthesis. Similarly, gene Kala4, which is responsible for grain pericarp color in rice, showed 448 InDels (Supplementary Table S1). In the case of the chalk5 gene, which regulates grain chalkiness, a total of 212 SNPs and 19 missense mutations were observed, and most of the missense mutations were present in exon 4.
The representative haplotype groups among 3024 rice accessions for the chalk5 gene were specific to rice sub-populations. The indica types have reference type alleles and mostly belonged to the same haplotypic group (Figure 1). The tropical and temperate rice types mostly possessed alternate alleles and were grouped in distinct haplotypic groups when compared to the indica type. Similarly, the aromatic group showed mutations in the third exon specifically and were grouped separately compared to other rice sub-populations ( Figure 1). In the case of the grain length GL7 gene, a total of 10 missense mutations and ten haplotypes were observed with the maximum number of missense mutations in the third exon (Supplementary Figure S1A,B). Most of the accessions belonged to the haplotype Hap-III (1667) and only 13 rice accessions belonged to the Hap-X group (Supplementary Figure S1B,C).
The rice accessions belonging to different haplotypic groups showed varying gradations of grain chalkiness and translucence. A total of 19 missense mutations and 11 haplotypes were observed for the chalk5 gene (Table 1). After removing the missing data and heterozygous calls, five major haplotypes of the chalk5 gene were found most promising. These five haplotypes included distinctly differentiated chalkiness and translucent accessions (Figure 2A,B). The haplotypic network for the chalk 5 gene revealed five major haplotypes and large number of accessions belonged to haplotype group 3 (AACG-GTATGCC) ( Figure 2C). Among the five haplotypic groups, haplotype 1 (CTCGTACCCAT) and 2 (CACGGTATGCC) were most pronounced for translucent phenotype whereas haplotype 4 (CACTGTATGCC) was found to be associated with rice grains with varying degrees of chalkiness ( Figure 2D). Most of the accessions in haplotype groups 1, 2, and 4 belonged to japonica, aromatic, and indica subpopulations, respectively. Field emission scanning electron microscopy (FESEM) for chalky and translucent grains revealed contrasting structures of starch granules ( Figure 2E). The translucent rice had distinct polyhedral structures with no air pockets nearby and the chalky rice grains show round starch granules with neighboring air gaps ( Figure 2E). It was also observed that starch granules in chalky grains had varying sizes. Gene expression dynamics for chalk5 showed a higher gene expression in the ovary and endosperm tissues as compared to leaf, root, stem, and other tissues ( Figure 2F).

Haplotype Network Defining the Evolution of Important Rice Genes
Haplotypes found in at least ten accessions were considered for the development of haplotype networks (Supplementary Dataset S1). Based on the sequence variations prevalent in these genes, a total of 791 haplotyping groups were formed across the 80 genes (Supplementary Table S1). A maximum number of haplotypes (23) was observed in the OsABCC1|MRP1 gene, which is an ABC transporter and reduces arsenic uptake (Supplementary Dataset S1). The lowest frequency was found in OsHAC1;1 gene (4), which is known to regulate arsenic accumulation in rice (Supplementary Table S1). The DU3 and BADH2 genes showed 21 haplotypes, each distributed across nine rice isozyme categories (Supplementary Dataset S1). Both the OsABCC1|MRP1, and OsHAC1;1 genes showed the highest number of accessions belonging to TeJ group. In the case of the GL7 gene, ten haplotypes were observed and most of the accessions belonged to the indica rice types (IndII, IndIII, and Ind_admix), whereas accessions belonging to IndI group were mostly present in haplotype 4 (Supplementary Figure S1D). Most of the accessions belonging to TrJ, TeJ, and Jap_admix were more prevalent in the haplotypic group 6 in the case of the GL7 gene (Supplementary Figure S1D). In the case of the DU3 gene, haplotype 1 possessed a larger number of accessions, which were mostly grouped under the TeJ group (519) followed by TrJ (370) (Supplementary Dataset S1). Similarly, for the BADH2 gene, haplotype 2 had the highest number of accessions with the TeJ group (535) followed by TrJ (245) ( Supplementary Table S1). Similarly, for gene OsCAO1|PGL (chlorophyllide a oxygenase), haplotype 1 had the highest frequency with a larger number of accessions belonging to the IndIII subgroup followed by TeJ (Supplementary Dataset S1). The haplotypic networks for all the 80 genes are given in Supplementary Dataset S1.   Table S3). The genes that show purifying selection include OsPT2 (phosphate transporter), which is responsible for selenite transport, RINO1 (low phytate), OsIRT1 (Iron and Zinc accumulators), SPDT (phosphorous accumulation), OsVIT2 (Iron transporter), OsCAO1|PGL (leaf senescence, grain yield), and Rab5a (storage protein transporter).
The nucleotide diversity in the genomic region harboring nutritional and cooking quality-related genes was significantly reduced in cultivated rice compared to wild rice, particularly genes such as OsCAO1|PGL, OsVIT2, and OsGZF1. However, some genes like OsABCC1|MRP1, OsMATE2, and SSIIIa showed higher diversity in cultivated rice (Supplementary Dataset S2). The chalk5 gene, which governs the grain opaqueness/transparency showed similar nucleotide diversities for the cultivated and wild-type rice accessions ( Figure 3A). However, significantly lower diversity values were observed for indica and japonica rice accessions. The comparative nucleotide diversity between for the OsIRT1 gene, which governs the iron and zinc accumulation in rice shows a higher diversity in wild-type accessions when compared to the cultivated accessions ( Figure 3B). It was noted that the diversity values for japonica sub-varieties were significantly decreased when compared to the cultivated and indica sub-types of accessions. The phylogenetic analysis for the chalk5 gene showed that the indica and japonica sub-groups grouped separately and most of the Oryza rufipogon OR-III accessions clustered with the japonica subtype and Or-I with the indica subtype ( Figure 3C). The phylogenetic analysis for the OsIRT1 gene shows a closer clustering of the indica and japonica sub-group and most of the Oryza rufipogan accessions were grouped separately and with more diversity than indica and japonica accessions ( Figure 3D).

Gene Expression Dynamics for Grain Quality-Related Traits in Rice
Expression analysis of grain quality-related genes was evaluated using publicly available transcriptomic data. A diverse expression profile across different tissues and developmental stages was observed for the quality-related genes in rice (Figure 4). Only 64 out of 80 genes showed expression in the selected tissues and developmental stages. Tissuespecific expression was observed in nutritional and cooking quality-related genes and about 29 genes showed higher expression in the endosperm (Supplementary Dataset S3). Clustering analysis showed tissue-specific and developmental stage-specific expression patterns for most of the genes. Some genes were constitutively expressed like Rab5a and OsALDH7, and a few genes were expressed only in the later stages of seed development such as OsGZF1 ( Figure 4A) Three genes, GSE5, GW5L, and qGW8|OsSPL16|GW8, were expressed only in early inflorescence and pistil tissue ( Figure 4B). Three genes, OsIRO2, OsNAS2, and OsHAC1;1 showed leaf-specific expression. Two genes, OsUgp2|UGP2 and WX showed expression only in GSK5 mutants compared to wild type and ARF4 mutant. Two genes, qCdT7|OsHMA3 and chalk5, were expressed in GSK5 and ARF4 mutants compared to wild type and GLU4A showed expression only in ARF4 mutant ( Figure 4C). Gene OsPht1;2 responsible for selenite uptake in rice showed higher gene expression in the endosperm tissue of black rice as compared to red and white rice. Similarly, the gene SPDT which governs phosphorus accumulation and OsPCR1 showed significantly higher expression in grains of red rice as compared to white and black rice tissues ( Figure 4D). Higher expression of the GL7 gene was mostly observed in pistil followed by palea and lemma (Supplementary Figure S1E). The qPCR analysis showed higher expression of the FLO16 gene in seedling and leaf as compared to the roots. In the case of OsLTPL36 and GLU4A genes, comparatively higher expression was observed in roots (Supplementary Figure S2).

Hub Genes Identified through Gene Co-Expression Network Analysis in Rice
A co-expression network for 80 genes was developed using gene expression data from 11,726 libraries from the 'PPRD' database. A total of 16 genes were removed after filtering the expression data, based on missing values using the WGCNA tool. The remaining 63 genes were used for co-expression network construction and module detection. Varying module sizes were selected to obtain an optimum number of modules. For network generation, the minimum number of genes to define a module was selected as 3. Among various modules, genes for grain protein content clustered separately, as did genes responsible for grain starch content and chalkiness ( Figure 5A). Genes placed in module 0 were not grouped under any module.
A total of 19 hub genes showed a high degree of correlation with other genes (Table 2, Figure 5B). Among these, gene OsCAO1, which is involved in grain yield and quality, showed the highest degree of correlation with 20 other quality-related genes (Supplementary Table S4). The OsSSI gene involved in the starch biosynthesis pathway and FLO2 which controls rice grain size and starch quality showed a high degree of correlation with 19 genes, respectively. The OsSULTR3;3 gene showed 18 interactions with the rest of the genes. Among the hub genes, the Ospho1 gene, which is responsible for starch structure within the endosperm, showed higher expression (1237.6 RPKM value) in seed ( Table 2,  Supplementary Table S4). The Ospho1 gene was present in the WGCNA module 1, which consists of genes related to grain starch, sucrose and chalkiness ( Figure 5A).

Interaction of Transcription Factors and Nutritional Quality-Related Genes
A total of 1413 regulatory interactions were obtained between 211 transcription factors and 80 genes governing rice cooking and nutritional quality (Table 3). Among the total interactions, it was observed that 87 transcription factors possessed over-represented targets in the input gene set under the cutoff p-value ≤ 0.05. As a result, transcription factor LOC_Os05g03020 with target gene OsGZF1 had the highest number of binding sites with 24. Similarly, transcription factor LOC_Os05g03020 belongs to the C2H2 family and interacts with gene OsHAC1;1. Likewise, transcription factor enrichment analysis revealed that LOC_Os03g60630 and LOC_Os07g13260 had the lowest p-values and most significant results in terms of over-represented targets in the input gene set under cutoff <= 0.05. Here, both the transcription factors belong to the Dof family. Table 3. Details of transcription factors predicted to have interaction with cooking and nutritional quality-related genes. Plant Transcriptional Regulatory Map (PlantRegMap) server [40] was used to predict the interaction. Os04g0594100 MYB 80 9 7.53 × 10 −4 1.26 × 10 −2 # 'Query_all' stand for the number of gene promoters that were examined for the existence of transcription factor binding sites; $ 'Query_bind' represents the number of genes with a binding site for a specific transcription factor in their promoter; ¥ p-value cutoffs of ≤0.05 was used to claim significant interaction.

Discussion
The increasing human population has led to increased demands for high-calorie and nutrient-rich foods, which in turn have necessitated in-depth studies targeted at exploring the available genomic and transcriptomic resources for developing high-yielding cropvarieties. Several genes responsible for crop yield, abiotic and biotic stress resistance, grain quality, palatability, and cooking and nutritional quality-related traits have been characterized [42,43]. The development of novel varieties with improved traits is hindered by the lack of information about the sources of desirable alleles and the genetic background of the donor lines. The whole-genome re-sequencing data available for over 4500 diverse rice genotypes is an excellent resource for understanding the allelic diversity of wellcharacterized genes governing important traits in rice. In the present study, based on the analysis of 80 genes linked to nutritional and cooking quality traits, 133 deleterious mutations within 39 genes were identified. The rest of the genes either did not show any sequence variations or showed polymorphisms that were not functionally important for the selected traits. Based on these sequence variations, haplotype analysis was conducted to deduce the patterns of inheritance for the effective selection of informative SNPs. Thirtynine genes showed more than or equal to 10 haplotypes. Among these, the highest number of haplotypes was observed in OsABCC1|MRP1, which is an ABC transporter followed by two genes, DU3 and BADH2, which are involved in starch content regulation and aroma, respectively. OsHAC1;1 showed the least number of haplotypes (4) and is involved in the regulation of arsenic accumulation. Similarly, four haplotypes were observed for OsLCT, which regulates cadmium accumulation in rice. In addition to this, haplotype networks were generated to study the evolution of important haplotypes across nine rice isozyme classification groups. To understand the expression of these genes across selected rice tissues and different developmental stages, transcriptome data from 102 experiments in four studies on rice have been analyzed. In terms of gene co-expression, we could finalize a set of 19 genes with the highest gene co-expressions and these were termed as 'hotspot' genes. The hotspot genes depict both a strong positive as well as a negative correlation with the rest of the genes. In addition, we have predicted the genetic regulation and a set of 87 transcription factors that were over-represented in the dataset. The transcription factors were predicted to regulate the expression of 78 genes related to nutritional and cooking quality-related traits. Gene sequence polymorphism, expression dynamics, and regulation provide a multi-tiered approach in the selection of SNPs for crop improvement studies. Recently, Angira et al. [44] identified two haplotypes of the SD1 gene, which is commonly observed in the rice germplasm of the United States. Angira, Addison, Cerioli, Rebong, Wang, Pumplin, Ham, Oard, Linscombe and Famoso [44] first identified six SNPs that could differentiate all seven haplotypes present in the SD1 gene. Subsequent haplotypebased marker development and screening of the rice germplasm of the US revealed that the first and the third haplotypes are predominant. In the present study, 66 SNPs, nine missense mutations, and 11 haplotypes including seven reported by Angira, Addison, Cerioli, Rebong, Wang, Pumplin, Ham, Oard, Linscombe and Famoso [44] for the SD1 gene were identified (Supplementary Table S1). Previously, seven haplotypes were identified for the gene chalk5 on chromosome 5, based on sequence variations across a panel of 191 rice accessions of indica and japonica type [45]. The haplotype groups were categorized into two sub-groups: class A (hap 1-4) and class B (hap 5-7), based on phylogenetic analysis. Class A haplotype groups represented accessions with white belly chalkiness trait. In our study, we have identified 11 haplotype groups based on 19 missense mutations within the chalk5 gene across indica, japonica, aus and intermediate subgroups. Our analysis revealed five promising haplotypes for chalkiness trait based on the 11 nonsynonymous SNPs after excluding heterozygous and missing SNPs in the haplotype groups. Among the five haplotypes, haplotypes 1 and 2 were associated with transparent rice grains, whereas haplotype 4 was associated with chalkiness of seeds. The present study demonstrated haplotype and phenotype association for the chalkiness traits, which can be explored further for haplotype-based breeding. Another example where haplotypic information was used to identify the desired allele for the GS9 gene is reported by Zhao et al. (2018). Here, a set of 114 diverse rice genotypes were used to identify five SNPs in GS9 which categorizes the set in five haplotypes. The null mutation in the GS9 gene was found to improve grain shape to a slender form whereas its overexpression resulted in round grains. However, 75 SNPs, including the eight nonsynonymous SNPs identified herein for GS9 provide an opportunity to identify novel alleles with the desired effects and consequently more genetic resources (donor lines) for breeding (Supplementary Table S1). Several studies similar to that of Zhao et al. [46] have explored of genetic variations to identify desired alleles and to understand haplotypic variation; however, those are mostly focused on a single gene (Supplementary Table S5). Notably, in the present study, haplotypic analysis of 80 genes regulating nutritional and cooking quality-related traits provides numerous advantages, including an understanding of allelic variability and evolution, the identification of desired alleles and their sources simultaneously for different genes, and providing an opportunity for haplotype-based precision breeding.
The non-synonymous SNPs identified in the candidate genes were further evaluated with PROVEAN, which helped to associate the functional prediction to the haplotypes. The probable functional impact of non-synonymous SNPs is higher than that of synonymous SNPs. In this regard, PROVEAN scores were calculated based on the sequence level conservation and properties of the amino acids, which have helped to predict the effect of non-synonymous SNPs. An earlier study performed by Deshmukh et al. [47] has demonstrated the use of PROVEAN to accurately predict the effect of amino acid change. They have used site-directed mutagenesis to verify the deleterious and neutral effects predicted for the silicon transporter genes from tomato, rice, and poplar [47]. Similarly, desired alleles for negative regulators can be efficiently identified based on the PROVEAN score.
In addition to genomic information, the present study has explored extensive transcriptomic data. The transcriptomic information provided here for the important genes related to nutritional and cooking quality in rice will be helpful to better understand gene regulation. Many of these genes might have pleiotropic effects which regulate other important traits. The OsFAD2|OsFAD2-1 gene, which affects lipid accumulation, has high expression in 20-day-old leaves. Moreover, the gene OsMADS34|PAP2, which regulates grain yield and quality, shows high gene expression in anther. Similarly, a co-expression network developed here for the nutritional and cooking quality-related genes helped to identify hub genes. The exploration of haplotype diversity for such genes could provide a relatively high level of variation in the grain quality traits for future crop breeding and improvement programs. In addition, such a network will be helpful for understanding the interactions among the selected genes. Recently, a gene co-expression network has been successfully used to identify substructures of gene modules responding to salt stress in rice [48]. Similarly, a co-expression network developed for strawberry has been used to identify genes regulating flower and fruit-related traits [49]. Additionally, a co-expression network developed in rice identified modules associated with temperature-inducible and photoperiod sensitive genes, which are important for the sterility transition [50]. Similarly, 496 hub genes and four modules showed a significant correlation with photo-sensitive differentially expressed genes in rice [51]. Furthermore, weighted gene co-expression analysis has identified differentially expressed genes such as OsHSPs, OsHSFC2A, and OsDJA5 upon cadmium treatment in stem nodes of different rice genotypes [52]. Many of such examples utilizing co-expression network information suggest the potential applications of the network developed in the present study for the rice quality-related genes.

Conclusions
In this study, we aimed to understand the genetic variations, identify haplotypes, and analyze the expression patterns of genes known to have a significant role in rice nutritional and cooking quality-related traits. We have detailed the functionally important SNPs in 80 previously cloned genes known to regulate grain quality-related traits. The predicted functional impact of the sequence variants will help to understand gene regulation as well as the effect of haplotype on trait development. The generated resource will serve as a basis for haplotype-based breeding programs of rice. The detailed haplotypic information provided here will facilitate the identification of donor lines harboring the most desired haplotype. In addition, starting with 80 functionally characterized genes, we have narrowed down a list of 19 hotspot genes based on the co-expression network developed using extensive transcriptomic data. The information of hotspot genes can be further explored to understand the gene interaction and interdependency of grain quality-related traits. Subsequently, the efficacy of the approach was demonstrated by showing haplotypic association with key nutritional and cooking quality traits like grain length, grain weight, and chalkiness. The adopted approach and the information provided in the present study will be helpful for understanding genetic variation for the nutritional and cooking quality in rice and for accelerating haplotype-based breeding programs aimed at customizing high-quality rice with the desired nutritional value.
Supplementary Materials: The following supporting information can be downloaded at: https://www. mdpi.com/article/10.3390/cells11071144/s1, Table S1. Details of Single nucleotide polymorphism (SNPs), Insertions/deletions (InDels), and their haplotyping information for nutritional and cooking quality related genes. The deleterious effect mutations are represented with an asterisk; Table S2. Details of RNASeq datasets used in the present study. All the raw data was retrieved from NCBI SRA database and analyzed using CLC Genomics workbench; Table S3: Diversity analysis of 80 nutritional and cooking quality-related genes in terms of average pairwise divergence (π), estimated mutation rate (θ) and Tajima's D values; Table S4. Details of hub genes and their interactions identified by coexpression network using RNAseq data; Table S5. Details of genetic variations previously reported for nutritional and cooking quality related genes; Figure  Acknowledgments: Authors are highly thankful to Leonidas Dagostino and Sagnik Sengupta for English editing.

Conflicts of Interest:
The authors declare no known financial or personal conflict of interest to influence research presented in the paper.