Genome-Wide Identification and Expression Profiles of 13 Key Structural Gene Families Involved in the Biosynthesis of Rice Flavonoid Scaffolds

Flavonoids are a class of key polyphenolic secondary metabolites with broad functions in plants, including stress defense, growth, development and reproduction. Oryza sativa L. (rice) is a well-known model plant for monocots, with a wide range of flavonoids, but the key flavonoid biosynthesis-related genes and their molecular features in rice have not been comprehensively and systematically characterized. Here, we identified 85 key structural gene candidates associated with flavonoid biosynthesis in the rice genome. They belong to 13 families potentially encoding chalcone synthase (CHS), chalcone isomerase (CHI), flavanone 3-hydroxylase (F3H), flavonol synthase (FLS), leucoanthocyanidin dioxygenase (LDOX), anthocyanidin synthase (ANS), flavone synthase II (FNSII), flavanone 2-hydroxylase (F2H), flavonoid 3′-hydroxylase (F3′H), flavonoid 3′,5′-hydroxylase (F3′5′H), dihydroflavonol 4-reductase (DFR), anthocyanidin reductase (ANR) and leucoanthocyanidin reductase (LAR). Through structural features, motif analyses and phylogenetic relationships, these gene families were further grouped into five distinct lineages and were examined for conservation and divergence. Subsequently, 22 duplication events were identified out of a total of 85 genes, among which seven pairs were derived from segmental duplication events and 15 pairs were from tandem duplications, demonstrating that segmental and tandem duplication events play important roles in the expansion of key flavonoid biosynthesis-related genes in rice. Furthermore, these 85 genes showed spatial and temporal regulation in a tissue-specific manner and differentially responded to abiotic stress (including six hormones and cold and salt treatments). RNA-Seq, microarray analysis and qRT-PCR indicated that these genes might be involved in abiotic stress response, plant growth and development. Our results provide a valuable basis for further functional analysis of the genes involved in the flavonoid biosynthesis pathway in rice.


Introduction
Flavonoids, synthesized by the phenylpropanoid pathway, one of the largest families of polyphenolic secondary metabolites in the world, are extensively distributed in all different organs and tissues depending on the plant developmental and environmental conditions [1]. Nearly all flavonoids contain the common diphenylpropane (C6-C3-C6) carbon framework with two aromatic rings (A-ring and B-ring) interconnected by a three-carbon heterocyclic pyran ring (C-ring chain) [2,3]. Based on the degree of heterocyclic C-ring oxidation, the position of hydroxyl and methyl groups on the A-and B-rings and the degree of modifications (including glycosylation, acylation and polymerization ), flavonoids can be chiefly categorized into six classes: flavones, flavanones, flavonols, flavanols, anthocyanins and isoflavones [3,4]. An enormous amount of research during the last few decades has revealed that flavonoids, as the most bioactive plant secondary metabolites, might exhibit a wide range of physiological functions in plant growth, development, reproduction and stress defense [5][6][7]. As co-pigmentation, flavonoids in flowers and seeds can facilitate their pollination and seed dispersal through attracting insects [8,9]. As important developmental regulators, flavonoids have been shown to be involved in pollen fertility and pollen germination [10][11][12][13][14] and in phytohormone transport and catabolism regulation [11,[15][16][17]. As antioxidants [18,19], flavonoids also play apparently important roles in abiotic and biotic stress responses, such as low phosphate and nitrogen stress [3,20], temperature stress [21][22][23], drought and salt stress [24][25][26], protection against ultraviolet-B (UV-B) radiation [27][28][29], as well as defense against pathogens and herbivores [30][31][32][33][34]. Furthermore, flavonoids also have demonstrated major benefits for human nutrition and health due to their diverse biological activities, such as anticancer, antiviral, anti-allergic, anti-inflammation and protection against coronary heart diseases [35][36][37]. Suffice it to say, it is increasingly important for crop yield security and human nutrition due to the antioxidant and healthy benefits of flavonoids to enhance flavonoid content in the future.
To date, over 9000 flavonoid molecules have been identified from various plant species [4], and the molecular mechanisms of enzymes catalyzing flavonoid biosynthesis have also been well established, such as in Arabidopsis thaliana, Solanum lycopersicum (tomato), Glycine max, Phaseolus vulgaris and Vitis vinifera [2,38,39]. O. sativa (rice), as an experimental system for monocots, is the most economically important food staple crop for human nutrition [35]. Although the understanding of flavonoid biosynthesis in rice is relatively limited when compared to dicot plants, such as Arabidopsis and tomato [2], a nearly complete flavonoid biosynthetic pathway for flavonoid scaffolds, including flavanones, flavones, flavonols and anthocyanins in rice is elucidated in Figure 1, based on several pioneering works [29,[40][41][42][43][44][45][46][47]. The flavonoids are biosynthesized by a complex series of reactions, including condensation, isomerization, oxidations and reductions [2]. The first two processes in the flavonoid biosynthesis pathway are facilitated by chalcone synthase (CHS) and chalcone isomerase (CHI) to form the sequential production of chalcone [48] and flavanone (including naringenin) [49]. Flavanone is as a universal precursor for the biosynthesis of various flavonoid classes, including flavones, flavonols, anthocyanins and proanthocyanidins. Among them, flavones are synthesized by flavone synthase (FNS), which is divided into two types in plants, including FNS I (mainly in Apiaceae plants) and FNS II (more widespread in plants) [50]. In rice, flavanones are converted into corresponding flavones by OsFNS I-1 or OsFNS II in vitro or in vivo [42,51,52]. In addition, flavones can be also synthesized by flavanone 2-hydroxylase (F2H). F2H catalyzes the hydroxylation of flavanone to produces 2-hydroxyflavanones in the C-glycosylflavone biosynthesis pathway [40,46]. Dihydrokaempferol is synthesized by flavanone 3-hydroxylase (F3H) [53]. Then flavonol synthase (FLS) converts dihydroflavonols into flavonols by the desaturation of dihydroflavonol [54]. In rice, OsFLS is a bifunctional dioxygenase-exhibiting FLS and F3H activities to exercise its functions in flavonol production [44]. Based on the hydroxylation pattern of the flavonoid B-ring, flavonoid 3 -hydroxylase (F3 H) and flavonoid F3 5 H-hydroxylase (F3 5 H) catalyze hydroxylation at the C3 and C3 C5 positions of flavonoid for the final production of two-hydroxy and three-hydroxy, respectively [55,56]. F3 5 H may represent the later gained functional divergence from F3 H [57,58]. To date, two rice F3 Hs, CYP75B3 and CYP75B4, were proven as bona fide flavonoid 3 -hydroxylase with substrate-specific specificity, both of which can restore Arabidopsis transparent testa 7 mutants by catalyzing 3 -hydroxylation of flavonoids into 3 -hydroxylated flavonoids [41,43]. Notably, CYP75B4 (also termed chrysoeriol 5 -hydroxylase) can catalyze not only hydroxylation at the C3 position of flavonoid but also the C5 hydroxylation of chrysieriol, which is unique and indispensable for tricin formation in rice [41]. Finally, anthocyanin and proanthocyanidin biosynthesis begin with the entry step enzymatic activity of dihydroflavonol 4-reductase (DFR) [59,60]. DFR competing with FLS for the same substrate can catalyze the dihydroflavonols to produce corresponding leucoanthocyanidins, which are further converted into corresponding anthocyanidins with the aid of the enzymes leucoanthocyanidin dioxygenase/anthocyanidin synthase (LDOX/ANS) [61,62]. Leucoanthocyanidin reductase (LAR) and anthocyanidin reductase (ANR) convert leucoanthocyanidines and anthocyanidins, respectively, into their corresponding proanthocyanidins [63,64]. The flavonoid scaffold molecules mentioned above are further highly diversely modified, including through hydroxylation, C-or O-glycosylation, O-methoxylation and various other biochemical conversions with the aid of enzymes such as glycosyltransferases (GTs), acyltransferases and methyltransferases [38,57]. The carbon flux of the flavonoids pathways through a typical cell can represent about 20% of the total carbon flux in plants [2]. These ubiquitous flavonoid metabolic diversification processes in plants have been demonstrated to play essential roles in plants' diverse environmental adaptation [57,58,65].
Gene duplication is a prevailing phenomenon in plant genomes and is considered as the major mechanism for forming gene families during evolution [67,68]. In some cases, increases in the total number of enzymes and production may be the result of the presence of multiple gene family members. In other cases, new functions can be formed by gene duplication. Both cases for the generation of new genetic material have been suggested to play critical roles in environmental adaptation of plants [69][70][71]. Therefore, genome-wide analysis of diverse gene families in plants can provide us with more new information and lay a foundation of important data for studies in future. Unlike Arabidopsis' enzymes, which are encoded by single genes [38,72], several of the flavonoid-related biosynthesis enzymes in rice are encoded by multiple genes. The rice genome has been completely sequenced since 2005, and rice has been used universally as a species for functional genomics research due to its relatively small genome size (about 389 Mb) [73]. However, no systematic and comprehensive identification and analysis have been performed on the whole key structural gene families involved in the biosynthesis of the flavonoid scaffold molecules in the rice genome, except for studies on the CHS gene family [74,75]. Here, based on the recently released rice genome from the Rice Genome Annotation Project (RGAP) and the Rice Annotation Project (RAP) Network, genome-wide identification and  [43,57,66]. Arrows represent enzymatic steps: CHS, chalcone synthase; CHI, chalcone isomerase; F3H, flavanone 3-hydroxylase; F2H, flavanone 2-hydroxylase; FNSII, flavone synthase II; F3 H, flavonoid 3 -hydroxylase; FLS, flavonol synthase; F3 5 H, flavonoid 3 5 -hydroxylase; DFR, dihydroflavonol 4-reductase; ANS, anthocyanidin synthase; LDOX, leucoanthocyanidin dioxygenase; LAR, leucoanthocyanidin reductase; ANR, anthocyanidin reductase; modification means the flavonoid scaffolds modification of hydroxylation, glycosylation, and methoxylation with the aid of enzymes including glycosyltransferases, acyltransferases and methyltransferases and so on.
Gene duplication is a prevailing phenomenon in plant genomes and is considered as the major mechanism for forming gene families during evolution [67,68]. In some cases, increases in the total number of enzymes and production may be the result of the presence of multiple gene family members. In other cases, new functions can be formed by gene duplication. Both cases for the generation of new genetic material have been suggested to play critical roles in environmental adaptation of plants [69][70][71]. Therefore, genome-wide analysis of diverse gene families in plants can provide us with more new information and lay a foundation of important data for studies in future. Unlike Arabidopsis' enzymes, which are encoded by single genes [38,72], several of the flavonoid-related biosynthesis enzymes in rice are encoded by multiple genes. The rice genome has been completely sequenced since 2005, and rice has been used universally as a species for functional genomics research due to its relatively small genome size (about 389 Mb) [73]. However, no systematic and comprehensive identification and analysis have been performed on the whole key structural gene families involved in the biosynthesis of the flavonoid scaffold molecules in the rice genome, except for studies on the CHS gene family [74,75]. Here, based on the recently released rice genome from the Rice Genome Annotation Project (RGAP) and the Rice Annotation Project (RAP) Network, genome-wide identification and functional analysis of the key structural enzymes involved in the biosynthesis of the flavonoid scaffold molecules are studied.

Gene Structure and Conserved Motifs
Gene structures (including exon/intron and domain) and their length data were extracted based on genome annotation GFF3 files and NCBI CDD, respectively. Conserved motifs were identified by using the online MEME program (http://meme-suite.org/tools/ meme, accessed on 13 December 2021) with the following parameters: the maximum number of 20 motifs and motif width between 6 and 200 amino acids-other parameters were set at their default values [83]. Then, gene structure and conserved motifs were analyzed and visualized using TBtools [84].

Chromosome Location, Gene Duplication Events and Microsynteny Analysis
Chromosome location was obtained from genome annotation GFF3 files. Gene duplication events, including segmental and tandem duplication events, are the two principle means by which gene families are expanded [85]. Genes in the same subfamily were regarded as co-paralogs. Genes that were co-paralogs and located on duplicated chromosomal blocks within their genomes through polyploidy followed by chromosome rearrangement were defined as segmental duplicates. We defined tandem duplicates as adjacent co-paralogs within the same or neighboring intergenic region on a single chromosome. The gene duplication events were analyzed by using MCScanX software [86]. Circos software (http://circos.ca/, accessed on 3 December 2021) was used to visualize chromosome location and gene duplication events [87]. In order to analyze their selective force during the evolutionary process, the nonsynonymous (Ka) and the synonymous (Ks) substitution rates (that is Ka/Ks ratios) of duplicate gene pairs were calculated via DnaSP 5.0 (http://www.ub.edu/dnasp/, accessed on 3 December 2021) [88]. Divergence time of all duplicate events was estimated by T = Ks/(2 × 9.1) × 10 3 million years ago (Mya) [89].

Plant Materials and Growth Conditions
Rice 'RPY geng' (O. sativa ssp. japonica) was used for quantitative real-time transcription polymerase chain reaction (qRT-PCR) analysis. Seedlings were germinated in water at 37 • C for 2 days then grown in containers with Yoshida solution under 14 h light (28 • C)/10 h dark (22 • C) with 65% relatively humidity. After two weeks, seedlings were transferred to Yoshida solution containing 200 mM NaCl or at 4 • C for analyzing salt and cold stress, respectively. Shoots were collected directly into liquid N 2 and then stored at −80 • C prior to RNA isolation with three biological replicates (at least 15 seedlings/each replicate) at 0, 3 and 24 h for RNA extraction. Total RNA was extracted using TRIzol reagent (Invitrogen, Beijing, China), and first-strand cDNAs were synthesized by using an M-MLV reverse transcriptase reagent kit (Promega, Beijing, China) according to the manufacturer's instructions.
The microarray data of rice seedlings treated with cold (GSE57895) [93], salt (GSE76613) and various plant hormones (GSE39429) [94,95] were downloaded from the Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/, accessed on 16 December 2021) database. R was used to perform the whole raw microarray data analysis to obtain the differentially expressed genes (DEGs) based on the conditions with a |log 2 fold change| > 1.5 and with significant results for the t-test (p-value < 0.05) after normalization to the control [96,97].
Rice salt-and cold-response genes (OsCHS12, OsCHS28, OsF3H2, OsDFR6 and OsL-DOX2) from rice RNA-Seq datasets were further examined through qRT-PCR. Primers of these five genes were designed by Primer 5.0 (Table S7). The qRT-PCR reaction (10 µL) was formulated with three biological replicates in a 96-well plate by using the 2 × SYBR Green Master Mix reagent (Bio-Rad, Hercules, CA, USA). The rice actin was selected as an endogenous reference to normalize the expression levels of the target genes of all samples. The 2 −∆∆CT method was used to calculate the relative amount of target gene expression.

Identification and Characterization of Flavonoid Biosynthesis-Related Genes in Rice
The key genes encoding enzymes involved in flavonoid biosynthesis pathways were examined with a systematic computational approach through using BLASTP and the HMMER algorithm on the rice genome. As a result, a total of 85 genes were identified, belonging to 13 gene families (Table 1 and Table S6), including 28 CHS, 10 F3 H, 9 LDOX, 8 F3H, 8 DFR, 7 CHI, 6 ANR, 3 FLS and 2 ANS genes, and LAR, F3 5 H, FNSII and F2H encoded by a single gene. All gene candidates were characterized and named according to their corresponding chromosomal order, and more detail information regarding their corresponding chromosome locations, number of amino acids (aa), molecular weight (MW) and theoretical isoelectric point (pI) are summarized in Table 1. Then, the subcellular localizations were also predicted, among which, the majority (43/85) were targeted to the cytoplasm, second (29/85) was the chloroplast and 4, 3, 2, 2 and 2 genes were located in the nucleus, extracellular, mitochondrion, plasma membrane and vacuole, respectively.

Analysis of Phylogenetic Relationship, Gene Structure and Motif Composition
In order to investigate evolutionary relationships, a NJ phylogenetic tree was constructed for the 85 putative genes in the rice flavonoid biosynthesis-related pathway, which revealed they were clustered into five main lineages ( Figure 2). Among them, all CHS genes and CHI genes were separately clustered into single lineages, while the F3H, FLS, LDOX and ANS genes, F3 H, F3 5 H, FNSII and F2H genes, and DFR, ANR and LAR genes formed separate groups of lineages. Gene structure diversity has been proven to act as the primary driving force for multigene families' evolution [98]. As expected, the gene structures and motif compositions were highly conserved in the same lineages, although they might show some differences in the number and the length of gene structure and motif composition in some cases. For instance, the Chal_sti_synt_N and Chal_sti_synt_C domains are conserved in all CHS genes. CHI genes possess a chalcone domain. F3 H, F3 5 H, FNSII and F2H genes contain the p450 domain. The adh_short or Epimerase or 3Beta_HSD domains are found in DFR, ANR and LAR genes, and the 2OG-FeII_Oxy or DIOX_N domain are contained in F3H, FLS, LDOX and ANS genes. These results suggest that 85 genes belonging to 13 gene families in the rice flavonoid biosynthesis pathway could be further divided into 5 five distinct lineages based on the analysis of gene structural features, motif compositions and phylogenetic relationships. Among them, F3H, FLS, LDOX and ANS are members of the 2-oxoglutarate dependent dioxygenase (2-ODD) superfamily, F3 H, F3 5 H, FNSII and F2H belong to the cytochrome P450-dependent monooxygenase (CYP450) superfamily, and DFR, ANR and LAR are subordinated to the short-chain dehydrogenase/reductase (SDR) superfamily ( Figure 2).

Chromosome Locations, Duplication Events and Microsynteny Analysis
To understand the expansion mechanism, chromosome locations and duplication modes of the rice genes involved in the flavonoid biosynthesis were investigated by using MCScanX (Figure 3 and Table 2). We found that the majority of 85 flavonoid biosynthesisrelated genes were distribute across Chrs 4, 6, 7, 9, 10 and 11. Among them, Chr4 and Chr10 both had 12 genes (14.12%), Chr7 owned 11 genes (12.94%), Chr9 included 9 genes (10.59%) and Chr6 contained 8 genes (9.41%). Remarkably, most of the genes, including not only those of the gene family members with high sequence similarity but also the different gene family members, were located close to each other on the Chrs. Some gene clusters with high sequence similarity were located in the same genome duplication zone as duplicate gene pairs and were found to be tandem duplicates ( Figure 3 and Table 2), for instance OsCHS4-7 on Chr5, OsCHS11-12 and OsLDOX24-26 in the region proximal to the telomere of Chr7 and Chr11, OsANR1-5 on Chr4, OsLDOX4-5 on Chr6 and OsLDOX6-9 in the region proximal to the telomere of Chr9. In addition, gene clusters with a relatively high density among some distinct gene family members were also observed on Chrs 4, 6, 7, 8, 9 and 10. Based on previous reports [99][100][101], we assumed that a cluster of genes

Chromosome Locations, Duplication Events and Microsynteny Analysis
To understand the expansion mechanism, chromosome locations and duplication modes of the rice genes involved in the flavonoid biosynthesis were investigated by using MCScanX (Figure 3 and Table 2). We found that the majority of 85 flavonoid biosynthesisrelated genes were distribute across Chrs 4, 6, 7, 9, 10 and 11. Among them, Chr4 and Chr10 both had 12 genes (14.12%), Chr7 owned 11 genes (12.94%), Chr9 included 9 genes (10.59%) and Chr6 contained 8 genes (9.41%). Remarkably, most of the genes, including not only those of the gene family members with high sequence similarity but also the different gene family members, were located close to each other on the Chrs. Some gene clusters with high sequence similarity were located in the same genome duplication zone as duplicate gene pairs and were found to be tandem duplicates ( Figure 3 and Table 2), for instance OsCHS4-7 on Chr5, OsCHS11-12 and OsLDOX24-26 in the region proximal to the telomere of Chr7 and Chr11, OsANR1-5 on Chr4, OsLDOX4-5 on Chr6 and OsLDOX6-9 in the region proximal to the telomere of Chr9. In addition, gene clusters with a relatively high density among some distinct gene family members were also observed on Chrs 4, 6, 7, 8, 9 and 10. Based on previous reports [99][100][101], we assumed that a cluster of genes among different gene families may provide convenience in the form of physical position or space for distinct flavonoid-related genes to form macromolecular complexes during rice flavonoid biosynthesis. They included OsF3H2-5 and OsANR1-5 in the region proximal to the end site of Chr4, OsF3H6 and OsLDOX4-5 in the region proximal to the beginning site of Chr6, OsDFR3 and OsANS2/5 in the region proximal to the end site of Chr6, OsCHI5 and OsDFR4-5 in the region proximal to the end site of Chr7, OsF3H7, OsDFR7 and OsF3 H3/4 in the region proximal to the end site of Chr8, OsDFR8 and OsF3 H6-8 in the region proximal to the end site of Chr9, OCHS16-21 and OsF3 H9-10 in the region proximal to the beginning site of Chr10 and OsF3H8 and OsFLS2-3 in the region proximal to the end site of Chr10. Meanwhile, we also found that some of genes are located at the beginning or end site of Chrs, such as OsF3H1, OsFNSII, OsCHS2, OsLODX3, OsF2H, OsCHI6 and OsCHI7, which are located at the beginning site of their corresponding Chrs, while OsCHI3, OsF3H3, OsF3H4, OsF3H5, OsF3H8, OsFLS2 and OsFLS3 mapped to the end site of their corresponding Chrs.    For a gene family, gene duplication events, including tandem duplication, whole genome/segmental duplication, proximal duplication and dispersed duplication, are the primary power for gene expansion [85]. Therefore, 22 duplication events were examined in the total of 85 flavonoid-related genes, among which seven pairs were derived from segmental duplications and fifteen pairs were from tandem duplications ( Figure 3 and Table 2). Furthermore, we noticed that these duplication events only existed in six gene families (CHS, CHI, F3H, LDOX, ANS and ANR), indicating that duplication events played an essential role in the expansion of these gene families. Further analysis displayed that the numbers and types of duplication modes were greatly diverse among these six gene families, which could mean that expansion mechanisms of these genes were different during the long-term evolutionary process. For example, tandem duplication was the principal duplication mode in the LDOX and ANR gene families, while segmental duplication was the major duplication mode in the CHI, F3H and ANS gene families. In addition, we found the number of tandem duplications was more than that of segmental duplications in the CHS gene family, which suggests tandem duplication played a principle role in the gene expansion.
Next, the values of non-synonymous substitutions per non-synonymous sites (Ka), synonymous sites (Ks) and the Ka/Ks ratio in each duplicated gene pair were calculated to evaluate the selective pressure. If a statistically significant Ka/Ks ratio equal to one represents a neutral or absence of selection, greater than or lower than that means positive and negative (or purifying) selection, respectively. The gene pair OsLDOX4/OsLDOX5 was excluded to compute the Ka/Ks ratio because of values of Ka and Ks both equal to zero, which showed that OsLDOX4 and OsLDOX5 genes were completely identical at the nucleotide sequence level. Thereby, our findings showed the majority of Ka/Ks ratios were less than 1.0, meaning they underwent negative selection, except for five gene pairs, including OsF3H1/OsF3H2, OsF3H2/OsF3H8, OsLDOX7/OsLDOX8, OsLDOX8/OsLDOX9 and OsANS1/OsANS2, which were greater than 1.0, suggesting positive selection (Table 2). At the same time, divergence time of all gene duplicate pairs varied from 0.55 to 27.64 Mya (Table 2). These results demonstrated that expansion of gene families in the flavonoid biosynthesis pathway was the result of multiple selective forces during the long-term evolutionary process.

Expression Profiling of Rice Genes in the Flavonoid Biosynthestic Pathway
Expression analysis of gene families can provide important research clues for the further study of their functional differentiation [102,103]. In this study, to investigate the expression pattern of the identified genes involved in the flavonoid biosynthesis, we obtained the expression profiles in 11 major tissues from the NCBI SRA databases, including leaves, post-emergence inflorescence, pre-emergence inflorescence, anther, pistil, seed at 5 DAP, embryos at 25 DAP, endosperm at 25 DAP, seed at 10 DAP, shoots and seedling at the four-leaf stage. We observed that the majority of the genes were expressed differentially in all tissues (Figure 4), except for 12 genes that were not detected in the all tissues. Most of the genes showed a relatively high expression level in specific tissues, while some genes had relatively high levels in multiple tissues. For instance, OsFLS2, OsF3 H2, OsF3 H4, OsANR3, OsDFR3, OsDFR6 and OsLDOX9 had relatively high expression in leaves. OsCHS1, OsCHS8, OsCHS12, OsCHS18, OsCHS23, OsCHS26, OsCHS28, OsDFR1, OsF3 H3 and OsLDOX8 showed high expression levels in seedlings at the four-leaf stage. OsANS1, OsANS2, OsFLS1, OsF3H4, OsF3 H1 and OsCHI4 were relatively highly abundant in anther. OsF3H3 and OsCHS9 had the highest relative expression levels in seed at 5 DAP. OsCHS2, OsCHS24, OsCHS25, OsCHI3, OsANR5, OsDFR4, OsDFR5, OsDFR8, OsF3 5 H, OsF3'H9 and OsFNSII were relatively highly expressed in post-emergence inflorescence. OsCHS10, OsCHS22, OsDFR7 and OsANR1 were predominately expressed in pre-emergence inflorescence. OsLDOX1, OsLAR1 and OsDFR2 had relatively high levels in pistil. OsLDOX5 and OsF2H had high expression levels in embryos at 25 DAP. OsCHS13, OsCHS14, OsCHS15, OsLDOX2, OsLDOX7, OsF3H1, OsF3H2, OsF3H8, OsANR4, OsANR6 and OsF3'H6 exhibited a high expression level in shoots. OsCHS3, OsCHS27, OsCHI1, OsCHI2, OsCHI5, OsCHI6, OsCHI7, OsFLS3, OsFLS3, OsF3H5, OsF3H6, OsF3H7, OsLDOX3, OsLDOX4, OsF3 H8, OsF3 H10 and OsANR2 were highly expressed in multiple tissues. In addition, all genes had relatively low expression levels in seeds at 10 DAP. These results indicate that gene families in the flavonoid biosynthesis pathway might be associated with distinct physiological functions in rice.
Many plant species are adversely affected by environmental stresses during their lifespan. Flavonoids, a group of the most bioactive plant secondary metabolites, play the most important roles in stress responses through serving as ROS scavengers by wellknown antioxidant activities [5,22,23,39,65]. In the present study, gene expression profiles from the GEO database of rice seedlings under cold and salt stress were investigated to analyze the expression patterns of genes involved in the flavonoid biosynthesis pathway ( Figure 5). Firstly, it should be noted that 29 genes were not detected in the two microarray datasets. Under cold stress, most of genes were upregulated in roots and shoots at almost all tested time points, which indicates flavonoids can play an important role in cold acclimation [22,23]. Unlike the cold stress, the majority of genes showed no response or were downregulated at almost all time points under salt stress, while some genes in roots and shoots were upregulated at some time points. For instance, OsDFR6, OsANR6 and OsLDOX2 were upregulated in roots and shoots at almost all time points. OsCHS12 was upregulated in roots from 24 h salt stress. OsLDOX1 and OsF3H4 were upregulated for a mere 3 h in shoots after salt stress. OsF3H7, OsLAR, OsCHS2 and OsCHS27 were only upregulated in roots after salt stress for 24 h. OsCHS28 and OsF3H2 were upregulated in roots during the recovery. It is noteworthy that OsCHS12, OsCHS28, OsF3H2, OsDFR6 and OsLDOX2 were strongly upregulated under cold and salt stresses, suggesting that they may play critical roles in a plant's adaption to cold-and salt-stress conditions. To further examine the accuracy of the above results, we analyzed these five key genes' (OsCHS12, OsCHS28, OsF3H2, OsDFR6 and OsLDOX2) relative expression levels via qRT-PCR under 200 mM NaCl and 4 • C treatment for 0 h, 3 h and 24 h ( Figure 6). The qRT-PCR analysis results supported our conclusion mentioned above that the five genes (OsCHS12, OsCHS28, OsF3H2, OsDFR6 and OsLDOX2) were cold-and salt-stress response genes.  and OsLDOX2 were strongly upregulated under cold and salt stresses, suggesting that they may play critical roles in a plant's adaption to cold-and salt-stress conditions. To further examine the accuracy of the above results, we analyzed these five key genes' (OsCHS12, OsCHS28, OsF3H2, OsDFR6 and OsLDOX2) relative expression levels via qRT-PCR under 200 mM NaCl and 4 °C treatment for 0 h, 3 h and 24 h ( Figure 6). The qRT-PCR analysis results supported our conclusion mentioned above that the five genes (OsCHS12, OsCHS28, OsF3H2, OsDFR6 and OsLDOX2) were cold-and salt-stress response genes.
(a) (b) It is commonly accepted that phytohormones play essential roles in the regulation of plant growth and development. The crosstalk between flavonoids and phytohormone pathways has been reported [11,[15][16][17]. However, the function of the gene families from flavonoid-related pathways under phytohormone treatments has not been well-documented in rice. In order to explore it, the expression profiles from the NCBI GEO databases were obtained to analyze the genes' relative expression patterns under 50 μM ab- It is commonly accepted that phytohormones play essential roles in the regulation of plant growth and development. The crosstalk between flavonoids and phytohormone pathways has been reported [11,[15][16][17]. However, the function of the gene families from flavonoid-related pathways under phytohormone treatments has not been welldocumented in rice. In order to explore it, the expression profiles from the NCBI GEO databases were obtained to analyze the genes' relative expression patterns under 50 µM abscisic acid (ABA), 10 µM gibberellin (GA), 10 µM auxin (IAA), 1 µM brassinosteroid (BR), 1 µM cytokinin (CTK) and 100 µM jasmonic acid (JA) treatments (Figure 7). First, it is worth noting that 17 out of the total of 85 genes were not detected in the database. We observed that most of the genes were suppressed or downregulated in all six treatments, especially in GA, IAA and BR treatments, whereas some others were upregulated in one or multiple treatments (Figure 7). For instance, OsF3H7, OsF2H and OsANR2 were found to be upregulated only under ABA treatment. OsCHS10, OsCHS16, OsANS1, OsF3 H4 and OsF3 H2 were only upregulated under CTK treatment. Expression of OsCHS8, OsCHS13, OsCHS14, OsCHS27, OsCHI1, OsFLS3, OsANR3, OsDFR5, OsLDOX3, OsLDOX6 and OsF3H2 were elevated solely under JA treatment. OsLDOX2, OsCHI2, OsCHI4 and OsCHS2 were upregulated when applied with ABA and JA treatments. OsCHS6 and OsCHS28 were upregulated following GA, CTK and JA exposure. Expression of OsCHI6 and OsCHI7 was higher under ABA and CTK. In addition, OsCHS18 for CTK and JA, OsF3H5 for IAA and CTK, OsANS2 for ABA, GA and JA, OsANR6 for ABA, CTK and JA, OsCHS15 for BR, CTK and JA, OsDFR3 for IAA and JA and OsLDOX1 for ABA, IAA and JA were all upregulated. These findings demonstrate that these genes involved in flavonoid biosynthesis might participate in the relevant hormone signaling pathways.

Sequence Alignment and Phylogenetic Analysis of CHS Gene Family
The CHSs belong to the representative type III polyketide synthase (PKS) superfamily, which is widely distributed in the plant kingdom from the lower bryophytes to angiosperms [57,104]. To date, at least one putative CHS gene candidate can be identified in every land plant species based on its corresponding available genomic data [57,105]. We found that the catalytic triad C164-H303-N336 inherited from the ancient PKS III superfamily [106] was highly conserved in almost all 28 OsCHS genes through using Medicago

Sequence Alignment and Phylogenetic Analysis of CHS Gene Family
The CHSs belong to the representative type III polyketide synthase (PKS) superfamily, which is widely distributed in the plant kingdom from the lower bryophytes to angiosperms [57,104]. To date, at least one putative CHS gene candidate can be identified in every land plant species based on its corresponding available genomic data [57,105]. We found that the catalytic triad C164-H303-N336 inherited from the ancient PKS III superfamily [106] was highly conserved in almost all 28 OsCHS genes through using Medicago sativa MsCHS as a template to align the sequence of OsCHSs with the aid of the online servers PDB and ESPript (Supplementary Figure S1). As the gatekeeper [106], phenylalanine at position 215 (F215) connected with CoA-binding was strictly conserved, while the other at F265 was relatively diverse. For instance, OsCHS2, OsCHS11, OsCHS12, OsCHS24 and OsCHS26 included a tyrosine (Y265), OsCHS9, OsCHS13, OsCHS14 and OsCHS15 contained glycine (G265), OsCHS3 and OsCHS28 included an isoleucine (I265), and OsCHS25 contained a cysteine (C265) as a substitute for F265, which probably caused their functional diversity and divergent enzymatic activities of OsCHS genes. In addition, as a CHS/STS characteristic signature sequence [107], the WGVLFGFGP375GLT motif was also maintained in almost all 28 OsCHS genes (Supplementary Figure S1), although their conserved motif mentioned above might show some substituted residues in some cases. Plant CHSs further fell into three distinct clades based on the phylogenetic relationships of OsCHSs and their homologs from other plant species on the NCBI (Supplementary Figure S2 and Supplementary Table S1). The majority of OsCHSs cluster into clade II, belonging to the oldest CHSs, except for six genes (OsCHS1-3, OsCHS8 and OsCHS27-28) clustered into the largest clade I belonging to the typical CHS and 2 genes (OsCHS10 and OsCHS22) clustered into clade III belonging to the members of anther-specific CHS-like (ASCL) proteins. Interestingly, all OsCHSs were more closely clustered with ZmCHSs, TaCHSs, HvCHSs and SiCHSs, indicating that the CHS homologs from Gramineae crops were more closely related.

Sequence Alignment and Phylogenetic Analysis of CHI Gene Family
CHIs are members of the CHI-folding protein family and can be categorized into four types based on sequence similarity, including types I-IV [108,109]. Generally, around 70% or more identical belong to the same type, while less than 50% identical are classified into different types. Type I CHI proteins are widespread in vascular plants, while type II CHI mainly exists in leguminous plants. Type III CHI proteins are the fatty-acid-binding proteins (FAPs) that participate in fatty-acid biosynthesis and are widely distributed in green algae and land plants [57]. Type IV CHI proteins are closely related to the ancestors of plant CHIs (CHI-like proteins (CHILs) that first appeared in mosses and evolved from FAPs) and are found in basal and higher plant species, including mosses, liverworts, lycophytes, ferns, gymnosperms and angiosperms [108,109]. In this study, we identified seven gene candidates encoding OsCHIs. Alignment of the OsCHI sequences to MsCHI showed that only OsCHI3 shared the highly conserved critical catalytic residues essential for CHI active sites, including R36, T48, Y106 and N113, except for serine at position 190 (S190) as a substitute for T190 (Supplementary Figure S3), which was in line with what has been previously reported in rice. The mutation of OsCHI3 can cause the golden hull and internode 1 phenotype [110]. However, the other six OsCHIs did not retain the catalytic core of the CHI protein fold and were substituted at many of these critical sites, which demonstrated they were divided into different types. A phylogenetic analysis of CHIs among plant species indicated that plant CHIs were grouped into four distinct clades (Supplementary Figure S4 and Supplementary Table S2), which was consistent with previous studies [108,109]. Among them, OsCHI3 belonged to type I, OsCHI6-7 clusterd to type IV, type II included OsCHI2, OsCHI4 and OsCHI5, and OsCHI1 was classified into type III, which was consistent with the results from alignment of the CHI sequences and analysis of their conserved residues. These findings indicated that the formed OsCHI gene family had undergone diverse functional divergence during the subsequent evolutionary process.

Sequence Alignment and Phylogenetic Analysis of 2OGD Gene Family (F3H, FLS, LDOX and ANS)
Four flavonoid biosynthetic enzymes, F3H, FLS, LDOX and ANS, are members of the 2-oxoglutarate dependent dioxygenase (2-OGD) protein superfamily. The 2OGDs are localized in the cytosol and are non-heme, iron-containing proteins that can incorporate 2-oxoglutarate (2OG) and molecular oxygen (O 2 ) into various substrates to form oxidized products (such as flavonoids), succinate and CO 2 [111]. A phylogenetic analysis showed that plant 2-OGDs were divided into three classes, including DOXA, DOXB and DOXC [111]. F3H, FLS, LDOX and ANS were classified into the DOXC class, in which F3H was in the DOXC28, whereas FLS, LDOX and ANS were in DOXC47 [57,111]. In our study, 22 gene candidates, including eight OsF3Hs, three OsFLS, nine OsLDOX and two OsANS genes were identified ( Table 1). Most of the rice F3H, FLS, LDOX and ANS genes were highly conserved in the 2-OGD specific and critical residue forming motifs (Supplementary Figure S5), including the ferrous iron binding motif (H232-x-D234-xn-H288), 2-oxoglutarate binding motif (Y217 and R298-x-S300) and the proper folding motif of the 2-OGD polypeptide (G78, H85, P198, G272) [61,112]. In addition, the substrate binding motifs (F142, F144, K213, F304, E306) were only found in OsFLS1, which was consistent with recent findings [44]. Furthermore, the two FLS-specific motifs were also found in the three OsFLSs, except OsFLS2 only included one motif due to the truncation of the C-end [113]. Interestingly, 12 residues relevant for F3H activity existed in almost all rice F3H, FLS, LDOX and ANS genes (Supplementary Figure S5) [114], indicating that the F3H gene may be the ancestor of FLS, LDOX and ANS genes [57]. A phylogenetic analysis of the plants' 2-OGDs aligned here grouped the genes into four distinct clades (Supplementary Figure S6 and Supplementary Table S3). Based on the phylogenetic analysis, all the remaining genes were included in the corresponding gene family, except OsFLS2-3 clustered with OsLDOXs in clade IV, suggesting the OsFLS gene family performed a different function during the expansion process 3.8. Sequence Alignment and Phylogenetic Analysis of CYP450 Gene Family (F3 H, F3 5 H, FNSII and F2H) F3 H, F3 5 H, FNSII and F2H are involved in flavonoid biosynthesis and belong to the huge cytochrome P450 (CYP450) superfamily of heme-containing membrane proteins localized on the cytosolic surface of the endoplasmic reticulum (ER) in eukaryotes [56]. CYP450s catalyze monooxygenase/hydroxylation reactions in various metabolic processes by insertion of an O atom using molecular O 2 and NADPH as co-substrates [55,56]. CYP450s are generally categorized into families when having 40% or more amino acid sequence identity, and those are further grouped into subfamilies with 55% or more identity [57]. The F3 H and F3 5 H enzymes that are divided into the CYP75 family exist only in gymnosperms and angiosperms, while FNSII and F2H are members of the CYP93 family found solely in angiosperms [57,115]. In this study, ten F3 Hs, one F3 5 H, one FNSII and one F2H genes were identified from rice (Table 1). Sequence alignment showed that the majority of the rice F3 H, F3 5 H, FNSII and F2H genes contained the CYP450-featured conserved motif (Supplementary Figure S7), including the proline-rich hinge region (LPPGPxxxP) required for the CYP450 s optimal orientation [116], the oxygen-binding pocket motif [117], the ExxR motif and PERF motif to form the E-R-R triad for core structure stabilization [118] and the heme-binding domain (FGxGRRxCxG) [119]. In addition, typical F3 H-specific motifs were also identified in all OsF3 H genes [58], such as the "VVVAAS" motif that has high similarity in both OsF3 H9 and OsF3 H10, the "VDVKG" motifs that exist in almost all OsF3 Hs except for OsF3 H6 (although there are amino acid substituted residues in some case). It should also be noted that six substrate recognition sites (SRSs) were also conserved in some or multiple sites, and the oxygen-binding pocket motif was included in SRS4 (Supplementary Figure S7). The phylogenetic relationship of F3 H, F3 5 H, FNSII and F2H determined these proteins were divided into four clades. Among them, OsFNSII and OsF2H clustered with the typical FNSIIs and OsF3 5 H clustered with known F3 5 Hs, while OsF3 H1-8 and OsF3 H9-10 (CYP75B4 and CYP75B3) were separately divided into two distinct clades (clades I and II) (Supplementary Figure S8 and Supplementary Table S4) showing the functional divergence of OsF3 Hs during the evolutionary process.
3.9. Sequence Alignment and Phylogenetic Analysis of SDR Gene Family (DFR, LAR and ANR) DFR, LAR and ANR belong to the short-chain dehydrogenase/reductase (SDR) superfamily, which is the huge NAD(P)(H)-dependent oxidoreductase family ubiquitous in viruses, archaea, prokaryotes and eukaryotes [120]. Plant SDRs are categorized into five types: "classical", "divergent", "extended", "atypical" and "unknown", based on their structures, motifs and active sites [121]. The DFR and ANR are member of the SDR108E family ("extended" type SDRs), while the LAR is classified into the SDR460A family in the "atypical" type [57,121]. We identified eight OsDFRs, six OsANRs and one OsLAR in rice via genome-wide analysis. The resulting alignment data indicated the catalytic core in these genes was highly conserved except for some variant residues (Supplementary Figure S9). They included the glycine-rich Rossmann dinucleotide-(NADH/NADPH) binding motif, the putative conserved catalytic triad (S128, Y163, and K167) and the substrate binding region [59,60]. A phylogenetic analysis was performed by the homologs of DFR, LAR and ANR sequences (Supplementary Figure S10 and Supplementary Table S5) and revealed that these proteins were divided into five clades. Among them, OsLAR clustered with the known LARs, while the OsDFRs and OsANRs were separately divided into three and two distinct clades, respectively, indicating their functional divergence. OsDFR1 belonging to the typical DFRs was consistent with previous findings showing its role in the regulation of the rice pigmentation in hull and pericarp [122,123]. OsANR1-5 clustered with the characterized ANRs, while OsANR6 clustered with DFR-likes.

Discussion
It is well known that flavonoids play significant roles in a plant's adaption to the diverse environmental conditions due to their potent antioxidant activity [5,39,65]. The metabolites and enzymes associated with the flavonoid biosynthesis pathway have been extensively investigated in dicotyledon plants, while only limited attention has been paid to monocots such as rice [2,58]. Given the fact that rice is a staple food, the identification and functional analysis of genes involved in flavonoid biosynthesis should be characterized comprehensively and systematically at the rice genome level. In the present study, we mainly focus on the key structural enzymes involved in the biosynthesis of the flavonoid scaffold molecules by using the recently released rice genome.
In this study, 85 genes belonging to 13 gene families related to the rice flavonoid biosynthetic pathway were identified. On the basis of structural features, motif analyses and phylogenetic relationships, these 13 gene families were further subdivided into 5 distinct lineages, including the PKS III subfamily, CHI-folding protein family, 2-OGD superfamily, CYP450 superfamily and SDR superfamily (Figure 2). The majority of the 28 OsCHSs gene candidates were consistent with the previous studies with the exception of the OsCHS19 (Os10g0168300) gene [74,75]. OsCHS19 contained the typical Chal_sti_synt_N and Chal_sti_synt_C two domains and the characteristic catalytic triad (except for the C164) inherited from the PKS III ancestor, and the "gatekeeper" connected with CoA-binding at positions 215 and 256 were also highly conserved ( Figure S1). Based on sequence features and conserved amino acid residue analysis, we proposed OsCHS19 belonged to one of CHS gene families. Therefore, 28 OsCHS members were finally selected to analyze in this study. Notably, seven OsCHS genes, including OsCHS4, OsCHS5, OsCHS7, OsCHS11, OsCHS17, OsCHS19 and OsCHS20, had no mRNA signal based on the expression profiles of RNA-Seq and microarray datasets, suggesting they might be pseudogenes (although these gene duplicates were retained during the evolutionary process). For OsCHIs, OsCHI3 was in line with the recent finding of its belonging to the typical type I [110], while the remaining six OsCHIs were also examined and were classified as type II, III or IV ( Figure S4). The finding indicated rice retained all four types of CHI during the subsequent evolutionary process. A recent study reported that two copies of OsF3 H (CYP75B3 and CYP75B4) genes belonging to the CYP450 family presented in rice genome [41,43]. However, 10 OsF3 H gene copies were identified in this study, and OsF3 H9 and OsF3 H10 (also termed CYP75B4 and CYP75B3) clustered with the typical F3 Hs, while the remaining eight OsF3 Hs clustered into the other clade ( Figure S8), indicating the functional divergence of OsF3 Hs may be formed during the long-term evolutionary process via gene duplicates. Anthocyanin and proanthocyanidin are synthesized by DFR, LAR and ANR during the last steps of the pathway and are among the less understood flavonoid biosynthetic genes in rice. Although eight OsDFRs, six OsANRs and one OsLAR in rice were identified via genome-wide analysis in this study, their biological function remains unclear and needs to be further explored in the future. Recently, a widespread and systematic classification of the plant 2OGD superfamily has been performed [115]. Based on the study, OsANS, OsF3H and OsFLS genes in the rice genome were divided into the DOXC group, and it was proposed that only four rice 2-OGD genes participated in the rice flavonoid biosynthesis pathway, including two OsANSs (Os01g0372500 and Os06g0626700), one OsF3H (Os04g0662600) and one OsFLS (Os02g0767300) [115]. Except for the OsANS genes, the others two genes were different from the results mentioned above, and eight OsF3Hs and three OsFLSs were identified in this study. Moreover, nine OsLDOXs were also examined. All these genes contained the 2-OGD specific and critical motifs and were categorized into their known corresponding gene families (except for OsFLS2-3, which clustered with the OsLDOXs) ( Figure S5). Previous reports show naringenin can be converted to flavone in vitro by OsFNSIs (Os04g0581000 and Os03g0122300) [51,52]. It was difficult to distinguish the FNS I and F3H in rice due to their high amino acid homology. Therefore, the OsFNSIs were classified as OsF3Hs (termed OsF3H1 and OsF3H2) in our study based on phylogenetic analysis and the Rice Annotation Project Network. In addition, OsF3H1 (Os03g0122300), mediating brown planthopper resistance via a flavonoid pathway, had also been confirmed in rice [30]. These findings suggest that more than four 2-OGD genes are involved in the rice flavonoid pathway. We propose that these inconsistent results may be due to the limitations of previous identification methods by using older software or due to continuous updates to the rice genome over time. Notably, OsFLS1 belonged to the conventional FLS gene family consistent with previous report in rice [44], while OsFLS2-3 clustered with known LDOXs (Figure S6), indicating the OsFLS gene family had undergone functional divergence or is undergoing divergence through gene duplication, and the biological function of OsFLS2-3 needs to be experimentally verified in the future. Furthermore, OsFLS and LDOXs/ANSs were closer in genetic relationship in contrast to F3Hs, and almost all of the rice FLS, LDOX and ANS genes existed the residues of F3H activity ( Figure S5), suggesting that F3H may have been the first 2OGD gene for flavonoid biosynthesis, and the other 2OGD genes involved in flavonoid biosynthesis may have evolved from F3H [57].
It has been previously reported that flavonoids can be separately synthesized in cytoplasm (such as endoplasmic reticulum/ER), chloroplast and nucleus [18]. As expected, most of these 85 genes were predicted to be localized to the cytoplasm, followed by the chloroplast and nucleus in our study ( Table 1). As it has been reported that exon-intron structures might have some regulatory functions in gene expression and evolution, absence of exons-introns might lead to fusion and rearrangement of disparate chromosome segments and difference of expression patterns in numerous gene families [98]. In this study, except for OsF3 H3, OsF3 H5, OsF3 H7, OsANS1 and OsANS2, which had no intron, the other 80 genes separately contained 1, 2, 3, 4, 5 and 9 introns (Figure 2). These findings demonstrate that intron loss and gain may have occurred or was occurring during expansion of these gene families in rice. For instance, most of the 28 OsCHS genes contained a single intron, except for OsCHS1, OsCHS5 and OsCHS19, which had two introns and OsCHS11 had three introns (Figure 2), which was consistent with previous reports that most plant CHS family genes contain a single intron [124]. In seven gene candidates encoding the OsCHIs gene family, OsCHI3 and OsCHI5 contained three introns and OsCHI1, OsCHI6 and OsCHI7 had four introns, which was similar to other CHI genes, as they are generally composed of three or four introns ( Figure 2). Notably, OsCHI2 and OsCHI4 had as many as nine introns, which is a rare phenomenon among gene structures but is similar to findings in cotton species [125]. Interestingly, we also found the distribution of 85 rice flavonoid-related genes on Chrs displayed a certain physical positon with bias to Chr4, Chr7, Chr9 and Chr10. Most of gene family members with high sequence similarity were clustered together on the Chrs in close proximity and were located in the genome duplication zone as duplicate gene pairs. These findings might be consistent with recent evidence that the rice genome has undergone genome duplication and expansion during the long-term evolutionary process, and tandem gene duplication represent the major proportion of the duplication events [126][127][128]. Several previous studies have shown that enzymes of the flavonoid biosynthetic pathway often assemble as complexes to facilitate metabolite channeling [99][100][101]. We found gene clusters with a relative high density were also observed among the distinct gene family members on Chrs 4, 6, 7, 8, 9 and 10, which might provide the chance for these genes to form macromolecular complexes in physical position or space during the rice flavonoid biosynthesis process.
Earlier research demonstrated that gene duplication events, as a prevailing feature in plant genomes and include segmental and tandem duplication events, are the principle cause of the expansion of gene families [85]. With the aid of MCScanX software, we explored the expansion mechanisms of the 85 rice flavonoid-related genes and found 31 genes (31/85) were located in the rice genome duplication zone, including 7 segmental duplication pairs and 15 tandem duplication pairs ( Figure 3 and Table 2). Notably, of these 85 tested genes, these duplication events existed only in six gene families (CHS, CHI, F3H, LDOX, ANS and ANR), while the remaining seven gene families (FLS, DFR, LAR and CYP450s) had no duplication events, which indicated that the expansion mechanisms of these 13 gene families were greatly different. We further noted that the numbers and types of duplication modes were greatly diverse among these six gene families which had duplication events. For instance, segmental duplications occurred in the expansion of the CHI, F3H and ANS gene families, while tandem duplication occurred in the LDOX and ANR gene families. Significantly, the instances of tandem duplication were more than that of segmental duplication in the CHS gene family, suggesting that tandem duplication was the dominant contributor to the expansion of the rice CHS gene family, which was consistent with the previous Han et al. (2017) investigation [75]. Meanwhile, we found the quantity of genes undergoing each duplication mode was directly correlated with that of the corresponding gene family. We speculated that expansion of these exclusive gene families may play essential roles for rice to adapt to the terrestrial environment during the process of evolution, and further verification was required to make the conclusion. Although no duplication events were identified in DFR and F3 H gene families via MCScanX, eight and 10 gene members, respectively, remained to be found separately in them, indicating that expansion of both gene families may be from other duplication modes, such as dispersed duplication, retroduplication, transposon-mediated duplication or proximal duplication. Taken together, these findings demonstrated that 13 flavonoid-related gene families in rice had undergone volatile, lineage-specific gene expansion.
Previous studies reported that the origin time of the Oryza tribe (Oryzeae), the Oryza genus and the Oryza-AA genome were about 24, 15 and 5 Mya [129][130][131][132], respectively. In this study, the divergence times of all 22 gene pairs were estimated, and we found these duplication events occurred around between 0.55 and 27.65 Mya (Table 2), indicating the current flavonoid gene families in rice had at least four expansion events during the evolutionary process. Specifically, the divergence times of ten OsCHS gene pairs, including two segmental and eight tandem duplication events, were calculated to be a wider range of times, 7.19-27.65 Mya. Among them, two were between the origin of the Oryza genus and the diversification of Oryzeae, one was earlier than the origin of Oryzeae and the remaining seven were between the origin of the Oryza-AA genome and the diversification of the Oryza genus. These findings indicate that the OsCHS gene family in the rice genome was duplicated at least three times during the long-term evolutionary process, and the duplication events mainly occurred in the time of the Oryza genus differentiation. In addition, the divergence times of six duplication events from OsF3H and OsANR genes were estimated to be 16.04-24.26 Mya, indicating that these gene families' duplications mainly occurred between the origin of the Oryza genus and the differentiation time of Oryzeae. Meanwhile, the estimated divergence time of three duplication events of OsLDOXs was 3.32, 6.36 and 16.51 Mya, close to the origin time of the Oryza-AA genome and the Oryza genus, suggesting that at least two duplication events of the OsLDOX gene families occurred during the recent differentiation of the Oryza-AA genome. Furthermore, the divergence time (0.55 Mya) of one OsANS duplication event was also calculated and found to be later than the origin of the Oryza-AA genome, indicating that expansion of OsANSs occurred recently in the differentiation of Oryza-AA genome. Based on these data, we summarized that gene duplications associated with the rice flavonoid synthetic pathway mainly occurred after the origin of the Oryza genus. Moreover, the ratio of Ka/Ks substitutions is used to determine the selective pressure after duplication. In our study, analysis of 22 duplicated gene pairs from the rice flavonoid biosynthetic pathway indicated that 16 (16/22) gene pairs were less than 1.0, while five gene pairs were greater than 1.0, which meant the majority of these genes underwent negative (or purifying) selection during the evolution (Table 2).
Biochemical characteristics, function and regulation can be reflected by distinct gene's expression patterns in organisms [102,103]. The recently released high-throughput transcriptome data can provide valuable resources and information for analyzing rice functional genes at the genome-wide level. Given the close relationship between flavonoids and the plant's environmental adaption, expression profiles of 85 genes related with flavonoid biosynthesis were detected in 11 various tissues and under several stresses (including phytohormone, cold and salt) (Figures 4-7). In our study, most genes showed tissue-specific expression patterns (Figure 4), some of which were relatively high levels in multiple tissues (except for seed at 10 DAP), indicating that different gene families in the flavonoid biosynthesis pathway might have distinct physiological functions in rice. For instance, OsANS1, OsANS2, OsFLS1, OsF3H4, OsF3 H1 and OsCHI4 were relatively highly abundant in anther, while OsFLS2, OsF3 H2, OsANR3, OsDFR3 and OsLDOX9 had relatively high expression in leaves, which indicates these genes might play critical roles working together in anther and leaves. These results were similar to previous findings in other plant species [43,133,134]. In addition, plant species are adversely affected by abiotic or biotic stresses during their lifespan [135]. Previous studies revealed that genes associated with flavonoid biosynthesis participated in stress responses, such as cold and salt stresses [22,23,25,26]. Our result showed most genes strongly respond to cold stress ( Figure 5), which is in line with the belief that flavonoids play important roles in cold acclimation [22,23,25]. Although the majority of genes were suppressed or had no response under salt stress ( Figure 5), we still noted that some genes, such as OsCHS12, OsCHS28, OsF3H2, OsF3H4, OsDFR6, OsANR6 and OsLDOX1-2, showed strong upregulation during salt stress and may act as good candidate genes for salt stress. Interestingly, OsCHS12, OsCHS28, OsF3H2, OsDFR6 and OsLDOX2, with strong responses to cold and salt stresses, may play important roles in response to cold and salt stresses ( Figures 5 and 6), and their actual functions need to be further verified in the future. Further, genes involved in flavonoid pathways have been reported to be regulated by phytohormones [15]. However, the flavonoid-related genes' response to phytohormone had not been studied in rice. In this study, we found that most genes were suppressed in six phytohormone treatments (Figure 7), especially under GA, IAA and BR treatments, which was consistent with the antagonized interaction between flavonoids and plant hormones [11,[15][16][17]. Notably, there were some genes strongly responsive to plant hormone treatments, indicating these genes might participate in the relevant hormone signaling pathways. For instance, we found that OsCHS10, OsCHS16, OsANS1, OsF3 H4 and OsF3 H2 were upregulated under CTK treatment, while OsF3H7, OsF2H and OsANR2 were upregulated under ABA treatment. Furthermore, expression level of OsCHS8, OsCHS13, OsCHS14, OsCHS27, OsCHI1, OsFLS3, OsANR3, OsDFR5, OsLDOX3, OsLDOX6 and OsF3H2 were elevated under JA treatment, which was in line with the previous findings from other plant species showing their role in the regulation of JA in flavonoid accumulation and the stimulation of flavonoid biosynthetic genes [133]. Taken together, the distinct expression patterns of genes associated with the flavonoid scaffold biosynthesis in this study provides some valuable information for further functional characterization.

Conclusions
A systematic analysis of 85 genes associated with the rice flavonoid scaffold biosynthetic pathway, including chromosomal location, phylogenetic relationship, gene structure, motifs, duplication events, selective forces and expression patterns, was performed. The results revealed these 85 genes belong to 13 gene families and can be further categorized into five distinct lineages. Subsequently, 22 duplication events, including 7 segmental and 15 tandem, were identified, and it was found that flavonoid-related gene families in rice had mainly undergone lineage-specific gene expansion under purifying selection at different evolutionary time points, especially in the differentiation time of the Oryza genus. Moreover, expression patterns from RNA-Seq, microarray and qRT-PCR analysis demonstrated that genes involved in flavonoid scaffold biosynthesis showed spatial and temporal regulation in a tissue (or stress)-specific way, indicating they played important roles in plants' growth, development and stress responses. We inferred OsCHS12, Os-CHS28, OsF3H2, OsDFR6 and OsLDOX2 may be potential candidate genes for rice cold and salt resistance breeding, although their actual biological functions need to be explored in the future.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/genes13030410/s1, Figure S1: Protein sequence alignment of Os-CHSs against A. thaliana CHS (AtCHS) and M. sativa CHS (MsCHS); Figure S2: Phylogenetic tree of the CHS proteins; Figure S3: Protein sequence alignment of OsCHIs against MsCHI; Figure S4: Phylogenetic tree of the CHI proteins; Figure S5: Protein sequence alignment of flavonoid biosynthesis-related 2OGDs, including ANSs, FLSs, F3Hs and LDOXs; Figure S6: Phylogenetic tree of the F3H, FLS, LDOX and ANS proteins; Figure S7: Protein sequence alignment of flavonoid biosynthesis-related CYP450s, including F3 Hs, F3 5 Hs, FNSIIs and F2H; Figure S8: Phylogenetic tree of flavonoid biosynthesisrelated CYP450s, including F3 Hs, F3 5 Hs, FNSIIs and F2Hs; Figure S9: Protein sequence alignment of flavonoid biosynthesis-related SDRs, including DFRs, LARs and ANRs; Figure S10: Phylogenetic tree of DFRs, LARs and ANRs; Table S1: The proteins used for CHS phylogenetic tree; Table S2: The proteins used for CHI phylogenetic tree; Table S3: The proteins used in the phylogenetic tree of F3H, FLS, LDOX and ANS; Table S4: The proteins used in the phylogenetic tree of F3'H, F3'5'H, FNSII and F2H; Table S5: The proteins used in the phylogenetic tree of DFR, LAR and ANR; Table S6: The 13 gene families involved in the rice flavonoid scaffold biosynthesis were further examined via HMMER searches; Table S7: Primers used in real-time PCR.
Author Contributions: J.W. performed all of the statistical analyzation and validation and wrote the manuscript. C.Z. analyzed parts of the GEO raw data. J.W. and Y.L. conceptualized and designed the experiments as well as modified the manuscript. All authors have read and agreed to the published version of the manuscript.