
Automatic Identification of Players in the Flavonoid Biosynthesis with Application on the Biomedicinal Plant Croton tiglium

Genetics and Genomics of Plants, CeBiTec & Faculty of Biology, Bielefeld University, 33615 Bielefeld, Germany
Department of Plant Sciences, Evolution and Diversity, University of Cambridge, Cambridge CB2 3EA, UK
Author to whom correspondence should be addressed.
Plants 2020, 9(9), 1103;
Received: 30 June 2020 / Revised: 11 August 2020 / Accepted: 25 August 2020 / Published: 27 August 2020
(This article belongs to the Section Plant Genetics, Genomics and Biotechnology)


The flavonoid biosynthesis is a well-characterised model system for specialised metabolism and transcriptional regulation in plants. Flavonoids have numerous biological functions such as UV protection and pollinator attraction, but also biotechnological potential. Here, we present Knowledge-based Identification of Pathway Enzymes (KIPEs) as an automatic approach for the identification of players in the flavonoid biosynthesis. KIPEs combines comprehensive sequence similarity analyses with the inspection of functionally relevant amino acid residues and domains in subjected peptide sequences. Comprehensive sequence sets of flavonoid biosynthesis enzymes and knowledge about functionally relevant amino acids were collected. As a proof of concept, KIPEs was applied to investigate the flavonoid biosynthesis of the medicinal plant Croton tiglium on the basis of a transcriptome assembly. Enzyme candidates for all steps in the biosynthesis network were identified and matched to previous reports of corresponding metabolites in Croton species.

1. Introduction

Flavonoids are a group of specialised plant metabolites comprising more than 9000 identified compounds [1] with numerous biological functions [2]. Flavonoids are derived from the aromatic amino acid phenylalanine in a branch of the phenylpropanoid pathway, namely, the flavonoid biosynthesis (Figure 1). Generally, flavonoids consist of two aromatic C6-rings and one heterocyclic pyran ring [3]. Products of the flavonoid biosynthesis can be assigned to different subgroups, including chalcones, flavones, flavonols, flavandiols, anthocyanins, proanthocyanidins (PAs), and aurones [4]. These subclasses are characterised by different oxidation states [5]. In plants, the aglycons are often modified through the addition of various sugars, leading to a huge diversity [6].
Flavonoids have important developmental and ecological roles in plants, including the control of auxin transport [7], the attraction of pollinators [8], protection of plants against UV light [9], and defence against pathogens and herbivores [10]. Different types of flavonoids can take up these roles. Anthocyanins appear as violet, blue, orange, or red pigments in plants recruiting pollinators and seed dispersers [8]. PAs accumulate in the seed coat, leading to the characteristic dark colour of seeds in many species [8]. Flavonols are stored in their glycosylated form in the vacuole of epidermal cells or on occasion in epicuticular waxes [4]. They possess several physiological functions including antimicrobial defence, scavenging of reactive oxygen species (ROS), UV protection, signalling, and, together with anthocyanins, flower colouration [9]. Consequently, the activity of different branches of the flavonoid biosynthesis needs to be adjusted in response to developmental stages and environmental conditions. While the biosynthesis of anthocyanins can be triggered by abiotic factors such as light, temperature, drought, or salts [11], PAs are formed independently of external stimuli in the course of seed development, leading to a brown seed colour [11].
As the accumulation of flavonoids in fruits and vegetables [12] leads to colouration desired by customers, this pigment pathway is of biotechnological relevance. Therefore, the flavonoid biosynthesis was previously modified by genetic engineering in multiple species (as reviewed in [13]). Flavonoids are not just interesting colourants, but they have also been reported to have nutritional benefits [14] and even potential in medical applications [15]. Reported anti-oxidative, anti-inflammatory, anti-mutagenic, and anti-carcinogenic properties of flavonoids provide health benefits to humans [16]. For example, kaempferols are assumed to inhibit cancer cell growth and induce cancer cell apoptosis [17]. Heterologous production of flavonoids in plants is considered a promising option to meet customers’ demands. Studies have already demonstrated that the production of anthocyanins in plant cell cultures is possible [18,19].
Flavonoid biosynthesis is one of the best-studied pathways in plants, thus serving as a model system for the investigation of specialised metabolism [9]. Academic interest in the synthesis of flavonoids spans multiple fields, including molecular genetics, chemical ecology, biochemistry, and health sciences [9,20]. In particular, the three subgroups flavonols, anthocyanins, and PAs are well studied in the model organism Arabidopsis thaliana [21]. Since a partial lack of flavonoids is not lethal under most conditions, there are large numbers of mutants with visible phenotypes caused by the knockout of various genes in the pathway [22]. For example, seeds lacking PAs show a yellow phenotype due to the absence of brown pigments in the seed coat, which inspired the name of mutants in this pathway: transparent testa [23]. While the early steps of the flavonoid aglycon biosynthesis are very well known, some later steps require further investigation. In particular, the transfer of sugars to PAs and anthocyanidins offers potential for future discoveries [24].
The core pathway of the flavonoid aglycon biosynthesis comprises several key steps that allow effective channelling of substrates in specific branches (Figure 1). A type III polyketide synthase, the chalcone synthase (CHS), catalyses the initial step of the flavonoid biosynthesis, which is the conversion of p-coumaroyl-Coenzyme A (p-coumaroyl-CoA) and three malonyl-CoA into naringenin chalcone [25]. CHS is well studied in a broad range of species since a knock-out or down-regulation of this step influences all branches of the flavonoid biosynthesis. Flower colour engineering with CHS resulted in the identification of mechanisms for the suppression of gene expression [26]. A. thaliana CHS can be distinguished from very similar stilbene synthases (STS) on the basis of two diagnostic amino acid residues, Q166 and Q167, while a STS would show Q166 H167 or H166 Q167 [27]. The chalcone isomerase (CHI) catalyses the conversion of bicyclic chalcones into tricyclic (S)-flavanones [28]. CHI I converts 6′-tetrahydroxychalcone to 5-hydroxyflavanone, while CHI II additionally converts 6′-deoxychalcone to 5-deoxyflavanone [29]. An investigation of CHI in early land plants revealed the presence of CHI II, which is in contrast to the initial assumption that CHI II activity would be restricted to legumes [30]. A detailed theory about the evolution of functional CHIs from non-enzymatic fatty acid binding proteins and the origin of CHI-like proteins was developed on the basis of evolution experiments [31]. The CHI product naringenin can be processed by different enzymes, broadening the flavonoid biosynthesis pathway to a metabolic network.
Flavanone 3β-hydroxylase (F3H/FHT) catalyses 3-hydroxylation of naringenin to dihydroflavonols [32]. As a member of the 2-oxoglutarate-dependent dioxygenase (2-ODD) family, F3H utilises the same cofactors and cosubstrate as the two other 2-ODD enzymes in the flavonoid biosynthesis: flavonol synthase (FLS) and leucoanthocyanidin dioxygenase (LDOX)/anthocyanidin synthase (ANS) [33]. The 2-ODD enzymes share overlapping substrate and product selectivities [34]. FLS was identified as a bifunctional enzyme showing F3H activity in some species, including A. thaliana [35], Oryza sativa [36], and Ginkgo biloba [37]. ANS, an enzyme of a late step in the flavonoid biosynthesis pathway, can have both FLS and F3H activity [38,39,40,41]. Due to its FLS side-activity, ANS has to be considered as an additional candidate for the synthesis of flavonols. The flavonoid 3′-hydroxylase (F3′H) catalyses the conversion of naringenin to eriodictyol and the conversion of dihydrokaempferol to dihydroquercetin [42]. Expression and activity of flavonoid 3′5′-hydroxylase (F3′5′H) are essential for the formation of 5′-hydroxylated anthocyanins, which cause the blue colour of flowers [13,43]. F3′5′H competes with FLS for dihydroflavonols, and thus it is possible that F3′5′H processes only the excess of these substrates that surpasses the FLS capacity [44]. Functionality of enzymes such as F3′5′H or F3′H is determined by only a few amino acids. A T487S mutation converted a Gerbera hybrida F3′H into a F3′5′H and the reverse mutation in an Osteospermum hybrida F3′5′H deleted the F3′5′H activity almost completely while F3′H activity remained [45]. The central enzyme in the flavonol biosynthesis is the flavonol synthase (FLS), which converts a dihydroflavonol into the corresponding flavonol by introducing a double bond between C-2 and C-3 of the heterocyclic pyran ring (Figure 1) [46,47]. FLS activity was first identified in irradiated parsley cells [48] and has since been characterised in several species including Petunia hybrida [46], A. thaliana [49], and Zea mays [24], revealing species-specific substrate specificities and affinities.
Another branching pathway channels naringenin into the flavone synthesis (Figure 1). Together with flavonols, flavones occur as primary pigments in white flowers and function as co-pigments with anthocyanins in blue flowers [50]. Flavanones can be oxidised to flavones by flavone synthase I (FNS I) [51] and FNS II [52]. Hence, FNS I and FNS II compete with F3H for flavanones and present a branching reaction in the flavonoid biosynthesis [53]. Being a 2-ODD, FNS I shows only minor differences in its catalytic mechanism compared to F3H, which are determined by only seven amino acid residues [53]. The exchange of all seven residues in parsley F3H resulted in a complete change to FNS I activity [53].
Colourful pigments are generated in the anthocyanin and proanthocyanidin biosynthesis. The NADPH-dependent reduction of dihydroflavonols to leucoanthocyanidins by dihydroflavonol-4-reductase (DFR) is the first committed step of the anthocyanin and proanthocyanidin biosynthesis. There is a competition between FLS and DFR for dihydroflavonols [54]. DFR enzymes have different preferences for various dihydroflavonols (dihydrokaempferol, dihydroquercetin, and dihydromyricetin). These preferences are probably due to differences in a 26-amino acid substrate-binding domain of these enzymes [55]. N at position 3 of the substrate-determining domain was associated with recognition of all three dihydroflavonols [55]. D at position 3 prevented the acceptance of dihydrokaempferols [55], while L or A led to a preference for dihydrokaempferol and substantially reduced the processing of dihydromyricetin [55,56]. Although this position is central for the substrate specificity, other positions also contribute to it [57]. ANS catalyses the last step in the anthocyanin aglycon biosynthesis, the conversion of leucoanthocyanidins into anthocyanidins. The NADPH/NADH-dependent isoflavone-like reductases, leucoanthocyanidin reductase (LAR)/leucocyanidin reductase (LCR), and anthocyanidin reductase (ANR, encoded by BANYULS (BAN)) are members of the reductase epimerase dehydrogenase superfamily [58]. LAR channels leucoanthocyanidins into the proanthocyanidin biosynthesis, which is in competition with the anthocyanidin formation catalysed by ANS. There is also a competition between 3-glucosyltransferase (3GT) and ANR for anthocyanidins [59]. While 3GT generates stable anthocyanins through the addition of a sugar group to anthocyanidins, ANR channels anthocyanidins into the proanthocyanidin biosynthesis. Anthocyanidins are unstable in aqueous solution and fade rapidly unless the pH value is extremely low [60]. Suppression of ANR1 and ANR2 in Glycine max caused the formation of red seeds through a reduction in proanthocyanidin biosynthesis and an increased anthocyanin biosynthesis [61]. Substrate preferences of ANR can differ between species, as demonstrated for A. thaliana and M. truncatula [62].
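The association between the residue at position 3 of the DFR substrate-determining domain and the accepted dihydroflavonols, as summarised above, can be written down as a simple lookup. The following Python sketch is only an illustration of that rule; the function name, the wording of the returned descriptions, and the placeholder domain string are assumptions made here, and other positions of the domain, which also contribute to substrate specificity, are ignored.

```python
# Associations reported for position 3 of the 26-amino-acid
# substrate-determining domain of DFR, as summarised in the text.
# The wording of the descriptions is an illustrative assumption.
POSITION_3_ASSOCIATIONS = {
    "N": "accepts dihydrokaempferol, dihydroquercetin, and dihydromyricetin",
    "D": "does not accept dihydrokaempferol",
    "L": "prefers dihydrokaempferol; reduced processing of dihydromyricetin",
    "A": "prefers dihydrokaempferol; reduced processing of dihydromyricetin",
}

DOMAIN_LENGTH = 26  # length of the substrate-determining domain


def describe_dfr_position_3(domain: str) -> str:
    """Return the association described for the residue at position 3.

    `domain` is the 26-residue substrate-determining domain, already
    extracted from a DFR sequence (the extraction is not shown here).
    """
    if len(domain) != DOMAIN_LENGTH:
        raise ValueError(f"expected {DOMAIN_LENGTH} residues, got {len(domain)}")
    residue = domain[2].upper()  # position 3 is index 2 (0-based)
    return POSITION_3_ASSOCIATIONS.get(
        residue, "no association described for this residue"
    )


# Placeholder domain, NOT a real DFR sequence: only position 3 matters here.
example_domain = "GG" + "D" + "G" * 23
print(describe_dfr_position_3(example_domain))
```

Such a lookup can only flag a likely preference; the text notes that the substrate specificity of DFR is not explained by this single position.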
As a complex metabolic network with many branches, flavonoid biosynthesis requires sophisticated regulation. Activity of different branches is mainly regulated at the transcriptional level [63]. In A. thaliana, as in many other plants, R2R3-MYBs [64,65] and basic helix-loop-helix (bHLH) proteins [66] are two main transcription factor families involved in the regulation of the flavonoid biosynthesis. The WD40 protein TRANSPARENT TESTA GLABRA 1 (TTG1) facilitates the interaction of R2R3-MYBs and bHLHs in the regulation of the anthocyanin and proanthocyanidin biosynthesis in A. thaliana [67]. Due to its components, this trimeric complex is also referred to as MBW complex (MYB-bHLH-WD40) [67]. Examples of MBW complexes are MYB123/bHLH42/TTG1 and MYB75/bHLH2/TTG1, which are involved in anthocyanin biosynthesis regulation in a tissue-specific manner [68]. However, there are also bHLH-independent R2R3-MYBs such as MYB12, MYB11, and MYB111. These proteins regulate early genes of the flavonoid biosynthesis like CHS, CHI, F3H, and FLS as single transcriptional activators [69].
Many previous studies performed a systematic investigation of the flavonoid biosynthesis in plant species including Fragaria x ananassa [70], Musa acuminata [71], Tricyrtis spp. [72], and multiple Brassica species [73]. In addition to these systematic investigations, genes of the flavonoid biosynthesis are often detected as differentially expressed in transcriptomic studies without particular focus on this pathway [74,75,76]. In-depth investigation of the flavonoid biosynthesis starts with the identification of candidate genes for all steps. This identification of candidates often relies on an existing annotation or requires tedious manual inspection of sequence alignments. As plant genome sequences and their structural annotations become available at an increasing pace [77], the timely addition of functional annotations is an ever-increasing challenge. Therefore, we developed a pipeline for the automatic identification of flavonoid biosynthesis players in any given set of peptide, transcript, or genomic sequences. As a proof of concept, we validated the predictions made by Knowledge-based Identification of Pathway Enzymes (KIPEs) with a manual annotation of the flavonoid biosynthesis in the medicinal plant Croton tiglium. C. tiglium is a member of the family Euphorbiaceae [78] and was first mentioned over 2,200 years ago in China as a medicinal plant, probably because of the huge variety of specialised metabolites [79]. Oil of C. tiglium was traditionally used to treat gastrointestinal disorders and may have abortifacient and counterirritant effects [80]. Additionally, C. tiglium produces phorbol esters and a ribonucleoside analog of guanosine with antitumor activity [81,82]. Characterisation of the specialised metabolism of C. tiglium will facilitate the unlocking of its potential in agronomical, biotechnological, and medical applications. The flavonoid biosynthesis of C. tiglium is largely unexplored. To the best of our knowledge, previous studies only showed the presence of flavonoids through analysis of extracts [83,84,85]. However, transcriptomic resources are available [86] and provide the basis for a systematic investigation of the flavonoid biosynthesis in C. tiglium.
A huge number of publicly available genome and transcriptome assemblies of numerous plant species provide a valuable resource for comparative analysis of the flavonoid biosynthesis. Here, we present an automatic workflow for the identification of flavonoid biosynthesis genes applicable to any plant species and demonstrate the functionality by analysing a de novo transcriptome assembly of C. tiglium.

2. Results

We developed a tool for the automatic identification of enzyme sequences in a set of peptide sequences, a transcriptome assembly, or a genome sequence. Knowledge-based Identification of Pathway Enzymes (KIPEs) identifies candidate sequences on the basis of overall sequence similarity, functionally relevant amino acid residues, and functionally relevant domains (Figure 2). As a proof of concept, the transcriptome assembly of Croton tiglium was screened with KIPEs to identify the flavonoid aglycon biosynthesis network. Results of the automatic annotation were validated by a manually curated annotation.

2.1. Concept and Components of Knowledge-Based Identification of Pathway Enzymes (KIPEs)

2.1.1. General Concept

The automatic detection of sequences encoding enzymes of the flavonoid biosynthesis network requires (1) a set of bait sequences covering a broad taxonomic range and (2) information about functionally relevant amino acid residues and domains. Bait sequences were selected to encode enzymes with evidence of functionality, i.e., mutant complementation studies or in vitro assays. Additional bait sequences were included, which were previously studied in comparative analyses of the particular enzyme family. Positions of amino acids and domains with functional relevance need to refer to a reference sequence included in the bait sequence set. All bait sequences and one reference sequence related to one reaction in the network are supplied in one FASTA file. However, many FASTA files can be provided to cover all reactions of a complete metabolic network. Positions of functionally relevant residues and domains are specified in an additional text file on the basis of the reference sequence (see manual for details). Collections of bait sequences and detailed information about the relevant amino acid residues in flavonoid biosynthesis enzymes are provided along with KIPEs. However, these collections can be customised by users to reflect updated knowledge and specific research questions. KIPEs was developed to have a minimal number of dependencies. Only the frequently used alignment tools BLAST and MAFFT are required. Both tools are freely available as precompiled binaries without the need for installation.

2.1.2. Three Modes

A user can choose between three different analysis modes depending on the available input sequences: peptide sequences, transcript sequences, or a genome sequence. If a reliable peptide sequence annotation is available, these peptide sequences should be subjected to the analysis. Costs in terms of time and computational effort are substantially lower for the analysis of peptide sequences than for the analysis of genome sequences. The provided peptide sequences are screened via blastp for similarity to previously characterised bait sequences. If default criteria are applied, BLAST hits are considered if the sequence similarity is above 40% and if the score is above 30% of the score resulting from an alignment of the query sequence against itself. These lenient filter criteria are applied to collect a comprehensive set of candidate sequences, which is subsequently refined through the construction of global alignments via MAFFT. Next, phylogenetic trees are generated to identify the best candidates on the basis of their position in a tree. Candidates are classified on the basis of the closest distance to a bait sequence. Multiple closely related bait sequences can be considered if specified. When transcript sequences are supplied to KIPEs, in silico translation in all six possible frames generates a set of peptide sequences that are subsequently analysed as described above. When a genome sequence is supplied, the DNA sequences are screened for similarity to the bait peptide sequences via tblastn. Hits reported by tblastn are considered exons or exon fragments and are therefore assigned to groups that might represent candidate genes. The connection of these hits is attempted in a way that canonical GT-AG splice site combinations emerge. One isoform per locus is constructed and subsequently analysed as described above.
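The default hit filter described above (similarity above 40% and a score above 30% of the self-alignment score of the query) can be illustrated with a short Python sketch. This is not the KIPEs implementation: the `BlastHit` record, the function name, and the use of a dictionary of self-alignment scores are assumptions made for illustration, and the BLAST search itself is not shown.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    """Minimal hit record; the field names are hypothetical."""
    query_id: str
    subject_id: str
    percent_similarity: float  # similarity of the aligned region in percent
    score: float               # alignment score of this hit


def filter_hits(hits, self_scores, min_similarity=40.0, min_score_ratio=0.3):
    """Keep hits that pass both lenient default-style thresholds.

    `self_scores` maps each query ID to the score obtained when the
    query sequence is aligned against itself.
    """
    kept = []
    for hit in hits:
        self_score = self_scores.get(hit.query_id)
        if not self_score:
            continue  # no self-alignment score available for this query
        similar_enough = hit.percent_similarity > min_similarity
        score_high_enough = hit.score > min_score_ratio * self_score
        if similar_enough and score_high_enough:
            kept.append(hit)
    return kept


# Toy values for illustration only.
hits = [
    BlastHit("bait1", "cand1", 55.0, 400.0),  # passes both criteria
    BlastHit("bait1", "cand2", 35.0, 500.0),  # similarity too low
    BlastHit("bait1", "cand3", 60.0, 100.0),  # score below 30% of 800
]
self_scores = {"bait1": 800.0}
print([hit.subject_id for hit in filter_hits(hits, self_scores)])
```

Because the thresholds are deliberately lenient, such a filter only collects candidates; the refinement through global alignments and phylogenetic placement described in the text is what narrows the set.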

2.1.3. Final Filtering

After identification of initial candidates through overall sequence similarity, a detailed comparison against a well-characterised reference sequence with described functionally relevant amino acid residues is performed. All candidates are screened for matching amino acid residues at functionally relevant positions. Sequences encoding functional enzymes are expected to display a matching amino acid residue at all checked positions. Additionally, the conservation of relevant domains is analysed. A prediction about the functionality/non-functionality of all enzymes encoded by the candidate sequences is performed at this step. Results of intermediate steps are stored to allow in-depth inspection if necessary.
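The residue check described above can be sketched in Python as follows, assuming that the reference and the candidate are already available as two rows of the same alignment (for example, produced by MAFFT). The function names, the toy sequences, and the format of the `expected` dictionary are illustrative assumptions and do not reflect the actual KIPEs code or its input files.

```python
def map_reference_positions(aligned_ref):
    """Map 1-based positions in the ungapped reference to alignment columns."""
    mapping = {}
    position = 0
    for column, char in enumerate(aligned_ref):
        if char != "-":
            position += 1
            mapping[position] = column
    return mapping


def check_residues(aligned_ref, aligned_cand, expected):
    """Compare a candidate with the expected residues at reference positions.

    `expected` maps a 1-based reference position to a string of allowed
    residues, e.g. {166: "Q", 167: "Q"}. Returns a dict mapping each
    position to (observed residue, match flag).
    """
    if len(aligned_ref) != len(aligned_cand):
        raise ValueError("sequences must come from the same alignment")
    columns = map_reference_positions(aligned_ref)
    report = {}
    for position, allowed in expected.items():
        if position not in columns:
            raise ValueError(f"position {position} is outside the reference")
        observed = aligned_cand[columns[position]]
        report[position] = (observed, observed in allowed)
    return report


def is_presumably_functional(report):
    """A candidate passes only if every checked position matches."""
    return all(matches for _, matches in report.values())


# Toy alignment rows (not real enzyme sequences); '-' marks a gap.
aligned_ref = "MK-QQLA"   # ungapped reference: M1 K2 Q3 Q4 L5 A6
aligned_cand = "MKTQHLA"
report = check_residues(aligned_ref, aligned_cand, {3: "Q", 4: "Q"})
print(report, is_presumably_functional(report))
```

Mapping positions through the gapped reference row keeps the residue numbering tied to the reference sequence, which matches the requirement that functionally relevant positions refer to a reference included in the bait set.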

2.2. Technical Validation of KIPEs

A first technical validation of KIPEs was performed on the basis of sequence data sets of plant species with previously characterised flavonoid biosynthesis, namely, Arabidopsis lyrata, A. thaliana, Cicer arietinum, Fragaria vesca, Glycine max, Malus domestica, Medicago truncatula, Musa acuminata, Populus trichocarpa, Solanum lycopersicum, Solanum tuberosum, Theobroma cacao, and Vitis vinifera. The existing knowledge about these species provides an opportunity for validation. KIPEs identified candidate sequences with conservation of all functionally relevant amino acid residues for the expected enzymes in all species (Supplementary S1).

2.3. The Flavonoid Biosynthesis Enzymes in Croton tiglium

Genes in the flavonoid biosynthesis of C. tiglium were identified on the basis of bait sequences of over 200 plant species and well-characterised reference sequences of A. thaliana, Glycine max, Medicago sativa, Osteospermum spec., Petroselinum crispum, Populus tomentosa, and Vitis vinifera. The transcriptome assembly of C. tiglium revealed sequences encoding enzymes for all steps in the flavonoid biosynthesis (Table 1). Phylogenetic analyses placed the C. tiglium sequences of enzymes in the flavonoid biosynthesis close to the corresponding sequences of related Malpighiales species such as Populus tomentosa (Supplementary S2). Conservation of functionally relevant amino acid residues was inspected in an alignment with sequences of characterised enzymes of the respective step (Supplementary S3).
The general phenylpropanoid biosynthesis is represented by 10 phenylalanine ammonia lyase (PAL) candidates, two cinnamate 4-hydroxylase (C4H) candidates, and one 4-coumarate-CoA ligase (4CL) candidate (Table 1, Supplementary S4, Supplementary S5). Many PAL sequences show a high overall sequence similarity, indicating that multiple alleles or isoforms could contribute to the high number. A phylogenetic analysis supports the hypothesis that multiple PAL candidates might be alleles or alternative transcript variants of the same genes (Supplementary S2). Very low transcript abundances indicate that at least three of the PAL candidates can be neglected (Table 1).
Although multiple CHS candidates were identified on the basis of overall sequence similarity to the A. thaliana CHS sequence, only CtCHSa showed all functionally relevant amino acid residues (Supplementary S3). Five other candidates were discarded due to the lack of Q166 and Q167, which differentiate CHS from other polyketide synthases such as STS or LESS ADHESIVE POLLEN 5 (LAP5). Additionally, a CHS signature sequence at the C-terminal end and the malonyl-CoA-binding motif at position 313 to 329 in the A. thaliana sequence are conserved in CtCHSa. A phylogenetic analysis supported these findings by placing CtCHSa in a clade with bona fide chalcone synthases (Supplementary S2). There is only one CHI candidate, CtCHI Ia, which contains all functionally relevant amino acid residues (Supplementary S3). No CHI II candidate was detected. C. tiglium has one F3H candidate, one F3′H candidate, and two F3′5′H candidates. CtF3Ha, CtF3′Ha, CtF3′5′Ha, and CtF3′5′Hb show conservation of the respective functionally relevant amino acid residues (Supplementary S3). CtF3′Ha contains the N-terminal proline-rich domain and a perfectly conserved oxygen binding pocket corresponding to the sequence at position 302 to 307 in the A. thaliana reference sequence. Both CtF3′5′Ha and CtF3′5′Hb were also considered as F3′H candidates but showed overall a higher similarity to the F3′5′H bait sequences than to the F3′H bait sequences. The flavone biosynthesis capacities of C. tiglium remained elusive. No FNS I candidates with conservation of all functionally relevant amino acids were detected. However, there were four FNS II candidates that showed only one substitution of an amino acid residue in the oxygen binding pocket (T313F). The committed step of the flavonol biosynthesis was represented by CtFLSa and CtFLSb, which showed all functionally relevant residues (Supplementary S3).
C. tiglium contains excellent candidates for all steps of the anthocyanidin and proanthocyanidin biosynthesis. CtDFR showed conservation of the functionally relevant amino acid residues (Supplementary S3). We investigated the substrate specificity domain to understand the enzymatic potential of the DFR in C. tiglium. Position 3 of this substrate specificity domain showed a D that is associated with low acceptance of dihydrokaempferols. CtLAR is the only LAR candidate with conservation of the functionally relevant amino acid residues (Supplementary S3). CtANS is the only ANS candidate with conservation of the functionally relevant amino acid residues (Supplementary S3). There are two ANR candidates in C. tiglium. CtANRa and CtANRb showed conservation of all functionally relevant amino acid residues (Supplementary S3). CtANRa showed 74% identical amino acid residues when compared to the reference sequence, which exceeded the 49% of CtANRb substantially.
The identification of candidates in a transcriptome assembly already showed transcriptional activity of the respective gene. To resolve the transcriptional activity of genes in greater detail, we quantified the presence of candidate transcripts in different tissues of C. tiglium and compared the resulting values to Croton draco through cross-species transcriptomics (Supplementary S6). High transcript abundance of almost all flavonoid biosynthesis candidates was observed in seeds, while only a few candidate transcripts were observed in other investigated tissues (Table 1). Transcripts involved in the proanthocyanidin biosynthesis showed an exceptionally high abundance in seeds of C. tiglium and inflorescence of C. draco. Overall, the tissue-specific abundance of many transcripts was found to be similar between C. tiglium and C. draco. LAR and ANR showed substantially higher transcript abundances in inflorescences of C. draco compared to C. tiglium. CHS and ANS showed the highest transcript abundance in pink flowers of C. draco (Supplementary S6).

2.4. Transcriptional Regulators of the Flavonoid Biosynthesis in Croton tiglium

To demonstrate the applicability of KIPEs for the investigation of non-enzyme sequences such as transcription factor gene families, we screened the transcriptome assembly of C. tiglium for members of the MYB, bHLH, and WD40 families. This analysis revealed candidates for some key regulators of the flavonoid biosynthesis, namely, MYB11/MYB12/MYB111 (subgroup 7), MYB123 (subgroup 5), MYB75/MYB90/MYB113/MYB114 (subgroup 6), bHLH2/bHLH42, and TTG1, according to the nomenclature in A. thaliana (Table 2, Supplementary S7). The MYB subgroups 6 and 7 have multiple members in A. thaliana and C. tiglium. Therefore, C. tiglium candidates were only assigned to an orthogroup (Table 2). The reliable identification of MYB orthologs between both species was not feasible (Supplementary S7). There are five homologous sequences of MYB123 in C. tiglium, with one of them probably originating from the same gene. The R2R3 MYB domain was detected in the MYB candidates, except for DN21046_c0_g1_i3, DN21046_c0_g1_i3, DN30455_c10_g1_i1, and DN33314_c5_g2_i4. With the exception of DN33314_c5_g2_i4 (truncated protein), all CtMYB candidates of subgroup 6 have a conserved bHLH interaction domain, while the CtMYB candidates of the bHLH-independent subgroup 7 did not show this conserved domain. There are seven C. tiglium sequences in a clade with the A. thaliana bHLH42 (Supplementary S7), but these might be alternative isoforms originating from the same gene. The same is true for the seven isoforms detected as homologous sequences of A. thaliana bHLH2 (Supplementary S7). Three TTG1 candidates exist in the C. tiglium transcriptome assembly, but two of them might be isoforms belonging to the same gene. The MYB, bHLH, and TTG1 transcription factor candidates show generally lower transcript abundances than the enzyme candidates (Table 1 and Table 2). The highest transcript abundance of all three MBW complex components was observed in seeds.

3. Discussion

As previous studies of extracts from Croton tiglium and various other Croton species revealed the presence of flavonoids [84,85,87,88,89,90,91,92], steps in the central flavonoid aglycon biosynthesis network should be represented by at least one functional enzyme each. However, this is the first identification of candidates involved in the biosynthesis. Previous reports [84,85,87,88,89,90,91,92] about flavonoids align well with our observation (Table 1) that at least one predicted peptide contains all previously described functionally relevant amino acid residues of the respective enzyme. The only exception is the flavone synthase step. While FNS I is frequently absent in flavonoid-producing species outside the Apiaceae, FNS II is more broadly distributed across plants [53]. C. tiglium is not a member of the Apiaceae, and thus the absence of FNS I and the presence of FNS II candidates are expected.
All candidate sequences of presumably functional enzymes belong to actively transcribed genes, as indicated by the presence of these sequences in a transcriptome assembly. Since the flavonoid biosynthesis is mainly regulated at the transcriptional level [63] and previously reported blocks in the pathway are expected to be due to transcriptional down-regulation [93,94], we expect most branches of the flavonoid biosynthesis in C. tiglium to be functional. No CHI II candidate was detected, and thus C. tiglium probably lacks a 6′-deoxychalcone to 5-deoxyflavanone catalytic activity like most non-leguminous plants [30,95].
A predominance of proanthocyanidins has been reported for Croton species [88]. This high proanthocyanidin content aligns well with high transcript abundance of proanthocyanidin biosynthesis genes (CtLAR, CtANR). PAs have been reported to account for up to 90% of the dried weight of red sap of Croton lechleri [96]. Expression of CtFLSa in the leaves matches previous reports about flavonol extraction from leaves [90,97]. Interestingly, almost all analysed Croton species showed very high amounts of quercetin derivatives compared to kaempferol derivatives in their leaf extracts, which significantly correlated with antioxidant potential [97]. This high quercetin concentration might be due to a high expression level of CtF3′Ha in leaves. Since F3′H converts dihydrokaempferol (DHK) to dihydroquercetin (DHQ), a high gene expression might result in high amounts of DHQ, which can be used by FLS to produce quercetin. At the same time, the production of kaempferols from DHK is reduced.
Flavonols have been extracted from several Croton species and various important functions have been attributed to these flavonols. Quercetin 3,7-dimethyl ether was extracted from Croton schiedeanus and elicits vasorelaxation in isolated aorta [91]. Casticin, a methoxylated flavonol from Croton betulaster, modulates cerebral cortical progenitors in rats by directly decreasing neuronal death, and indirectly via astrocytes [98]. Besides the anticancer activity of flavonol-rich extracts from Croton celtidifolius in mice [99], flavonols extracted from Croton menyharthii leaves possess antimicrobial activity [100]. Kaempferol 7-O-β-D-(6″-O-cumaroyl)-glucopyranoside isolated from Croton piauhiensis leaves enhanced the effect of antibiotics and showed antibacterial activity on its own [101]. Flavonols extracted from Croton cajucara showed anti-inflammatory activities [102].
The investigation of the CtDFR substrate specificity revealed aspartate at the third position of the substrate specificity domain, which was previously reported to reduce the acceptance of dihydrokaempferol [55]. Although the substrate specificity of DFR is not completely resolved, a high DHQ affinity would fit the high transcript abundance of CtF3′Hs, which encode putative DHQ-producing enzymes. Further investigations are needed to reveal how effectively C. tiglium produces anthocyanidins and proanthocyanidins on the basis of different dihydroflavonols. As C. tiglium is known to produce various proanthocyanidins [83], a functional biosynthetic network must be present. Phlobatannins have been reported in leaves of C. tiglium [83], which aligns well with our identification of a probably functional CtDFRa.
Our automatic approach for the identification of flavonoid biosynthesis genes could be applied to identify target genes for experimental validation in a species with a newly sequenced transcriptome or genome. Due to multiple refinement steps, the predictions of KIPEs have a substantially higher fidelity than frequently used BLAST results. In particular, the distinction of different enzymes with very similar sequences (e.g., CHS, STS, LAP5) was substantially improved by KIPEs. Additionally, the automatic identification of flavonoid biosynthesis enzymes/genes across a large number of plant species facilitates comparative analyses that could be a valuable addition to functional studies or might even replace some studies. As functionally relevant amino acid residues are well described for many of the enzymes, an automatic classification of candidate sequences as functional or non-functional is feasible in many cases. It has not escaped our notice that “non-functionality” only holds with respect to the initially expected enzyme function. Sub- and neofunctionalisation, especially following gene duplications, are likely. Results produced by KIPEs could be used to identify species-specific modifications of the general flavonoid biosynthesis. Bi- or even multifunctionality has been described for some members of the 2-ODDs (FLS [36,103,104], F3H, FNS I, and ANS [38,39,40,41]). Experimental characterisation of these enzymes will still be required to determine the degree of possible multifunctionality in a single enzyme. However, enzyme characterisation experiments could be informed by the results produced by KIPEs. As KIPEs has a particular focus on high-impact amino acid substitutions, it would also be possible to screen sequence datasets of phenotypically interesting plants to identify blocks in pathways. Another potential application is the assessment of the functional impact of amino acid substitutions, e.g., in re-sequencing studies.
There are established tools such as SnpEff [105] for the annotation of sequence variants in re-sequencing studies. Additionally, KIPEs could operate on the set of modified peptide sequences to analyse the functional relevance of sequence variants. If functionally relevant amino acids are affected, KIPEs could predict that the variant might cause non-functionality.
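The idea of screening modified peptide sequences can be illustrated with a minimal Python sketch. It is not part of KIPEs; the function name `affected_residues`, the toy peptides, and the listed positions are hypothetical, and only substitutions (no insertions or deletions) are handled.

```python
def affected_residues(reference, variant, relevant_positions):
    """Report functionally relevant positions that differ between two peptides.

    Positions are 1-based. Both peptides must have the same length, because
    this sketch only covers amino acid substitutions.
    Returns a list of (position, reference_residue, variant_residue).
    """
    if len(reference) != len(variant):
        raise ValueError("this sketch handles substitutions only")
    changes = []
    for position in sorted(relevant_positions):
        ref_aa = reference[position - 1]
        var_aa = variant[position - 1]
        if ref_aa != var_aa:
            changes.append((position, ref_aa, var_aa))
    return changes


# Invented toy data: positions 4, 5, and 7 are treated as functionally relevant.
reference_peptide = "MKTCHLNAG"
variant_peptide = "MKTSHLNAG"  # carries a C4S substitution
relevant = {4, 5, 7}

hits = affected_residues(reference_peptide, variant_peptide, relevant)
print(hits)
```

A non-empty result would flag the variant as a candidate for loss of the expected enzyme function, which would still require experimental confirmation.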
Although KIPEs can be applied to screen a genome sequence, we recommend supplying peptide or transcript sequences as input whenever possible. Well-established gene prediction tools such as AUGUSTUS [106] and GeMoMa [107] generate gene models of superior quality in most cases. KIPEs is restricted to the identification of canonical GT-AG splice sites. Given the very low frequency of non-canonical splice sites in plant genomes [108], considering them would cause extreme computational costs and could lead to substantial numbers of mis-annotations. To the best of our knowledge, non-canonical splice sites have not been reported for genes in the flavonoid biosynthesis. Nevertheless, dedicated gene prediction tools can incorporate additional hints to predict non-canonical introns with high fidelity.
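As an illustration of the canonical splice site criterion, the following sketch tests whether a putative intron starts with GT and ends with AG. The function name, the coordinate convention, and the toy sequences are assumptions made for this example and do not reflect the actual KIPEs code.

```python
def is_canonical_intron(genome_seq, intron_start, intron_end):
    """Check for GT at the 5' end and AG at the 3' end of an intron.

    Coordinates are 0-based and end-exclusive on the forward strand.
    """
    intron = genome_seq[intron_start:intron_end].upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


# Invented toy sequences: exon + intron + exon.
canonical = "ATGGCC" + "GTAAGTTTTCAG" + "GGATAA"
non_canonical = "ATGGCC" + "GCAAGTTTTCAG" + "GGATAA"  # GC-AG intron

print(is_canonical_intron(canonical, 6, 18))
print(is_canonical_intron(non_canonical, 6, 18))
```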
During the identification of amino acid residues that were previously reported to be relevant for the enzyme function, we observed additional patterns. Certain positions showed imperfect conservation, but multiple amino acids with similar biochemical properties occurred at the respective positions. Low relevance of the amino acids at these positions for the enzymatic activity could be one explanation. However, these patterns could also point to lineage-specific specialisations of various enzymes. A previous study reported the evolution of different F3′H classes in monocots [109]. Subtle differences between isoforms might cause different enzyme properties, e.g., altered substrate specificities, which could explain the presence of multiple isoforms of the same enzyme in some species. For example, a single amino acid has substantial influence on the enzymatic functionality of F3′H and F3′5′H [45]. This report matches our observation that both F3′5′H candidates were initially also considered as F3′H candidates. A higher overall similarity to the F3′5′H bait sequences than to the F3′H bait sequences allowed an accurate classification. This example showcases the challenges when assigning enzyme functions to peptide sequences.
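The similarity-based decision between two closely related enzyme classes can be sketched as follows. This is a simplified illustration, not the KIPEs implementation: it assumes that the candidate and all bait sequences come from the same global alignment (equal length, "-" as gap), uses the best pairwise identity per bait set as the score, and relies on invented toy sequences.

```python
def identity(seq_a, seq_b):
    """Fraction of identical residues over columns without gaps in either sequence."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def classify(candidate, bait_sets):
    """Assign the candidate to the bait set containing the most similar bait."""
    scores = {
        name: max(identity(candidate, bait) for bait in baits)
        for name, baits in bait_sets.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Invented toy alignment rows; real bait sets would contain full-length peptides.
bait_sets = {
    "F3'H": ["MKTAYLLG", "MKTAYLIG"],
    "F3'5'H": ["MRSAFLVG", "MRSAFLVA"],
}
candidate = "MRSAYLVG"

label, scores = classify(candidate, bait_sets)
print(label, scores)
```

Using the best hit rather than the mean over a bait set is a design choice of this sketch; either option expresses the same principle of comparing overall similarity across bait sets.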
We developed KIPEs for the automatic identification and annotation of core flavonoid biosynthesis enzymes because this network is well characterised in numerous plant species. Additionally, we demonstrated the applicability for the identification of gene families by screening the transcriptome assembly for MYB, bHLH, and WD40 candidates. Quality and fidelity of the KIPEs results depend on the quality of the bait sequence set and the knowledge about functionally relevant amino acid residues. Nevertheless, the implementation of KIPEs allows the analysis of additional steps of the flavonoid biosynthesis (e.g., the glycosylation of flavonoids) and even the analysis of other pathways. Here, we presented the identification of enzyme candidates on the basis of single amino acid residues with functional relevance. Functionally characterised domains were subordinate in this enzyme detection process. However, KIPEs can also assess the conservation of domains. This function is not only relevant for the analysis of enzymes but could be applied to the analysis of other proteins such as transcription factors with specific binding domains.

4. Materials and Methods

4.1. Retrieval of Bait and Reference Sequences

The NCBI protein database was screened for sequences of the respective enzyme for all steps in the core flavonoid biosynthesis by searching for the common names. Listed sequences were screened for associated publications about functionality of the respective sequence. Only peptide sequences with evidence for enzyme functionality were retrieved (Supplementary S8). To generate a comprehensive set of bait sequences, we also considered sequences with indirect evidence such as clear differential expression associated with a phenotype and sequences that were previously included in analyses of the respective enzyme family. The set of bait and reference sequences used for the analyses described in this manuscript is designated FlavonoidBioSynBaits_v1.0.

4.2. Collection of Information about Important Amino Acid Residues

All bait sequences and one reference sequence per step in the flavonoid biosynthesis were subjected to a global alignment via MAFFT v7 [110]. Highly conserved positions, which were also reported in the literature to be functionally relevant, are referred to as “functionally relevant amino acid residues” in this manuscript (Supplementary S9). The amino acid residues and their positions in a designated reference sequence are provided in one table per reaction in the network. A customised Python script was applied to identify contrasting residues between two sequence sets, e.g., chalcone and stilbene synthases.
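The search for contrasting residues between two sequence sets can be sketched as follows. This is not the customised script mentioned above; it is a minimal illustration that assumes both sets are rows of one alignment (equal length, "-" as gap) and that reports only columns that are perfectly conserved within each set but differ between the sets. The function name and the toy sequences are hypothetical.

```python
def contrasting_columns(set_a, set_b):
    """Find alignment columns conserved within each set but different between sets.

    Returns a list of (column, residue_in_a, residue_in_b) with 1-based columns.
    """
    length = len(set_a[0])
    if any(len(seq) != length for seq in set_a + set_b):
        raise ValueError("all sequences must have the same aligned length")
    hits = []
    for column in range(length):
        residues_a = {seq[column] for seq in set_a}
        residues_b = {seq[column] for seq in set_b}
        conserved = len(residues_a) == 1 and len(residues_b) == 1
        if conserved and residues_a != residues_b and "-" not in residues_a | residues_b:
            hits.append((column + 1, residues_a.pop(), residues_b.pop()))
    return hits


# Invented toy alignment rows standing in for two enzyme families.
set_a = ["MVTGAL", "MVSGAL"]
set_b = ["MVTGSL", "MVSGSL"]

print(contrasting_columns(set_a, set_b))
```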

4.3. Implementation and Availability of KIPEs

KIPEs is implemented in Python 2.7. The script is freely available on GitHub. Details about the usage are described in the manual provided along with the Python script. Collections of bait and reference sequences as well as data tables about functionally relevant amino acid residues are included. In summary, these datasets allow the automatic identification of flavonoid biosynthesis genes in other plant species via KIPEs. Customisation of all datasets is possible in order to enable the analysis of other pathways. Mandatory dependencies of KIPEs are blastp [111], tblastn [111], and MAFFT [110]. FastTree2 [112] is an optional dependency that substantially improves the fidelity of the candidate identification and classification. Positions of candidate sequences in a phylogenetic tree are used to identify the closest bait sequences. The function of the closest bait sequence is then transferred to the candidate. However, it is possible to consider a candidate sequence for multiple different functions. If the construction of phylogenetic trees is not possible, the highest similarity to a bait sequence in a global alignment is used instead to predict a function. An analysis of functionally relevant amino acid residues in the candidate sequences is finally used to assign a function.
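The final inspection of functionally relevant amino acid residues can be sketched as follows. The sketch is written in Python 3 and is not taken from the KIPEs source; it assumes that the reference and the candidate are two rows of the same alignment and that relevant residues are given as 1-based positions in the ungapped reference. All names and toy sequences are hypothetical.

```python
def reference_to_column(aligned_reference):
    """Map 1-based positions of the ungapped reference to 0-based alignment columns."""
    mapping = {}
    position = 0
    for column, residue in enumerate(aligned_reference):
        if residue != "-":
            position += 1
            mapping[position] = column
    return mapping


def check_residues(aligned_reference, aligned_candidate, relevant):
    """Compare candidate residues with the allowed residues at relevant positions.

    relevant maps a reference position to a set of accepted amino acids.
    Returns {position: (observed_residue, matches)}.
    """
    mapping = reference_to_column(aligned_reference)
    report = {}
    for position, allowed in relevant.items():
        observed = aligned_candidate[mapping[position]]
        report[position] = (observed, observed in allowed)
    return report


# Invented toy alignment: the candidate has an insertion relative to the reference.
aligned_reference = "MK-CHLN"
aligned_candidate = "MKACHIN"
relevant = {3: {"C"}, 4: {"H"}, 5: {"L"}}

report = check_residues(aligned_reference, aligned_candidate, relevant)
all_conserved = all(ok for _, ok in report.values())
print(report, all_conserved)
```

Mapping positions through the alignment, rather than indexing the candidate directly, keeps the check valid when the candidate carries insertions or deletions relative to the reference.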

4.4. Phylogenetic Analysis

Alignments were generated with MAFFT v7 [110] and cleaned with pxclsq [113] to remove alignment columns with very low occupancy (<0.1). Phylogenetic trees were constructed with FastTree v2.1.10 [112] using the WAG+CAT model. FigTree was used to visualise the phylogenetic trees. Alignments were visualised online at v3.0 [114] using 3D structures of reference enzymes derived from the Protein Data Bank (PDB) [115] (Supplementary S10). If no PDB entry was available, the amino acid sequence of the respective reference enzyme was subjected to I-TASSER [116] for protein structure prediction and modelling (Supplementary S10, Supplementary S11). Functionally relevant amino acid residues in the C. tiglium sequences were subsequently highlighted in the generated PDFs (Supplementary S3).
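The occupancy filter can be sketched in a few lines of Python. This is an illustrative re-implementation of the idea, not the pxclsq code; the function name and the toy alignment are assumptions, and occupancy is taken here as the fraction of sequences with a non-gap character in a column.

```python
def filter_columns(alignment, min_occupancy=0.1):
    """Remove alignment columns whose occupancy is below min_occupancy.

    alignment maps sequence names to aligned sequences of equal length.
    """
    names = list(alignment)
    sequences = [alignment[name] for name in names]
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same aligned length")
    keep = [
        column
        for column in range(length)
        if sum(seq[column] != "-" for seq in sequences) / len(sequences) >= min_occupancy
    ]
    return {
        name: "".join(seq[column] for column in keep)
        for name, seq in zip(names, sequences)
    }


# Invented toy alignment; a stricter cutoff is used so that the effect is visible.
toy = {"a": "MK-L", "b": "M--L", "c": "MR-L", "d": "M-AL"}
cleaned = filter_columns(toy, min_occupancy=0.5)
print(cleaned)
```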

4.5. Transcript Abundance Quantification

All available RNA-Seq data sets of C. tiglium [86,117] and C. draco [118] were retrieved from the Sequence Read Archive via fastq-dump v2.9.6. Kallisto v0.44 [119] was applied with default parameters to quantify the abundance of transcripts based on the C. tiglium transcriptome assembly [86].
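Kallisto writes the estimated abundances of each sample to a tab-separated file (abundance.tsv) with the columns target_id, length, eff_length, est_counts, and tpm. A minimal sketch for collecting the TPM values of candidate transcripts from such a file could look as follows; the function name, the transcript identifiers, and the values are hypothetical.

```python
import csv
import io


def read_tpm(handle, wanted_ids):
    """Collect TPM values of selected transcripts from a kallisto abundance table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["target_id"]: float(row["tpm"])
        for row in reader
        if row["target_id"] in wanted_ids
    }


# Invented example content; a real file would be opened with open(path, newline="").
example = io.StringIO(
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "transcript_1\t1200\t1021.5\t350\t12.5\n"
    "transcript_2\t900\t721.5\t10\t0.5\n"
)
tpm_values = read_tpm(example, {"transcript_1"})
print(tpm_values)
```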

4.6. Application of KIPEs for the Identification of Transcription Factors

KIPEs was run with sets of MYB, bHLH, and WD40 peptide sequences (MYB_bHLH_WD40_v1.0) to identify corresponding candidates in the C. tiglium transcriptome assembly. MYB sequences of A. thaliana [64], Vitis vinifera [120], Beta vulgaris [121], and Musa acuminata [122] were supplied to KIPEs as baits. bHLH bait sequences were collected from A. thaliana [123], V. vinifera [124], Nelumbo nucifera [125], Citrus grandis [126], M. acuminata [127], and Solanum melongena [128]. WD40 sequences of A. thaliana [129], Triticum aestivum [130], and Setaria italica [131] were collected as bait sequences for the identification of the WD40 protein TTG1. Phylogenetic trees with the candidates reported by KIPEs, the sets of bait sequences derived from the genome-wide studies, and selected sequences retrieved from the NCBI were generated with FastTree v2.1.10 [112] on the basis of alignments constructed with MAFFT v7 [110]. The MYB domain and bHLH-interaction domain were identified with a Python script on the basis of previously defined patterns [122].

5. Conclusions

KIPEs enables the automatic identification of enzymes involved in the flavonoid biosynthesis in uninvestigated sequence datasets of plants, thus paving the way for comparative studies and the identification of lineage-specific differences. While we demonstrated the applicability of KIPEs for the identification and sequence-based characterisation of players in the core flavonoid biosynthesis, we envision applications beyond this pathway. Various enzymes of entire metabolic networks can be identified if sufficient knowledge about functionally relevant amino acids is available.

Supplementary Materials

The following are available online: Supplementary S1: KIPEs evaluation results. Supplementary S2: Phylogenetic trees of candidates. Supplementary S3: Multiple sequence alignments of candidates (yellow highlighting is used for functionally relevant residues in the C. tiglium sequences, acc = relative accessibility, black background indicates perfect conservation across all sequences). Supplementary S4: Coding sequences of C. tiglium flavonoid biosynthesis genes. Supplementary S5: Peptide sequences of C. tiglium flavonoid biosynthesis genes. Supplementary S6: Gene expression heatmap of all candidate genes. Supplementary S7: Unrooted phylogenetic trees of MYB, bHLH, and WD40 candidates in C. tiglium and corresponding bait sequences. Supplementary S8: List of bait and reference sequences. Supplementary S9: Functionally relevant amino acid residues considered for analysis of flavonoid biosynthesis enzymes. Supplementary S10: Information about used crystal structures of previously characterised enzymes and protein models produced in this study. Supplementary S11: 3D models of flavonoid biosynthesis enzyme structures generated by I-TASSER.

Author Contributions

B.P. and H.M.S. conceived the project. B.P., F.R., and H.M.S. conducted data analysis. B.P., F.R., and H.M.S. wrote the manuscript. B.P. supervised the project. All authors have read and agreed to the final version of this manuscript.


Funding

This research received no external funding.


Acknowledgments

We are extremely grateful to all researchers who characterised enzymes in the flavonoid biosynthesis, submitted the underlying sequences to the appropriate databases, and published their experimental findings. We acknowledge support for the publication costs by the Deutsche Forschungsgemeinschaft and the Open Access Publication Fund of Bielefeld University. We thank the Center for Biotechnology (CeBiTec) at Bielefeld University for providing an environment to perform the computational analyses.

Conflicts of Interest

The authors declare no conflict of interest.

References


  1. Williams, C.A.; Grayer, R.J. Anthocyanins and other flavonoids. Nat. Prod. Rep. 2004, 21, 539–573.
  2. Jaakola, L.; Hohtola, A. Effect of latitude on flavonoid biosynthesis in plants. Plant Cell Environ. 2010, 33, 1239–1247.
  3. Kandaswami, C.; Kanadaswami, C.; Lee, L.-T.; Lee, P.-P.H.; Hwang, J.-J.; Ke, F.-C.; Huang, Y.-T.; Lee, M.-T. The antitumor activities of flavonoids. In Vivo 2005, 19, 895–909.
  4. Winkel-Shirley, B. Flavonoid Biosynthesis. A Colorful Model for Genetics, Biochemistry, Cell Biology, and Biotechnology. Plant Physiol. 2001, 126, 485–493.
  5. Marais, J.P.J.; Deavours, B.; Dixon, R.A.; Ferreira, D. The Stereochemistry of Flavonoids; Grotewold, E., Ed.; Springer: New York, NY, USA, 2006; pp. 47–69.
  6. Pourcel, L.; Routaboul, J.-M.; Cheynier, V.; Lepiniec, L.; Debeaujon, I. Flavonoid oxidation in plants: From biochemical properties to physiological functions. Trends Plant. Sci. 2007, 12, 29–36.
  7. Murphy, A.; Peer, W.A.; Taiz, L. Regulation of auxin transport by aminopeptidases and endogenous flavonoids. Planta 2000, 211, 315–324.
  8. Mol, J.; Grotewold, E.; Koes, R. How genes paint flowers and seeds. Trends Plant Sci. 1998, 3, 212–217.
  9. Harborne, J.B.; Williams, C.A. Advances in flavonoid research since 1992. Phytochemistry 2000, 55, 481–504.
  10. Harborne, J.B. Recent advances in chemical ecology. Nat. Prod. Rep. 1999, 16, 509–523.
  11. Appelhagen, I.; Jahns, O.; Bartelniewoehner, L.; Sagasser, M.; Weisshaar, B.; Stracke, R. Leucoanthocyanidin Dioxygenase in Arabidopsis thaliana: Characterization of mutant alleles and regulation by MYB-BHLH-TTG1 transcription factor complexes. Gene 2011, 484, 61–68.
  12. Panche, A.N.; Diwan, A.D.; Chandra, S.R. Flavonoids: An overview. J. Nutr. Sci. 2016, 5.
  13. Nishihara, M.; Nakatsuka, T. Genetic engineering of flavonoid pigments to modify flower color in floricultural plants. Biotechnol. Lett. 2011, 33, 433–441.
  14. Kozłowska, A.; Szostak-Wegierek, D. Flavonoids--food sources and health benefits. Rocz. Panstw. Zakl. Hig. 2014, 65, 79–85.
  15. Havsteen, B.H. The biochemistry and medical significance of the flavonoids. Pharmacol. Ther. 2002, 96, 67–202.
  16. Rice-Evans, C.; Miller, N.; Paganga, G. Antioxidant properties of phenolic compounds. Trends Plant. Sci. 1997, 2, 152–159.
  17. Chen, A.Y.; Chen, Y.C. A review of the dietary flavonoid, kaempferol on human health and cancer chemoprevention. Food Chem. 2013, 138, 2099–2107.
  18. Zhang, W.; Furusaki, S. Production of anthocyanins by plant cell cultures. Biotechnol. Bioprocess. Eng. 1999, 4, 231–252.
  19. Appelhagen, I.; Wulff-Vester, A.K.; Wendell, M.; Hvoslef-Eide, A.-K.; Russell, J.; Oertel, A.; Martens, S.; Mock, H.-P.; Martin, C.; Matros, A. Colour bio-factories: Towards scale-up production of anthocyanins in plant cell cultures. Metab. Eng. 2018, 48, 218–232.
  20. Forkmann, G. Flavonoids as Flower Pigments: The Formation of the Natural Spectrum and its Extension by Genetic Engineering. Plant Breed. 1991, 106, 1–26.
  21. Saito, K.; Yonekura-Sakakibara, K.; Nakabayashi, R.; Higashi, Y.; Yamazaki, M.; Tohge, T.; Fernie, A.R. The flavonoid biosynthetic pathway in Arabidopsis: Structural and genetic diversity. Plant Physiol. Biochem. 2013, 72, 21–34.
  22. Koornneef, M. Mutations affecting the testa color in Arabidopsis. Arab. Inf. Serv. 1990, 28, 1–4.
  23. Shirley, B.W.; Hanley, S.; Goodman, H.M. Effects of ionizing radiation on a plant genome: Analysis of two Arabidopsis transparent testa mutations. Plant Cell 1992, 4, 333–347.
  24. Falcone Ferreyra, M.L.; Casas, M.I.; Questa, J.I.; Herrera, A.L.; Deblasio, S.; Wang, J.; Jackson, D.; Grotewold, E.; Casati, P. Evolution and expression of tandem duplicated maize flavonol synthase genes. Front. Plant. Sci. 2012, 3, 101.
  25. Ferrer, J.-L.; Jez, J.M.; Bowman, M.E.; Dixon, R.A.; Noel, J.P. Structure of chalcone synthase and the molecular basis of plant polyketide biosynthesis. Nat. Struct. Biol. 1999, 6, 775–784.
  26. Van der Krol, A.R.; Mur, L.A.; Beld, M.; Mol, J.N.; Stuitje, A.R. Flavonoid genes in petunia: Addition of a limited number of gene copies may lead to a suppression of gene expression. Plant Cell 1990, 2, 291–299.
  27. Schröder, G.; Schröder, J. A single change of histidine to glutamine alters the substrate preference of a stilbene synthase. J. Biol. Chem. 1992, 267, 20558–20560.
  28. Jez, J.M.; Bowman, M.E.; Dixon, R.A.; Noel, J.P. Structure and mechanism of the evolutionarily unique plant enzyme chalcone isomerase. Nat. Struct. Biol. 2000, 7, 786–791.
  29. Jez, J.M.; Noel, J.P. Reaction mechanism of chalcone isomerase. pH dependence, diffusion control, and product binding differences. J. Biol. Chem. 2002, 277, 1361–1369.
  30. Cheng, A.-X.; Zhang, X.; Han, X.-J.; Zhang, Y.-Y.; Gao, S.; Liu, C.-J.; Lou, H.-X. Identification of chalcone isomerase in the basal land plants reveals an ancient evolution of enzymatic cyclization activity for synthesis of flavonoids. New Phytol. 2018, 217, 909–924.
  31. Kaltenbach, M.; Burke, J.R.; Dindo, M.; Pabis, A.; Munsberg, F.S.; Rabin, A.; Kamerlin, S.C.L.; Noel, J.P.; Tawfik, D.S. Evolution of chalcone isomerase from a noncatalytic ancestor. Nat. Chem. Biol. 2018, 14, 548–555.
  32. Forkmann, G.; Heller, W.; Grisebach, H. Anthocyanin Biosynthesis in Flowers of Matthiola incana Flavanone 3-and Flavonoid 3′-Hydroxylases. Z. Nat. C 1980, 35, 691–695.
  33. Prescott, A.G.; John, P. DIOXYGENASES: Molecular Structure and Role in Plant Metabolism. Annu. Rev. Plant. Physiol. Plant. Mol. Biol. 1996, 47, 245–271.
  34. Cheng, A.-X.; Han, X.-J.; Wu, Y.-F.; Lou, H.-X. The Function and Catalysis of 2-Oxoglutarate-Dependent Oxygenases Involved in Plant Flavonoid Biosynthesis. Int J. Mol. Sci 2014, 15, 1080–1095.
  35. Owens, D.K.; Alerding, A.B.; Crosby, K.C.; Bandara, A.B.; Westwood, J.H.; Winkel, B.S.J. Functional analysis of a predicted flavonol synthase gene family in Arabidopsis. Plant Physiol. 2008, 147, 1046–1061.
  36. Park, S.; Kim, D.-H.; Park, B.-R.; Lee, J.-Y.; Lim, S.-H. Molecular and Functional Characterization of Oryza sativa Flavonol Synthase (OsFLS), a Bifunctional Dioxygenase. J. Agric. Food Chem. 2019, 67, 7399–7409.
  37. Xu, F.; Li, L.; Zhang, W.; Cheng, H.; Sun, N.; Cheng, S.; Wang, Y. Isolation, characterization, and function analysis of a flavonol synthase gene from Ginkgo biloba. Mol. Biol. Rep. 2012, 39, 2285–2296.
  38. Turnbull, J.J.; Nagle, M.J.; Seibel, J.F.; Welford, R.W.D.; Grant, G.H.; Schofield, C.J. The C-4 stereochemistry of leucocyanidin substrates for anthocyanidin synthase affects product selectivity. Bioorganic Med. Chem. Lett. 2003, 13, 3853–3857.
  39. Welford, R.W.D.; Turnbull, J.J.; Claridge, T.D.W.; Prescott, A.G.; Schofield, C.J. Evidence for oxidation at C-3 of the flavonoid C-ring during anthocyanin biosynthesis. Chem. Commun. 2001, 1828–1829.
  40. Yan, Y.; Chemler, J.; Huang, L.; Martens, S.; Koffas, M.A.G. Metabolic Engineering of Anthocyanin Biosynthesis in Escherichia coli. Appl. Environ. Microbiol. 2005, 71, 3617–3623.
  41. Almeida, J.R.M.; D’Amico, E.; Preuss, A.; Carbone, F.; de Vos, C.H.R.; Deiml, B.; Mourgues, F.; Perrotta, G.; Fischer, T.C.; Bovy, A.G.; et al. Characterization of major enzymes and genes involved in flavonoid and proanthocyanidin biosynthesis during fruit development in strawberry (Fragaria xananassa). Arch. Biochem. Biophys. 2007, 465, 61–71.
  42. Brugliera, F.; Barri-Rewell, G.; Holton, T.A.; Mason, J.G. Isolation and characterization of a flavonoid 3′-hydroxylase cDNA clone corresponding to the Ht1 locus of Petunia hybrida. Plant. J. 1999, 19, 441–451.
  43. de Vetten, N.; ter Horst, J.; van Schaik, H.P.; de Boer, A.; Mol, J.; Koes, R. A cytochrome b5 is required for full activity of flavonoid 3′,5′-hydroxylase, a cytochrome P450 involved in the formation of blue flower colors. Proc. Natl. Acad. Sci. USA 1999, 96, 778–783.
  44. Olsen, K.M.; Hehn, A.; Jugdé, H.; Slimestad, R.; Larbat, R.; Bourgaud, F.; Lillo, C. Identification and characterisation of CYP75A31, a new flavonoid 3′5′-hydroxylase, isolated from Solanum lycopersicum. BMC Plant Biol. 2010, 10, 21.
  45. Seitz, C.; Ameres, S.; Forkmann, G. Identification of the molecular basis for the functional difference between flavonoid 3′-hydroxylase and flavonoid 3′,5′-hydroxylase. FEBS Lett. 2007, 581, 3429–3434.
  46. Holton, T.A.; Brugliera, F.; Tanaka, Y. Cloning and expression of flavonol synthase from Petunia hybrida. Plant. J. 1993, 4, 1003–1010.
  47. Forkmann, G.; Vlaming, P.; de Spribille, R.; Wiering, H.; Schram, A.W. Genetic and Biochemical Studies on the Conversion of Dihydroflavonols to Flavonols in Flowers of Petunia hybrida. Z. Nat. C 1985, 41, 179–186.
  48. Britsch, L.; Heller, W.; Grisebach, H. Conversion of Flavanone to Flavone, Dihydroflavonol and Flavonol with an Enzyme System from Cell Cultures of Parsley. Z. Nat. C 1981, 36, 742–750.
  49. Pelletier, M.K.; Murrell, J.R.; Shirley, B.W. Characterization of Flavonol Synthase and Leucoanthocyanidin Dioxygenase Genes in Arabidopsis (Further Evidence for Differential Regulation of “Early” and “Late” Genes). Plant Physiol. 1997, 113, 1437–1445.
  50. Hostetler, G.L.; Ralston, R.A.; Schwartz, S.J. Flavones: Food Sources, Bioavailability, Metabolism, and Bioactivity. Adv. Nutr. 2017, 8, 423–435.
  51. Martens, S.; Forkmann, G.; Britsch, L.; Wellmann, F.; Matern, U.; Lukacin, R. Divergent evolution of flavonoid 2-oxoglutarate-dependent dioxygenases in parsley. FEBS Lett. 2003, 544, 93–98.
  52. Martens, S.; Forkmann, G. Cloning and expression of flavone synthase II from Gerbera hybrids. Plant J. 1999, 20, 611–618.
  53. Gebhardt, Y.H.; Witte, S.; Steuber, H.; Matern, U.; Martens, S. Evolution of Flavone Synthase I from Parsley Flavanone 3β-Hydroxylase by Site-Directed Mutagenesis. Plant Physiol. 2007, 144, 1442–1454.
  54. Davies, K.M.; Schwinn, K.E.; Deroles, S.C.; Manson, D.G.; Lewis, D.H.; Bloor, S.J.; Bradley, J.M. Enhancing anthocyanin production by altering competition for substrate between flavonol synthase and dihydroflavonol 4-reductase. Euphytica 2003, 131, 259–268.
  55. Johnson, E.T.; Ryu, S.; Yi, H.; Shin, B.; Cheong, H.; Choi, G. Alteration of a single amino acid changes the substrate specificity of dihydroflavonol 4-reductase. Plant. J. 2001, 25, 325–333.
  56. Miosic, S.; Thill, J.; Milosevic, M.; Gosch, C.; Pober, S.; Molitor, C.; Ejaz, S.; Rompel, A.; Stich, K.; Halbwirth, H. Dihydroflavonol 4-Reductase Genes Encode Enzymes with Contrasting Substrate Specificity and Show Divergent Gene Expression Profiles in Fragaria Species. PLoS ONE 2014, 9, e112707.
  57. Katsu, K.; Suzuki, R.; Tsuchiya, W.; Inagaki, N.; Yamazaki, T.; Hisano, T.; Yasui, Y.; Komori, T.; Koshio, M.; Kubota, S.; et al. A new buckwheat dihydroflavonol 4-reductase (DFR), with a unique substrate binding structure, has altered substrate specificity. BMC Plant Biol. 2017, 17, 239.
  58. Gang, D.R.; Kasahara, H.; Xia, Z.Q.; Vander Mijnsbrugge, K.; Bauw, G.; Boerjan, W.; Van Montagu, M.; Davin, L.B.; Lewis, N.G. Evolution of plant defense mechanisms. Relationships of phenylcoumaran benzylic ether reductases to pinoresinol-lariciresinol and isoflavone reductases. J. Biol. Chem. 1999, 274, 7516–7527.
  59. Gao, J.; Shen, L.; Yuan, J.; Zheng, H.; Su, Q.; Yang, W.; Zhang, L.; Nnaemeka, V.E.; Sun, J.; Ke, L.; et al. Functional analysis of GhCHS, GhANR and GhLAR in colored fiber formation of Gossypium hirsutum L. BMC Plant Biol. 2019, 19, 455.
  60. Timberlake, C.F.; Bridle, P. Spectral Studies of Anthocyanin and Anthocyanidin Equilibria in Aqueous Solution. Nature 1966, 212, 158–159.
  61. Kovinich, N.; Saleem, A.; Rintoul, T.L.; Brown, D.C.W.; Arnason, J.T.; Miki, B. Coloring genetically modified soybean grains with anthocyanins by suppression of the proanthocyanidin genes ANR1 and ANR2. Transgenic Res. 2012, 21, 757–771.
  62. Xie, D.-Y.; Sharma, S.B.; Dixon, R.A. Anthocyanidin reductases from Medicago truncatula and Arabidopsis thaliana. Arch. Biochem. Biophys. 2004, 422, 91–102.
  63. Weisshaar, B.; Jenkins, G.I. Phenylpropanoid biosynthesis and its regulation. Curr. Opin. Plant. Biol. 1998, 1, 251–257.
  64. Stracke, R.; Werber, M.; Weisshaar, B. The R2R3-MYB gene family in Arabidopsis thaliana. Curr. Opin. Plant Biol. 2001, 4, 447–456.
  65. Du, H.; Liang, Z.; Zhao, S.; Nan, M.-G.; Tran, L.-S.P.; Lu, K.; Huang, Y.-B.; Li, J.-N. The Evolutionary History of R2R3-MYB Proteins Across 50 Eukaryotes: New Insights Into Subfamily Classification and Expansion. Sci. Rep. 2015, 5, 11037.
  66. Hichri, I.; Barrieu, F.; Bogs, J.; Kappel, C.; Delrot, S.; Lauvergeat, V. Recent advances in the transcriptional regulation of the flavonoid biosynthetic pathway. J. Exp. Bot. 2011, 62, 2465–2483.
  67. Ramsay, N.A.; Glover, B.J. MYB-bHLH-WD40 protein complex and the evolution of cellular diversity. Trends Plant Sci. 2005, 10, 63–70.
  68. Carretero-Paulet, L.; Galstyan, A.; Roig-Villanova, I.; Martínez-García, J.F.; Bilbao-Castro, J.R.; Robertson, D.L. Genome-wide classification and evolutionary analysis of the bHLH family of transcription factors in Arabidopsis, poplar, rice, moss, and algae. Plant Physiol. 2010, 153, 1398–1412.
  69. Stracke, R.; Ishihara, H.; Huep, G.; Barsch, A.; Mehrtens, F.; Niehaus, K.; Weisshaar, B. Differential regulation of closely related R2R3-MYB transcription factors controls flavonol accumulation in different parts of the Arabidopsis thaliana seedling. Plant J. 2007, 50, 660–677.
  70. Pillet, J.; Yu, H.-W.; Chambers, A.H.; Whitaker, V.M.; Folta, K.M. Identification of candidate flavonoid pathway genes using transcriptome correlation network analysis in ripe strawberry (Fragaria × ananassa) fruits. J. Exp. Bot. 2015, 66, 4455–4467.
  71. Pandey, A.; Alok, A.; Lakhwani, D.; Singh, J.; Asif, M.H.; Trivedi, P.K. Genome-wide Expression Analysis and Metabolite Profiling Elucidate Transcriptional Regulation of Flavonoid Biosynthesis and Modulation under Abiotic Stresses in Banana. Sci. Rep. 2016, 6, 31361.
  72. Otani, M.; Kanemaki, Y.; Oba, F.; Shibuya, M.; Funayama, Y.; Nakano, M. Comprehensive isolation and expression analysis of the flavonoid biosynthesis-related genes in Tricyrtis spp. Biol Plant 2018, 62, 684–692.
  73. Qu, C.; Zhao, H.; Fu, F.; Wang, Z.; Zhang, K.; Zhou, Y.; Wang, X.; Wang, R.; Xu, X.; Tang, Z.; et al. Genome-Wide Survey of Flavonoid Biosynthesis Genes and Gene Expression Analysis between Black- and Yellow-Seeded Brassica napus. Front. Plant. Sci. 2016, 7.
  74. Wang, J.; Zhang, Q.; Cui, F.; Hou, L.; Zhao, S.; Xia, H.; Qiu, J.; Li, T.; Zhang, Y.; Wang, X.; et al. Genome-Wide Analysis of Gene Expression Provides New Insights into Cold Responses in Thellungiella salsuginea. Front. Plant. Sci. 2017, 8.
  75. Amirbakhtiar, N.; Ismaili, A.; Ghaffari, M.R.; Firouzabadi, F.N.; Shobbar, Z.-S. Transcriptome response of roots to salt stress in a salinity-tolerant bread wheat cultivar. PLoS ONE 2019, 14, e0213305.
  76. Sicilia, A.; Testa, G.; Santoro, D.F.; Cosentino, S.L.; Lo Piero, A.R. RNASeq analysis of giant cane reveals the leaf transcriptome dynamics under long-term salt stress. BMC Plant Biol 2019, 19, 355.
  77. Pucker, B.; Schilbert, H.M. Genomics and Transcriptomics Advance in Plant Sciences. In Molecular Approaches in Plant Biology and Environmental Challenges; Energy, Environment, and Sustainability; Singh, S.P., Upadhyay, S.K., Pandey, A., Kumar, S., Eds.; Springer: Singapore, 2019; pp. 419–448. ISBN 9789811506901.
  78. Kalwij, J.M. Review of ‘The Plant List, a working list of all plant species. J. Veg. Sci. 2012, 23, 998–1002.
  79. Pope, J. On a New Preparation of Croton Tiglium. Med. Chir. Trans. 1827, 13, 97–102.
  80. Gläser, S.; Sorg, B.; Hecker, E. A Method for Quantitative Determination of Polyfunctional Diterpene Esters of the Tigliane Type in Croton tiglium. Planta Med. 1988, 54, 580.
  81. Kim, J.H.; Lee, S.J.; Han, Y.B.; Moon, J.J.; Kim, J.B. Isolation of isoguanosine from Croton tiglium and its antitumor activity. Arch. Pharm. Res. 1994, 17, 115–118.
  82. El-Mekkawy, S.; Meselhy, M.R.; Nakamura, N.; Hattori, M.; Kawahata, T.; Otake, T. Anti-HIV-1 phorbol esters from the seeds of Croton tiglium. Phytochemistry 2000, 53, 457–464.
  83. Abbas, M.; Shahid, M.; Sheikh, M.A.; Muhammad, G. Phytochemical Screening of Plants Used in Folkloric Medicine: Effect of Extraction Method and Solvent. Asian J. Chem. 2014, 26, 6194–6198. [Google Scholar] [CrossRef]
  84. Palmeira, S.F., Jr.; Conserva, L.M.; Silveira, E.R. Two clerodane diterpenes and flavonoids from Croton brasiliensis. J. Braz. Chem. Soc. 2005, 16, 1420–1424. [Google Scholar] [CrossRef][Green Version]
  85. Kostova, I.; Iossifova, T.; Rostan, J.; Vogler, B.; Kraus, W.; Navas, H. Chemical and biological studies on Croton panamensis latex (Dragon’s Blood). Pharm. Pharmacol. Lett. 1999, 9, 34–36. [Google Scholar]
  86. Haak, M.; Vinke, S.; Keller, W.; Droste, J.; Rückert, C.; Kalinowski, J.; Pucker, B. High Quality de Novo Transcriptome Assembly of Croton tiglium. Front. Mol. Biosci. 2018, 5. [Google Scholar] [CrossRef] [PubMed]
  87. Li, C.; Wu, X.; Sun, R.; Zhao, P.; Liu, F.; Zhang, C. Croton Tiglium Extract Induces Apoptosis via Bax/Bcl-2 Pathways in Human Lung Cancer A549 Cells. Asian Pac. J. Cancer Prev. 2016, 17, 4893–4898. [Google Scholar] [CrossRef] [PubMed]
  88. Salatino, A.; Salatino, M.L.F.; Negri, G. Traditional uses, chemistry and pharmacology of Croton species (Euphorbiaceae). J. Braz. Chem. Soc. 2007, 18, 11–33. [Google Scholar] [CrossRef]
  89. Tsacheva, I.; Rostan, J.; Iossifova, T.; Vogler, B.; Odjakova, M.; Navas, H.; Kostova, I.; Kojouharova, M.; Kraus, W. Complement inhibiting properties of dragon’s blood from Croton draco. Z. Nat. C J. Biosci. 2004, 59, 528–532. [Google Scholar] [CrossRef]
  90. Maciel, M.A.; Pinto, A.C.; Arruda, A.C.; Pamplona, S.G.; Vanderlinde, F.A.; Lapa, A.J.; Echevarria, A.; Grynberg, N.F.; Côlus, I.M.; Farias, R.A.; et al. Ethnopharmacology, phytochemistry and pharmacology: A successful combination in the study of Croton cajucara. J. Ethnopharmacol. 2000, 70, 41–55. [Google Scholar] [CrossRef]
  91. Guerrero, M.F.; Puebla, P.; Carrón, R.; Martín, M.L.; Román, L.S. Quercetin 3,7-dimethyl ether: A vasorelaxant flavonoid isolated from Croton schiedeanus Schlecht. J. Pharm. Pharmacol. 2002, 54, 1373–1378. [Google Scholar] [CrossRef]
  92. Krebs, H.C.; Ramiarantsoa, H. Clerodane diterpenes of Croton hovarum. Phytochemistry 1997, 45, 379–381. [Google Scholar] [CrossRef]
  93. Shimada, S.; Otsuki, H.; Sakuta, M. Transcriptional control of anthocyanin biosynthetic genes in the Caryophyllales. J. Exp. Bot. 2007, 58, 957–967. [Google Scholar] [CrossRef] [PubMed][Green Version]
  94. Nesi, N.; Debeaujon, I.; Jond, C.; Pelletier, G.; Caboche, M.; Lepiniec, L. The TT8 Gene Encodes a Basic Helix-Loop-Helix Domain Protein Required for Expression of DFR and BAN Genes in Arabidopsis Siliques. Plant Cell 2000, 12, 1863–1878. [Google Scholar] [CrossRef] [PubMed][Green Version]
  95. Ni, R.; Zhu, T.-T.; Zhang, X.-S.; Wang, P.-Y.; Sun, C.-J.; Qiao, Y.-N.; Lou, H.-X.; Cheng, A.-X. Identification and evolutionary analysis of chalcone isomerase-fold proteins in ferns. J. Exp. Bot. 2020, 71, 290–304. [Google Scholar] [CrossRef] [PubMed][Green Version]
  96. Cai, Y.; Evans, F.J.; Roberts, M.F.; Phillipson, J.D.; Zenk, M.H.; Gleba, Y.Y. Polyphenolic compounds from Croton lechleri. Phytochemistry 1991, 30, 2033–2040. [Google Scholar] [CrossRef]
  97. Furlan, C.M.; Santos, K.P.; Sedano-Partida, M.D.; Motta, L.B.; da Santos, D.Y.A.C.; Salatino, M.L.F.; Negri, G.; Berry, P.E.; van Ee, B.W.; Salatino, A. Flavonoids and antioxidant potential of nine Argentinian species of Croton (Euphorbiaceae). Braz. J. Bot. 2015, 38, 693–702. [Google Scholar] [CrossRef]
  98. de Sampaio e Spohr, T.C.L.; Stipursky, J.; Sasaki, A.C.; Barbosa, P.R.; Martins, V.; Benjamim, C.F.; Roque, N.F.; Costa, S.L.; Gomes, F.C.A. Effects of the flavonoid casticin from Brazilian Croton betulaster in cerebral cortical progenitors in vitro: Direct and indirect action through astrocytes. J. Neurosci. Res. 2010, 88, 530–541. [Google Scholar] [CrossRef]
  99. Biscaro, F.; Parisotto, E.B.; Zanette, V.C.; Günther, T.M.F.; Ferreira, E.A.; Gris, E.F.; Correia, J.F.G.; Pich, C.T.; Mattivi, F.; Filho, D.W.; et al. Anticancer activity of flavonol and flavan-3-ol rich extracts from Croton celtidifolius latex. Pharm. Biol. 2013, 51, 737–743. [Google Scholar] [CrossRef][Green Version]
  100. Aderogba, M.A.; Ndhlala, A.R.; Rengasamy, K.R.R.; Van Staden, J. Antimicrobial and selected in vitro enzyme inhibitory effects of leaf extracts, flavonols and indole alkaloids isolated from Croton menyharthii. Molecules 2013, 18, 12633–12644. [Google Scholar] [CrossRef]
  101. Cruz, B.G.; Dos Santos, H.S.; Bandeira, P.N.; Rodrigues, T.H.S.; Matos, M.G.C.; Nascimento, M.F.; de Carvalho, G.G.C.; Braz-Filho, R.; Teixeira, A.M.R.; Tintino, S.R.; et al. Evaluation of antibacterial and enhancement of antibiotic action by the flavonoid kaempferol 7-O-β-D-(6″-O-cumaroyl)-glucopyranoside isolated from Croton piauhiensis müll. Microb. Pathog. 2020, 143, 104144. [Google Scholar] [CrossRef]
  102. Nascimento, A.M.; Maria-Ferreira, D.; Dal Lin, F.T.; Kimura, A.; de Santana-Filho, A.P.; Werner, M.F.D.P.; Iacomini, M.; Sassaki, G.L.; Cipriani, T.R.; de Souza, L.M. Phytochemical analysis and anti-inflammatory evaluation of compounds from an aqueous extract of Croton cajucara Benth. J. Pharm. Biomed. Anal. 2017, 145, 821–830. [Google Scholar] [CrossRef]
  103. Prescott, A.G.; Stamford, N.P.J.; Wheeler, G.; Firmin, J.L. In vitro properties of a recombinant flavonol synthase from Arabidopsis thaliana. Phytochemistry 2002, 60, 589–593. [Google Scholar] [CrossRef]
  104. Lukacin, R.; Wellmann, F.; Britsch, L.; Martens, S.; Matern, U. Flavonol synthase from Citrus unshiu is a bifunctional dioxygenase. Phytochemistry 2003, 62, 287–292. [Google Scholar] [CrossRef]
  105. Cingolani, P.; Platts, A.; Wang, L.L.; Coon, M.; Nguyen, T.; Wang, L.; Land, S.J.; Lu, X.; Ruden, D.M. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 2012, 6, 80–92. [Google Scholar] [CrossRef] [PubMed][Green Version]
  106. Stanke, M.; Steinkamp, R.; Waack, S.; Morgenstern, B. AUGUSTUS: A web server for gene finding in eukaryotes. Nucleic Acids Res. 2004, 32, W309–W312. [Google Scholar] [CrossRef] [PubMed][Green Version]
  107. Keilwagen, J.; Hartung, F.; Paulini, M.; Twardziok, S.O.; Grau, J. Combining RNA-seq data and homology-based gene prediction for plants, animals and fungi. BMC Bioinform. 2018, 19, 189. [Google Scholar] [CrossRef][Green Version]
  108. Pucker, B.; Brockington, S.F. Genome-wide analyses supported by RNA-Seq reveal non-canonical splice sites in plant genomes. BMC Genom. 2018, 19, 980. [Google Scholar] [CrossRef] [PubMed][Green Version]
  109. Jia, Y.; Li, B.; Zhang, Y.; Zhang, X.; Xu, Y.; Li, C. Evolutionary dynamic analyses on monocot flavonoid 3′-hydroxylase gene family reveal evidence of plant-environment interaction. BMC Plant Biol. 2019, 19, 347. [Google Scholar] [CrossRef][Green Version]
  110. Katoh, K.; Standley, D.M. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol. Biol. Evol. 2013, 30, 772–780. [Google Scholar] [CrossRef][Green Version]
  111. Altschul, S.F.; Gish, W.; Miller, W.; Myers, E.W.; Lipman, D.J. Basic local alignment search tool. J. Mol. Biol. 1990, 215, 403–410. [Google Scholar] [CrossRef]
  112. Price, M.N.; Dehal, P.S.; Arkin, A.P. FastTree 2–Approximately Maximum-Likelihood Trees for Large Alignments. PLoS ONE 2010, 5, e9490. [Google Scholar] [CrossRef]
  113. Brown, J.W.; Walker, J.F.; Smith, S.A. Phyx: Phylogenetic tools for unix. Bioinformatics 2017, 33, 1886–1888. [Google Scholar] [CrossRef] [PubMed][Green Version]
  114. Robert, X.; Gouet, P. Deciphering key features in protein structures with the new ENDscript server. Nucleic Acids Res. 2014, 42, W320–W324. [Google Scholar] [CrossRef] [PubMed][Green Version]
  115. Berman, H.M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T.N.; Weissig, H.; Shindyalov, I.N.; Bourne, P.E. The Protein Data Bank. Nucleic Acids Res. 2000, 28, 235–242. [Google Scholar] [CrossRef] [PubMed][Green Version]
  116. Roy, A.; Kucukural, A.; Zhang, Y. I-TASSER: A unified platform for automated protein structure and function prediction. Nat. Protoc. 2010, 5, 725–738. [Google Scholar] [CrossRef][Green Version]
  117. Leebens-Mack, J.H.; Barker, M.S.; Carpenter, E.J.; Deyholos, M.K.; Gitzendanner, M.A.; Graham, S.W.; Grosse, I.; Li, Z.; Melkonian, M.; Mirarab, S.; et al. One thousand plant transcriptomes and the phylogenomics of green plants. Nature 2019, 574, 679–685. [Google Scholar] [CrossRef][Green Version]
  118. Canedo-Téxon, A.; Ramón-Farias, F.; Monribot-Villanueva, J.L.; Villafán, E.; Alonso-Sánchez, A.; Pérez-Torres, C.A.; Ángeles, G.; Guerrero-Analco, J.A.; Ibarra-Laclette, E. Novel findings to the biosynthetic pathway of magnoflorine and taspine through transcriptomic and metabolomic analysis of Croton draco (Euphorbiaceae). BMC Plant Biol. 2019, 19, 560. [Google Scholar] [CrossRef]
  119. Bray, N.L.; Pimentel, H.; Melsted, P.; Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 2016, 34, 525–527. [Google Scholar] [CrossRef]
  120. Matus, J.T.; Aquea, F.; Arce-Johnson, P. Analysis of the grape MYB R2R3 subfamily reveals expanded wine quality-related clades and conserved gene structure organization across Vitis and Arabidopsis genomes. BMC Plant Biol. 2008, 8, 83. [Google Scholar] [CrossRef][Green Version]
  121. Stracke, R.; Holtgräwe, D.; Schneider, J.; Pucker, B.; Sörensen, T.R.; Weisshaar, B. Genome-wide identification and characterisation of R2R3-MYB genes in sugar beet (Beta vulgaris). BMC Plant Biol. 2014, 14, 249. [Google Scholar] [CrossRef][Green Version]
  122. Pucker, B.; Pandey, A.; Weisshaar, B.; Stracke, R. The R2R3-MYB gene family in banana (Musa acuminata): Genome-wide identification, classification and expression patterns. bioRxiv 2020. [Google Scholar] [CrossRef]
  123. Heim, M.A.; Jakoby, M.; Werber, M.; Martin, C.; Weisshaar, B.; Bailey, P.C. The Basic Helix–Loop–Helix Transcription Factor Family in Plants: A Genome-Wide Study of Protein Structure and Functional Diversity. Mol. Biol. Evol. 2003, 20, 735–747. [Google Scholar] [CrossRef] [PubMed][Green Version]
  124. Wang, P.; Su, L.; Gao, H.; Jiang, X.; Wu, X.; Li, Y.; Zhang, Q.; Wang, Y.; Ren, F. Genome-Wide Characterization of bHLH Genes in Grape and Analysis of their Potential Relevance to Abiotic Stress Tolerance and Secondary Metabolite Biosynthesis. Front. Plant Sci. 2018, 9. [Google Scholar] [CrossRef] [PubMed][Green Version]
  125. Mao, T.-Y.; Liu, Y.-Y.; Zhu, H.-H.; Zhang, J.; Yang, J.-X.; Fu, Q.; Wang, N.; Wang, Z. Genome-wide analyses of the bHLH gene family reveals structural and functional characteristics in the aquatic plant Nelumbo nucifera. PeerJ 2019, 7, e7153. [Google Scholar] [CrossRef]
  126. Zhang, X.-Y.; Qiu, J.-Y.; Hui, Q.-L.; Xu, Y.-Y.; He, Y.-Z.; Peng, L.-Z.; Fu, X.-Z. Systematic analysis of the basic/helix-loop-helix (bHLH) transcription factor family in pummelo (Citrus grandis) and identification of the key members involved in the response to iron deficiency. BMC Genom. 2020, 21, 233. [Google Scholar] [CrossRef] [PubMed][Green Version]
  127. Wang, Z.; Jia, C.; Wang, J.-Y.; Miao, H.-X.; Liu, J.-H.; Chen, C.; Yang, H.-X.; Xu, B.; Jin, Z. Genome-Wide Analysis of Basic Helix-Loop-Helix Transcription Factors to Elucidate Candidate Genes Related to Fruit Ripening and Stress in Banana (Musa acuminata L. AAA Group, cv. Cavendish). Front. Plant Sci. 2020, 11. [Google Scholar] [CrossRef] [PubMed]
  128. Tian, S.; Li, L.; Wei, M.; Yang, F. Genome-wide analysis of basic helix–loop–helix superfamily members related to anthocyanin biosynthesis in eggplant (Solanum melongena L.). PeerJ 2019, 7, e7768. [Google Scholar] [CrossRef]
  129. van Nocker, S.; Ludwig, P. The WD-repeat protein superfamily in Arabidopsis: Conservation and divergence in structure and function. BMC Genom. 2003, 4, 50. [Google Scholar] [CrossRef] [PubMed][Green Version]
  130. Hu, R.; Xiao, J.; Gu, T.; Yu, X.; Zhang, Y.; Chang, J.; Yang, G.; He, G. Genome-wide identification and analysis of WD40 proteins in wheat (Triticum aestivum L.). BMC Genom. 2018, 19, 803. [Google Scholar] [CrossRef][Green Version]
  131. Mishra, A.K.; Muthamilarasan, M.; Khan, Y.; Parida, S.K.; Prasad, M. Genome-Wide Investigation and Expression Analyses of WD40 Protein Family in the Model Plant Foxtail Millet (Setaria italica L.). PLoS ONE 2014, 9, e86852. [Google Scholar] [CrossRef][Green Version]
Figure 1. Simplified illustration of the general phenylpropanoid pathway and the core flavonoid aglycon biosynthesis network. PAL (phenylalanine ammonia-lyase), C4H (cinnamate 4-hydroxylase), 4CL (4-coumarate:CoA ligase), CHS (naringenin-chalcone synthase), CHI (chalcone isomerase), FNS (flavone synthase), FLS (flavonol synthase), F3H/FHT (flavanone 3-hydroxylase), F3′H (flavonoid 3′-hydroxylase), F3′5′H (flavonoid 3′,5′-hydroxylase), DFR (dihydroflavonol 4-reductase), ANS/LDOX (anthocyanidin synthase/leucoanthocyanidin dioxygenase), LAR/LCR (leucoanthocyanidin reductase/leucocyanidin reductase), and ANR (anthocyanidin reductase).
Figure 2. This overview illustrates the components and steps of Knowledge-based Identification of Pathway Enzymes (KIPEs). Three different modes allow the screening of peptide, transcript, or genome sequences for candidate sequences. Bait sequences and information about functionally relevant features (blue) are supplied by the user. Different modules of KIPEs (light green) are executed consecutively depending on the type of input data. Intermediate results and the final output (dark green) are stored to keep the process transparent.
Table 1. Candidates in the flavonoid biosynthesis of Croton tiglium. “TRINITY” prefix of all sequence names was omitted for brevity. Candidates are sorted by their position in the respective pathway and decreasing similarity to bait sequences. Transcripts per million (TPM) values of the candidates in different tissues are shown: leaf (SRR6239848), stem (SRR6239849), inflorescence (SRR6239850), root (SRR6239851), and seed (SRR6239852). Displayed values were rounded to the closest integer, and thus extremely low abundances appear as 0. A full table with all available RNA-Seq samples and transcript abundance values for all candidates is available in the Supplementary Information (Supplementary S6).
Sequence ID           Function     Leaf/Stem/Inflorescence/Root/Seed
DN27125_c0_g1_i1      CtCHI Ia     11219388
DN33407_c7_g7_i2 ¹    CtFNS IIa    170951
DN33407_c7_g7_i1 ¹    CtFNS IIb    03142
DN27999_c0_g1_i2 ¹    CtFNS IIc    020750
DN33407_c7_g6_i4 ¹    CtFNS IId    000220
¹ These sequences might encode non-functional enzymes or enzymes with a different function (see results and discussion for details) but represent the best flavone synthase (FNS) II candidates. The background color shows the transcript abundance.
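The rounding mentioned in the caption of Table 1 can be illustrated with a minimal sketch. It assumes the standard definition of transcripts per million (length-normalised read counts scaled to sum to one million). The function name `tpm`, the read counts, and the transcript lengths are hypothetical and are not taken from the study.

```python
def tpm(counts, lengths):
    """Return TPM values for parallel lists of read counts and transcript lengths (bp)."""
    # Length normalisation: reads per base for each transcript.
    rates = [count / length for count, length in zip(counts, lengths)]
    total = sum(rates)
    # Scale so that all values together sum to one million.
    return [rate / total * 1_000_000 for rate in rates]


# Hypothetical data: two abundant transcripts and one that is barely detected.
counts = [500_000, 250_000, 1]
lengths = [1_000, 1_000, 5_000]

values = tpm(counts, lengths)
# Rounding to the closest integer, as done for the displayed table values.
rounded = [round(value) for value in values]

# The third transcript has a TPM above zero but below 0.5,
# so it is displayed as 0 after rounding.
print(values[2], rounded[2])
```

A displayed 0 therefore indicates a very low abundance rather than the absence of the transcript.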
Table 2. Transcriptional regulator candidates of the flavonoid biosynthesis. MYB11/MYB12/MYB111 candidates are summarised as subgroup 7 MYBs. MYB75/MYB90/MYB113/MYB114 are summarised as subgroup 6. Transcripts per million (TPM) values of the candidates in different tissues are shown: leaf (SRR6239848), stem (SRR6239849), inflorescence (SRR6239850), root (SRR6239851), and seed (SRR6239852). Displayed values were rounded to the closest integer, and thus extremely low abundances appear as 0.
Sequence ID          Group        Leaf  Stem  Inflorescence  Root  Seed
DN30455_c10_g1_i1    Subgroup 7   0     1     0              0     4
DN21046_c0_g1_i3     Subgroup 7   0     0     0              0     0
DN21046_c0_g1_i2     Subgroup 7   0     1     0              0     9
DN28041_c1_g1_i4     Subgroup 6   0     0     0              0     7
DN28041_c1_g1_i2     Subgroup 6   0     0     0              0     0
DN33356_c3_g1_i2     Subgroup 6   0     0     0              0     4
The background color shows the transcript abundance.

MDPI and ACS Style

Pucker, B.; Reiher, F.; Schilbert, H.M. Automatic Identification of Players in the Flavonoid Biosynthesis with Application on the Biomedicinal Plant Croton tiglium. Plants 2020, 9, 1103.
