agronomy The Analysis of Partial Sequences of the Flavonone 3 Hydroxylase Gene in Lupinus mutabilis Reveals Differential Expression of Two Paralogues Potentially Related to Seed Coat Colour

Abstract: Flavanone 3-hydroxylases (EC 1.14.11.9) are key enzymes in the synthesis of anthocyanins and other flavonoids. Such compounds are involved in seed coat colour and stem pigmentation. Lupinus mutabilis (tarwi) is a legume crop domesticated in the Andean region, valued for the high protein and oil content of its seeds. Tarwi accessions are being selected for cultivation in Europe under defined breeding criteria. Seed coat colour patterns are relevant breeding traits in tarwi, and these are conditioned by anthocyanin content. We identified and isolated part of the tarwi flavanone 3-hydroxylase gene (LmF3h) from two accessions with distinct seed coat colour patterns. Two partial LmF3h paralogues, with 20% predicted amino-acid changes but little predicted alteration of tertiary structure, were identified in the coloured seed genotype, while only one was present in the white seed genotype. Upon selection and validation of appropriate reference genes, an RT-qPCR analysis showed that these paralogues have different levels of expression during seed development in both genotypes, although they follow the same expression patterns. DNA and transcription analyses made it possible to highlight potential F3H paralogues relatable to seed coat pigmentation in tarwi and, upon biochemical and genetic confirmation, could prompt marker-assisted breeding for relevant phenotypic traits associated with flavonoid synthesis.
[Figure caption fragment] (using initial stage of seed development as control for LM18 and SBP genotype) with Ubc, Hel, Adh3 and α tub reference genes, in different experimental samples. The LmF3h_b paralogue is only present in the LM18 genotype.


Introduction
Flavonoids are important secondary metabolites that include chalcones, flavones, flavonols, flavandiols, anthocyanins, and proanthocyanidins or condensed tannins. These metabolites are known for their multiple benefits for humans and plants. For humans, they are associated with positive effects, for instance as antioxidant, antibacterial, anti-cancer, anti-mutagenic, anti-atherosclerosis, and anti-inflammatory agents. In plants, they play important roles in protection against biotic and abiotic stresses, in fertility regulation, and in auxin biosynthesis. They increase the agronomic and industrial value of products by conferring pigmentation to flowers, fruits, and vegetative tissues and by attracting pollinators [1][2][3][4][5]. Within the flavonoids, anthocyanins are the most widely distributed pigments in higher plants. Their biological and genetic processes, as well as gene regulation, have been widely studied in different systems [3]. Anthocyanins are deposited in the tissues via two biosynthesis pathways, which are well described in several studies that point to the involvement of structural and regulatory genes [6]. Structural genes encode multiple enzymes: phenylalanine ammonia lyase (PAL); cinnamate 4-hydroxylase (C4H); chalcone synthase (CHS); flavanone 3-hydroxylase (F3H); dihydroflavonol 4-reductase (DFR); anthocyanidin synthase (ANS); and UDP-glucose:flavonoid 3-O-glucosyltransferase (UFGT) [7,8]. Regulatory genes encode the transcription factors WIP-type Zn-Finger, AtMYB123, AtbHLH042, MADS AtAGL32, WD40 or WDR, and AtWRKY44, which control the temporal and spatial expression of the structural genes [1,9,10]. F3H (EC 1.14.11.9) is a central enzyme of the flavonoid biosynthetic pathway. This enzyme is associated with several functions, mainly in plant resistance to stresses caused by drought, salt, cold, fungal damage, and ultraviolet-B radiation [11]. In several legume species such as Glycine max, Medicago truncatula, M. sativa and Phaseolus vulgaris, the characterization, expression, and function of the F3H gene have been studied in different tissues to better understand the role of this gene [12][13][14][15], but in species of the genus Lupinus the information is still scarce. Most of these works have studied the expression of the F3H gene in roots, flowers, and stems, and little is known about the role of F3H in the colouration of legume seeds. However, in a study carried out in Arabidopsis thaliana, F3H was reported as being associated with spotted pale brown seed coat colour [1,10]. The seed coat acts as a protective layer after physiological maturation, and contributes to the maintenance of physiological quality, seed longevity, disease resistance, control of seed development, and dormancy [16][17][18]. The brown pigmentation of the seed coat has been studied in other species such as Glycine max and Phaseolus vulgaris and has been associated with the presence of proanthocyanidins [19][20][21]. Beneficial effects have been associated with the consumption of coloured seeds due to their diverse phytochemical contents, such as flavonoids ([22] and references therein). The acceptability of grain seed by consumers is related to seed coat colour because many consumers are looking for foods with high nutritional value and health benefits [23]. Brown seed coat is a desirable agronomic trait, hence several studies in various species have been performed to elucidate seed coat pigmentation mechanisms.
Lupins (Lupinus spp.) are legume crops used for food and feed, appreciated for the high protein content of their seeds, for their adaptability to marginal cultivation conditions and for their beneficial effect in crop rotation due to high nitrogen input to the soil. In parallel with the three lupin crops domesticated in the Mediterranean basin (white, yellow, and narrow-leaved lupin), tarwi (L. mutabilis Sweet) was domesticated in the Andean region and used as a legume crop by the Incas [24]. Tarwi is appreciated for the high protein and oil contents of its seeds [25,26] and is being selected and adapted to cultivation in Europe under diverse environmental conditions [27,28]. Breeding criteria include growth habit, seed characteristics and yield, nitrogen fixation, and disease response. Anthracnose, caused by Colletotrichum lupini, is one of the most important and widespread diseases of lupins [29], and tarwi is generally regarded as susceptible [30]. Nevertheless, stem anthocyanin pigmentation was shown to be related to lower disease susceptibility [31]. In several legumes, disease resistance has been associated with coloured seeds. Work performed in the 1970s referred to the importance of coloured seeds in resistance against root-rot disease in common beans [32], as well as resistance against fungi and seed vivipary in sorghum [33]. Stasz et al. [34], evaluating the time and site of infection by Pythium ultimum of resistant and susceptible germinating pea seeds, found that anthocyanin-coloured pea seeds had higher levels of resistance against this pathogen than non-coloured ones. Islam et al. [35] also concluded that coloured seeds have a beneficial effect on resistance against common bean diseases. On the other hand, the consumption of coloured seeds of certain legumes has been associated with multiple beneficial health effects. 
For instance, the consumption of coloured common bean and black soybean seeds has been identified as being fundamental in inhibiting the proliferation of different cancer cells in humans ([36] and references therein). Xu and Chang [37] studied the phenolic composition of several legumes and found that black and red varieties of common beans and black soy have higher total phenolic content, total flavonoid content, and condensed tannin content, which are said to be highly beneficial to health. Soybean seeds with a coloured coat are known to be rich in isoflavanones and proteins, which confer benefits to human health [38,39]. Segev et al. [40] stated that coloured chickpea seeds rich in antioxidants and with high total polyphenol content have great potential to be used as functional foods. Mineral content and bioavailability in common beans were correlated with seed coat colour [41], and human selection of bean genotypes regarding invisible micronutrient traits has been related to seed coat colour, cooking time, and palatability attributes [42]. Tarwi seed coat presents diverse colours and colour patterns that are more variable than in most other grain legumes and certainly more than in other lupin crops (Figure 1). Nevertheless, little is known about mineral content variation among tarwi genotypes [25]. Seed coat colours and colour patterns are perceived in diverse ways by consumers [43,44]. In tarwi, the white colour is reported as the most preferred by consumers [27], indicating little consumption of coloured grains. However, many countries encourage consumption of coloured foods to prevent chronic diseases [45]. Therefore, gaining knowledge about where and when these anthocyanins accumulate in legume seeds is important for the development of specific markers that can be used to establish breeding strategies for superior cultivars.
Gene expression studies have been carried out in several legume species [46][47][48][49]. However, in species of the genus Lupinus, little is known about the genes that control seed pigmentation, or about the expression profile of these genes. Knowing the expression profile of these genes in different seed development stages is essential to control the factors affecting colour expression in seeds and to produce quality seeds. The main aim of this study is to characterize genotypes with contrasting seed coat colour (brown and white) based on the sequence and expression of a partial conserved gene region, with the purpose of supporting marker-assisted tarwi breeding for phenotypic traits related to anthocyanin pigmentation.
Figure 1. Common patterns of Lupinus mutabilis (tarwi) seed coat: eyebrow (A1-A10, C3 and D5); crescent (B1-C2 and C4); spotted (C5, C7 and C9); moustache (C6, C8, C10, D1 and D2); marbled (D3 and D8). Variants: A1 (white with brown eyebrow); A2 (white with cream brown eyebrow); A3 (white with brown spotted and cream brown eyebrow); A4-A6 (white with brown spotted and dark brown eyebrow); A7 (light brown with brown eyebrow); A8 and A9 (brown with dark brown eyebrow); A10 (dark brown with dark brown eyebrow); B1 (white with grey crescent); B2 (white with cream brown crescent); B3-B5 (white with brown crescent); B6 (white with black crescent); B7 (white with dark brown crescent); B8 and B9 (white with brown spotted and black crescent); B10 (grey with black eyebrow); C1 (light grey with dark grey crescent); C2 (cream with cream crescent); C3 (cream with spotted and brown eyebrow); C4 (grey with spotted and dark grey crescent); C5 and C7 (white with brown spotted); C9 (white with grey spotted); C6, C10 and D1 (white with grey spotted and grey moustache); D2 (white with grey spotted and black moustache); C8 (white with brown spotted and moustache); C9 (grey spotted); D3 (white with black marbled); D4 (black); D5 (white with grey spotted and black eyebrow); D6 (combination black marbled and black colour); D7 (white); D8 (white with brown marbled); D9 (light brown); D10 (brown) (according to IBPGR [50]).

Plant Material
The plant material used in this study consisted of the L. mutabilis accessions SBP (white seeds) and LM18 (brown seed coat with dark brown eyebrow), from the Lupinus germplasm collection at Instituto Superior de Agronomia, Universidade de Lisboa (Lisbon, Portugal). The seeds used in this experiment came from the 7th generation of plants multiplied under self-pollination conditions (using an insect net over the multiplication plot). LM18 is a promising breeding line, presenting high efficiency in converting vegetative growth to seed production [28] and exhibiting a moderate resistance response to anthracnose by developing anthocyanin pigmentation around infection areas [31]. SBP is a breeding line with homogeneous white seeds and high susceptibility to anthracnose (unpublished data). Lupinus angustifolius ('Illyarrie') leaves were also used for DNA analyses.

Gene Analysis
DNA was extracted from leaves of young plants of each of the three genotypes (narrow-leaved lupin and both tarwi accessions) using the DNeasy® Plant mini kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions and as previously described [28].
The soybean flavanone 3-hydroxylase gene (GLYMA_02G048400) was used to interrogate the narrow-leaved lupin genome (https://plants.ensembl.org/Lupinus_angustifolius/Info/Index; accessed 8 April 2020; [51]) using BLASTN and BLASTX tools, and protein sequence XP_019420372 (L. angustifolius naringenin, 2-oxoglutarate 3-dioxygenase-like) was used for subsequent comparisons. PCR primers were designed based on nucleotide and protein alignments between soybean and narrow-leaved lupin F3H genes along with Vigna angularis LOC108323664 (naringenin, 2-oxoglutarate 3-dioxygenase) (Figure S1). The sequences were aligned with the ClustalW algorithm to identify conserved regions, and the region conserved in the three species was used for primer design. Primers 5′-GATTGGAGAGAGATTGTGACATA-3′ (forward) and 5′-GTGATCAGCATTCTTGAACC-3′ (reverse) were used to amplify L. mutabilis DNA. PCR was performed under the following conditions: pre-denaturation for 5 min at 94 °C; 40 cycles of 30 s at 94 °C, 45 s at 49 °C, and 1 min at 72 °C; and a final extension at 72 °C for 10 min. PCR reactions were carried out in a final volume of 10 µL containing 20 ng of DNA, 0.5 µM of each primer and 5 µL of dNTP + Taq DNA polymerase (NZYTech, Lisbon, Portugal). PCR products were visualised by electrophoresis in 2% agarose gels and, when necessary, were gel-excised using the GeneJET Gel Extraction Kit (Thermo Fisher Scientific, Waltham, MA, USA) and sequenced. Sequencing chromatograms were analysed using the SeqMan module of DNAStar v5.05 (Lasergene, Madison, WI, USA).
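As a rough cross-check of the primer pair described above, the following sketch computes the length, GC content and a Wallace-rule melting temperature estimate for each primer. The Wallace rule (2 °C per A/T, 4 °C per G/C) is a simple approximation chosen here for illustration; it is not stated to be the method used by the authors. The reverse primer is written assuming the hyphen in the extracted sequence is a line-break artifact.

```python
# Primer sequences as given in the text (5'->3'). The reverse primer is
# written without the hyphen, assuming it was a line-break artifact.
FORWARD = "GATTGGAGAGAGATTGTGACATA"
REVERSE = "GTGATCAGCATTCTTGAACC"


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Wallace-rule Tm estimate in degrees C: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for name, seq in (("forward", FORWARD), ("reverse", REVERSE)):
        print(f"{name}: {len(seq)} nt, GC {gc_percent(seq):.1f}%, "
              f"Tm ~{wallace_tm(seq)} °C")
```

Under this approximation the estimates are 64 °C (forward) and 58 °C (reverse), both above the 49 °C annealing temperature reported.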
Local DNA and polypeptide alignments were conducted with MEGA X [52] using the MUSCLE algorithm. By comparison with the L. angustifolius sequence (XP_019420372), intronic and exonic regions were identified in L. mutabilis. The exonic region of L. mutabilis was used to deduce the polypeptide sequence and to construct a dendrogram comparing similarities with L. angustifolius orthologs. The evolutionary relationship among sequences was inferred using the Maximum Likelihood method based on the JTT matrix-based model. A phylogenetic tree was generated from 500 bootstrap datasets to provide statistical support for each node. We used iTOL (https://itol.embl.de; accessed 8 April 2020) for displaying and annotating the generated phylogenetic trees.
A protein analysis was performed to compare predicted three-dimensional and secondary structures of L. mutabilis using Phyre2 [53].

Transcription Analysis
Expression of the putative F3H gene during the development of the tarwi seed coat in white and coloured genotypes was assessed through RT-qPCR. For this, RNA was extracted from seed coats of the SBP and LM18 accessions throughout seed development (Figure 2). Following Zabala and Vodkin [54], samples were divided into six or seven groups according to fresh seed weight (expressed in mg/seed), as exemplified in Figure 2.

Three plants (biological replicates) were used for each stage of seed development and for each genotype. For reasons of material quantity, in the first two stages the seeds collected from each plant were pooled, making a single sample per plant and stage. In the other developmental stages at least 10 seeds were collected to isolate the seed coat. In the first two stages of seed development, in both genotypes, it was not possible to isolate the seed coat because the cotyledons were not formed and the inside of the seeds was watery, thus the whole seed was used. The seed coat was isolated from the third stage of seed development onwards. All seeds were processed immediately after being removed from the pods. In each seed development stage about 300-400 mg was collected, and from these samples around 100 mg of material was dissected using sterile scalpels, immediately frozen in liquid nitrogen and stored at −80 °C. Total RNA was extracted using the Spectrum™ Plant Total RNA Kit (Sigma-Aldrich, St. Louis, MO, USA) and evaluated as previously described [55]. After evaluation of RNA quality and integrity, cDNA was synthesised from 1 µg total RNA in a final volume of 20 µL using the RevertAid H Minus Reverse transcriptase (Thermo Scientific, Waltham, MA, USA) according to the manufacturer's instructions. The cDNA products were stored at −20 °C for future use. For transcription analysis, three biological and two technical replicates were performed.

Reference Genes
Reference genes for RT-qPCR were selected considering the literature available for narrow-leaved lupin [56,57] and validated in tarwi. Five reference genes were tested (Table 1). To ensure the reliability of the potential reference genes, the expression profiles of the two fragments were measured and normalised with the most stable reference genes, determined using the RefFinder tool [58], which provides a classification based on four methods: BestKeeper [59]; geNorm [60]; NormFinder [61,62]; and the ∆Ct method [63].

Quantitative Reverse-Transcription PCR
According to the results obtained in the genomic analysis, RT-qPCR primers were designed for the tarwi LmF3h_a and LmF3h_b fragments (Table 2). RT-qPCR reactions were carried out using a CFX96 thermal cycler (Bio-Rad, Hercules, CA, USA) with the Bio-Rad CFX Manager software (Bio-Rad). Each reaction contained 5 µL of cDNA, 7.5 µL of SsoFast EvaGreen (Bio-Rad) and 0.45 µL of each primer (10 ng/µL). The amplification programme consisted of 95 °C for 3 min followed by 40 cycles of 10 s at 95 °C and 30 s at 60 °C. To produce melting curves, an additional denaturing cycle with temperature ranging between 65 °C and 95 °C in 0.5 °C increments was performed for all reactions after amplification. The specificity of the amplification products was confirmed by melting curve analysis and agarose gel electrophoresis. Each set of reactions included a negative control with no template. The efficiencies of the RT-qPCR primer pairs were calculated as $E = (10^{-1/\text{slope}} - 1) \times 100$ [64], using LinRegPCR [65]. The mRNA expression of the genes of interest (goi) was quantified using the comparative threshold cycle (Ct) method [66]. The Ct value of the reference genes was subtracted from the Ct value of the genes of interest to obtain ∆Ct. The first stage of seed development for each phenotype was used as the control for normalising the RT-qPCR data with the reference genes. Relative expression was calculated as $2^{-\Delta\Delta Ct}$ using the geometric mean of the multiple reference genes, where $\Delta\Delta Ct = \Delta Ct_{\text{sample}} - \Delta Ct_{\text{control}}$ [63].
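The two calculations described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names and Ct values are hypothetical, and combining several reference genes by averaging ∆∆Ct before exponentiating is one reading of "geometric mean of the multiple reference genes" (it is mathematically the geometric mean of the per-reference-gene fold changes).

```python
from statistics import fmean


def amplification_efficiency(slope: float) -> float:
    """Percent efficiency from a standard-curve slope: (10^(-1/slope) - 1) * 100."""
    return (10 ** (-1.0 / slope) - 1.0) * 100.0


def relative_expression(ct_goi_sample, ct_refs_sample,
                        ct_goi_control, ct_refs_control):
    """Fold change 2^-ddCt, combined over several reference genes.

    ct_refs_* are lists of reference-gene Ct values in matching order.
    Averaging ddCt arithmetically and then exponentiating equals the
    geometric mean of the fold changes obtained with each reference gene.
    """
    ddcts = []
    for ref_sample, ref_control in zip(ct_refs_sample, ct_refs_control):
        dct_sample = ct_goi_sample - ref_sample      # dCt in the sample
        dct_control = ct_goi_control - ref_control   # dCt in the control stage
        ddcts.append(dct_sample - dct_control)
    return 2.0 ** (-fmean(ddcts))


if __name__ == "__main__":
    # Hypothetical Ct values: gene of interest and two reference genes,
    # in a later seed stage (sample) and in the first stage (control).
    fold = relative_expression(24.0, [20.0, 22.0], 26.0, [20.0, 22.0])
    print(f"fold change vs. first stage: {fold}")
    print(f"efficiency at slope -3.32: {amplification_efficiency(-3.32):.1f}%")
```

In the hypothetical example the gene of interest crosses the threshold two cycles earlier in the sample than in the control while the reference genes are unchanged, so the fold change is 4.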

DNA Sequences Analysis
Based on soybean and narrow-leaved lupin F3H gene sequences, we amplified an orthologous region in the tarwi genome. Two fragments were obtained in L. angustifolius (~700 bp and ~1200 bp) and two, of ~1100 bp and ~1600 bp, were obtained in L. mutabilis LM18 (brown seeds with dark-brown eyebrow) (Figure 3a). The nucleotide sequence analysis of the three DNA fragments (Figure 3b) revealed no differences between the 1100 bp fragments of SBP and LM18. This fragment was termed LmF3h_a (GenBank reference MW387156). The 1600 bp fragment, only identified in brown seeds, was termed LmF3h_b (GenBank reference MW387157). Upon comparison with the L. angustifolius naringenin, 2-oxoglutarate 3-dioxygenase-like protein sequence (XP_019420372), an intronic and an exonic region were identified in each of the LmF3h_a and LmF3h_b sequences. The alignment of the LmF3h_a and LmF3h_b sequences revealed that, in the intron, 254 out of 674 nucleotides (37.7%) carry mutations. In the exon, 73 of the 373 nucleotides carry mutations (19.6%). Of these 73 mutations, 29 are synonymous and 44 are non-synonymous, that is, 60.3% of exon mutations are non-synonymous, which translates to 26 amino acids different in an exon with 123 amino acids (20.3%) (Figure S2), half of which represent changes in the amino acid family. Even though the rate of exon mutation is lower than that of the intron (19.6% and 37.7%, respectively), most mutations in the exon are non-synonymous. Although non-synonymous mutations represent 11.8% of the nucleotides of the exon, they translate into 20.3% differences in polypeptides.
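The nucleotide-level percentages quoted above follow directly from the reported counts, as the short worked calculation below shows. The variable names are illustrative only; the counts are those given in the text.

```python
def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage rounded to one decimal."""
    return round(100.0 * part / whole, 1)


# Counts reported for the LmF3h_a / LmF3h_b alignment.
INTRON_LENGTH, INTRON_MUTATIONS = 674, 254
EXON_LENGTH, EXON_MUTATIONS = 373, 73
NON_SYNONYMOUS = 44

intron_rate = percent(INTRON_MUTATIONS, INTRON_LENGTH)    # mutated intron positions
exon_rate = percent(EXON_MUTATIONS, EXON_LENGTH)          # mutated exon positions
non_syn_share = percent(NON_SYNONYMOUS, EXON_MUTATIONS)   # share of exon mutations
non_syn_in_exon = percent(NON_SYNONYMOUS, EXON_LENGTH)    # share of exon positions

if __name__ == "__main__":
    print(f"intron: {intron_rate}%  exon: {exon_rate}%")
    print(f"non-synonymous: {non_syn_share}% of exon mutations, "
          f"{non_syn_in_exon}% of exon nucleotides")
```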
The variability observed between LmF3h_a and LmF3h_b is too great for them to be considered alleles of the same gene (~80% identity). This finding is supported by the relative position of LmF3H_a and LmF3H_b in the dendrogram (Figure 4) with three similar proteins of L. angustifolius (Supplementary material Table S1). LmF3H_a and LmF3H_b grouped in different clusters, suggesting that they represent paralogous genes rather than alleles of the same gene (Supplementary material Table S1 and Figure 4). Moreover, the F3H proteins, including the tarwi fragments, group in the same clade with other F3H proteins of several legume species and the well-characterized Arabidopsis TT6, while the other 2-oxoglutarate dioxygenase-bearing proteins (FNS, ANS, FLS, H6H and G20ox) constitute an independent group, which is in accordance with the percent of identity between F3H proteins and the other 2-oxoglutarate dioxygenase-bearing proteins (Supplementary material Table S1).
The differences in primary structure between the predicted LmF3H_a and LmF3H_b polypeptides generate few differences in the predicted secondary and tertiary structures, as depicted in Figure 5. The analysis of the secondary structures of the polypeptide fragments and the prediction of disorder revealed the same estimated percentage of disorder (12%) for the two proteins, which differed only in alpha helix and beta strand content: about 28% and 23% for LmF3H_a and around 30% and 25% for LmF3H_b, respectively. These differences of only 2 percentage points did not produce major changes in the predicted structures of the protein fragments.
The LmF3H_a and LmF3H_b fragments possess three out of five highly conserved F3H motifs (2, 3 and 4), which were detected in legumes including lupins as well as in the Arabidopsis thaliana TT6 gene, with proven function as F3H [67]. These motifs contain two histidines (H220 and H278) and one aspartic acid (D222) of the ferrous iron binding sites, as well as the three strictly conserved prolines (P148, P204, P207), indicated by arrows (Figure 6). These were suggested to have an important role in the folding process of the protein [68]. The tarwi fragments also show the specific domains of the F3H proteins (Supplementary material Figure S4).

Selection and Validation of Reference Genes
A screening test was performed for the five selected reference genes using DNA and cDNA as templates. Only the ATPsyn gene primers failed to amplify tarwi DNA and cDNA (Supplementary material Figure S3a,b), so the ATPsyn gene was discarded. The analysis for the selection of reference genes was carried out for the remaining four genes. Although all four genes can be used, we recommend the use of the Ubc and Hel genes regardless of the colour of the seed coat because they were the most specific (Supplementary material Figure S5). Based on the geNorm, NormFinder and ∆Ct algorithms, we conclude that all genes are stable (Supplementary material Table S2 and Figure S6). This finding is supported by the pairwise variation values: V2/3 and V3/4 were 0.024 and 0.015, respectively (Supplementary material Figure S7), both below the 0.15 threshold, suggesting that these genes can be used for normalization of qPCR data in tarwi.
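The pairwise variation check reported above can be illustrated with a minimal Python sketch. It assumes the standard geNorm definition, in which V(n/n+1) is the standard deviation, across samples, of the log2 ratio between the normalisation factors built from the n and the n+1 most stable genes. The function names, the stability ranking passed in, and the relative quantities are hypothetical placeholders for illustration, not data or code from this study.

```python
import math
from statistics import stdev


def geometric_mean(values):
    """Geometric mean of strictly positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def pairwise_variation(quantities, ranked_genes, n):
    """geNorm-style V(n/n+1) for reference genes ranked from most to least stable.

    quantities maps each gene name to its relative quantities, one per sample.
    """
    n_samples = len(quantities[ranked_genes[0]])
    log_ratios = []
    for s in range(n_samples):
        # Normalisation factors from the n and the n+1 most stable genes.
        nf_n = geometric_mean([quantities[g][s] for g in ranked_genes[:n]])
        nf_n1 = geometric_mean([quantities[g][s] for g in ranked_genes[:n + 1]])
        log_ratios.append(math.log2(nf_n / nf_n1))
    return stdev(log_ratios)


# Hypothetical relative quantities for four samples (illustrative only).
quantities = {
    "Ubc": [1.00, 0.95, 1.05, 0.98],
    "Hel": [1.00, 0.97, 1.02, 1.01],
    "aTub": [1.00, 0.90, 1.10, 0.96],
    "Adh3": [1.00, 1.08, 0.93, 1.04],
}
ranked = ["Ubc", "Hel", "aTub", "Adh3"]  # assumed stability order

v2_3 = pairwise_variation(quantities, ranked, 2)
v3_4 = pairwise_variation(quantities, ranked, 3)
print(f"V2/3 = {v2_3:.3f}, V3/4 = {v3_4:.3f}")
```

Under this definition, a V value below 0.15 indicates that adding a further reference gene changes the normalisation factor very little, which is the criterion applied to the V2/3 and V3/4 values in the text.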

Expression Profiles of LmF3h_a and LmF3h_b
The expression of LmF3h_a and LmF3h_b was compared across the different stages of seed development in the brown and white phenotypes (Figure 7). The transcript analysis showed different levels of expression of the LmF3h_a and LmF3h_b gene fragments. In brown seeds, transcripts of both fragments showed increasing expression in the initial stages of seed development, unlike in white seeds, where LmF3h_a expression decreased in the third stage of seed development. In both phenotypes, LmF3h_a followed the same sawtooth expression pattern, with the greatest expression in stage four, a decrease at stage six and a rise again in stage seven. Furthermore, for both paralogues (LmF3h_a and LmF3h_b) and both phenotypes (brown and white seeds), the maximum transcript expression was observed in the transition stage (600-700 mg), and the highest level of expression of the LmF3h_a paralogue was recorded in brown seeds rather than in white seeds.
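The way such relative expression levels are typically derived can be sketched as follows. This section does not state the quantification formula, so the sketch assumes the common 2^-ΔΔCt method with roughly 100% amplification efficiency, two reference genes (for example Ubc and Hel), and the initial stage of seed development as calibrator. The function name and all Ct values are hypothetical.

```python
from statistics import mean


def relative_expression(ct_target, ct_refs, ct_target_cal, ct_refs_cal):
    """Fold change of a target gene versus a calibrator sample (2^-ddCt).

    ct_refs and ct_refs_cal hold the Ct values of the reference genes; their
    arithmetic mean corresponds to the geometric mean of the quantities.
    """
    delta_sample = ct_target - mean(ct_refs)        # dCt in the sample of interest
    delta_cal = ct_target_cal - mean(ct_refs_cal)   # dCt in the calibrator
    return 2.0 ** -(delta_sample - delta_cal)


# Hypothetical Ct values (illustrative only, not measurements from this study).
stage1 = {"target": 26.0, "refs": [20.0, 21.0]}  # calibrator: initial stage
stage4 = {"target": 24.0, "refs": [20.0, 21.0]}

fold = relative_expression(
    stage4["target"], stage4["refs"], stage1["target"], stage1["refs"]
)
print(f"Fold change relative to the initial stage: {fold:.2f}")
```

With this convention the calibrator stage always has a relative expression of 1, so a profile that rises and falls across stages, as described for LmF3h_a, appears as values above or below 1.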

Discussion
In this study we identified and sequenced a portion of a putative F3H gene in L. mutabilis. Despite the sequence homologies, further studies will be needed to obtain the full sequence of the gene and to validate its biochemical role.
LmF3h_a and LmF3h_b paralogous fragments were detected in the genome of tarwi accessions with brown seed coat, while in accessions with white seed coat only the LmF3h_a paralogous fragment was detected. This result suggests that the LmF3h_b paralogue may be associated with the brown seed pigmentation process, although an analysis of different tarwi genotypes is needed to substantiate this relationship. A similar result was reported by Himi et al. [69], who observed the differential expression of the F3H-A1, F3H-B1, and F3H-D1 genes in wheat grains and coleoptiles and found that transcripts of all three genes were detected in red seeds, while in white seeds only the F3H-A1 gene was detected. These results are consistent with those of the present study, in which both LmF3h paralogous fragments are expressed in the brown-coated seeds. The non-detection of the LmF3h_b paralogue in white seeds may result from a deletion or from a large insertion in the amplified region that prevented its amplification by PCR. Insertions may be related to transposition events, and some genes of the flavonoid biosynthetic pathway are known to harbour transposable elements that can prevent their transcription [70]. In fact, a long terminal repeat type retroelement was reported in the F3H gene in Torenia [71], where it is responsible for white flowers. The similarity between our polypeptides (LmF3H_a and LmF3H_b) and those of L. angustifolius (XP_019459681, XP_019420370 and XP_019422848) suggests that we are dealing with two paralogues and that, taking into account the likely synteny of these two lupins, the paralogues LmF3h_a and LmF3h_b may be located on different chromosomes.
The comparison of the LmF3h_a and LmF3h_b nucleotide sequences with those of F3H genes from other Fabaceae species and from Arabidopsis thaliana reveals a considerable level of genetic diversity that partially translates into divergence in the respective proteins. This contrasts with a highly conserved secondary structure, suggesting a conserved function among them. The inference that LmF3H_a and LmF3H_b are paralogues rests chiefly on the comparison with L. angustifolius sequences. It is important to bear in mind the phylogenetic placement of Lupinus [72] in the genistoid clade of the Papilionoideae subfamily (mostly composed of woody plant species, including Cytisus, Genista, Spartium, Teline, and Ulex), for which little or no information is available regarding the F3H gene or other genes related to flavonoid biosynthesis. In contrast, the major grain and forage legumes are placed in the indigoferoid/millettioid clade (e.g., Glycine, Phaseolus, Pueraria, and Vigna) or in the Hologalegina clade (e.g., Cicer, Lathyrus, Lens, Lotus, Medicago, Melilotus, Pisum, Trifolium, and Vicia). The estimated 56 million years of divergence between the genistoids and the remaining Papilionoideae clades [73] may explain the divergence found in the nucleotide and protein sequences between the putative F3H partial sequences from lupins and F3H from other grain legumes.
RT-qPCR is a powerful and widely used tool for the precise quantification of gene expression. In this study we found a higher level of expression of LmF3h_a in brown-coated seeds than in white-coated seeds. These high levels of expression are consistent with the studies of Shirley et al. [74], Koornneef [75], Wisman et al. [67], and Lepiniec et al. [1], which associated brown seed coat with F3H gene expression. Similarly, high levels of expression of the F3H-A, F3H-B, and F3H-C alleles were detected in brown seeds of Ipomoea nil [76]. Wan et al. [77], studying peanut pigmentation using transcriptomic approaches, found high levels of expression of F3H, F3′H, DFR, and ANR in brown peanuts, suggesting that these four genes are key to understanding the pigmentation of the peanut tegument. Himi and Noda [78] studied the expression of the CHS, CHI, F3H, and DFR genes in white and red wheat seeds and noted a high expression of these transcripts in red-grained lines and a low expression in white lines. Shao et al. [79] found higher levels of flavonoids in black and red rice grains than in white phenotype lines, and the same was observed in common bean seeds with brown, red and black seed coats compared with white seeds [80,81]. Although the levels of expression of the paralogues differ during seed development in both phenotypes, the pattern of expression is similar in the two genotypes, with the maximum expression of LmF3h_a and LmF3h_b recorded at an intermediate stage of seed development, specifically the 600-700 mg stage. This differs from the results of Zabala and Vodkin [54], who found F3H expressed at a higher level early in black seed development (100-200 mg), which may reflect differences in seed development between lupins and soybean, namely in anthocyanin accumulation.
Moreover, our results point to a redundancy in the roles of LmF3h_a and LmF3h_b in flavonol accumulation during seed development.

Conclusions
In this study, we identified two putative paralogues of the LmF3h gene (LmF3h_a and LmF3h_b) in brown-coated L. mutabilis seeds and only one (LmF3h_a) in white-coated seeds, with LmF3h_a being more highly expressed in brown seeds. These results represent an important basis for developing molecular markers for the early detection of plants that produce brown-coated seeds, for use in tarwi breeding programmes, as coloured seeds are valued by consumers and seed coat colour in tarwi is not an easily genetically tractable trait. They also provide a guideline for future genome-wide association studies in tarwi.
The results of the present study provide grounds for further exploration and scientific understanding of the enormous phenotypic variability observed in tarwi seeds. They also serve as a basis for the study of the chemical compounds synthesized in coloured-seed genotypes and their importance as a functional food. In the near future, this may allow breeding programmes to be directed towards the intensification of the brown pigmentation in tarwi, including potential health benefits regarding the bioavailability of minerals relatable to seed coat colour variability.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/agronomy12020450/s1, Supplementary material, Figure S1: Multiple sequence alignment of partial F3H nucleotide sequence from Glycine max, Vigna angularis and Lupinus angustifolius orthologs, illustrating primer location 5′-GATTGGAGAGAGATTGTGACATA-3′ (forward), 5′-GTGATCAGCATTCTTGAACC-3′ (reverse); Figure S2: Alignment of the F3H predicted polypeptide sequence of paralogues LmF3h_a and LmF3h_b of Lupinus mutabilis; Figure S3: PCR amplification patterns of reference genes using cDNA and genomic DNA as templates. (a) PCR from cDNA and (b) from DNA. M-NZYDNA Ladder VII marker; Figure S4: Conserved domains detected in LmF3H_a, LmF3H_b, and other F3H proteins evidencing the putative function of the Lupinus mutabilis proteins. LmF3H, although incomplete, contains the 2-OG-Fe(II) conserved domain, which is specific to this superfamily; Figure S5: Melting curve analyses of: (a) Adh3 with detection temperature around 78.5 °C; (b) Hel (80.5 °C); (c) αTub (80 °C), and (d) Ubc (79.5 °C) candidate reference genes for quantitative Reverse-Transcription PCR (RT-qPCR) of Lupinus mutabilis. In the four reference genes, pure and single amplicons were obtained; Figure S6: Analysis of the stability of the reference genes (under validation for subsequent analyses of Lupinus mutabilis seed coat development) based on the comprehensive classification of RefFinder. The most stable are Ubc and Hel and the least stable are αTub and Adh3; Figure S7: Variation in pairs to identify the optimal number of genes for normalisation; Table S1: Similarity matrix between Lupinus mutabilis Lm_F3H_a and Lm_F3H_b predicted amino acid sequences, F3H orthologs (with reference to the L. angustifolius chromosome location of the respective genes) and FNS, ANS, FLS, H6H and GA20ox proteins; Table S2: Stability values, classification by algorithm and general for the four reference genes under validation for subsequent analyses of Lupinus mutabilis seed coat development.

Conflicts of Interest:
The authors declare no conflict of interest.