Chromosome-Level Genome Assembly and Multi-Omics Dataset Provide Insights into Isoflavone and Puerarin Biosynthesis in Pueraria lobata (Wild.) Ohwi

Pueraria lobata (wild.) Ohwi is a leguminous plant and one of the traditional Chinese herbal medicines. Its puerarin extract is widely used in the pharmaceutical industry. This study reported a chromosome-level genome assembly for P. lobata and its characteristics. The genome size was ~939.2 Mb, with a contig N50 of 29.51 Mbp. Approximately 97.82% of the assembled sequences were represented by 11 pseudochromosomes. We identified that the repetitive sequences accounted for 63.50% of the P. lobata genome. A total of 33,171 coding genes were predicted, of which 97.34% could predict the function. Compared with other species, P. lobata had 757 species-specific gene families, including 1874 genes. The genome evolution analysis revealed that P. lobata was most closely related to Glycine max and underwent two whole-genome duplication (WGD) events. One was in a gamma event shared by the core dicotyledons at around 65 million years ago, and another was in the common ancestor shared by legume species at around 25 million years ago. The collinearity analysis showed that 61.45% of the genes (54,579 gene pairs) in G. max and P. lobata had collinearity. In this study, six unique PlUGT43 homologous genes were retrieved from the genome of P. lobata, and no 2-hydroxyisoflavanone 8-C-glucoside was found in the metabolites. This also revealed that the puerarin synthesis was mainly from the glycation of daidzein. The combined transcriptome and metabolome analysis suggested that two bHLHs, six MYBs and four WRKYs were involved in the expression regulation of puerarin synthesis structural genes. The genetic information obtained in this study provided novel insights into the biological evolution of P. lobata and leguminous species, and it laid the foundation for further exploring the regulatory mechanism of puerarin synthesis.


Introduction
Pueraria lobata (wild.) Ohwi (hereinafter abbreviated as P. lobata) is an important medicinal and edible plant, mainly distributed in East Asian countries, including China [1]. In traditional Chinese medicine, the root of P. lobata is the main medicinal component, also known as kudzu. A Chinese Pharmacopeia dating back to 200 B.C. mentions the use of the roots of kudzu and their use in various treatments [2]. Kudzu has long been used to treat fever, toxicosis, indigestion, alcoholism and other illnesses in the Chinese Pharmacopoeia [3]. The roots of P. lobata are rich resources of natural product isoflavonoids, including genistein, formononetin, daidzein and puerarin (also called daidzein 8-C-glycoside). Among these isoflavonoids, the main bioactive components are puerarin and daidzein [4,5]. Modern Before the 2005 edition of the Chinese Pharmacopoeia, P. lobata and P. thomsonii were regarded as Pueraria and as a source of traditional Chinese medicine. In the long-term process of introduction, cultivation and domestication, the names and plant species of P. lobata and its varieties are confused in most areas and studies, resulting in the unclear resource location of P. lobata and its varieties [3]. In terms of composition, the root of P. lobata usually contains a high content of puerarin, up to 60-70 mg/g, while the puerarin content of P. thomsonii is usually less than 5 mg/g [12]. In terms of genetic material, the genomes of the two kinds of P. lobata have also changed greatly. At present, many research reports are available on the transcriptome and related isoflavone synthesis genes of P. lobata or P. thomsonii. Additionally, there are two reports on the genome of P. thomsonii and P. montana [13,14], but the data on the genome of P. lobata have not been reported [10]. Before the 2005 edition of the Chinese Pharmacopoeia, P. lobata and P. thomsonii were regarded as Pueraria and as a source of traditional Chinese medicine. In the long-term process of introduction, cultivation and domestication, the names and plant species of P. lobata and its varieties are confused in most areas and studies, resulting in the unclear resource location of P. lobata and its varieties [3]. In terms of composition, the root of P. lobata usually contains a high content of puerarin, up to 60-70 mg/g, while the puerarin content of P. thomsonii is usually less than 5 mg/g [12]. In terms of genetic material, the genomes of the two kinds of P. lobata have also changed greatly. At present, many research reports are available on the transcriptome and related isoflavone synthesis genes of P. lobata or P. thomsonii. Additionally, there are two reports on the genome of P. thomsonii and P. montana [13,14], but the data on the genome of P. lobata have not been reported [10]. The lack of reference genome information on P. lobata hinders further investigation of potential genes in important biological processes related to isoflavone biosynthesis [10,15].
In this study, the leaves of P. lobata were chosen as the material, and the genome sequence of P. lobata was obtained and assembled by PacBio and Illumina Hiseq sequencing technology. After completing assembly and annotation, we explored the unique genes of isoflavones and puerarin metabolism in legumes through genome-wide comparison, gene family clustering, phylogenetic tree and other methods, and explored the relationship between puerarin-specific genes and special biological traits. The evolutionary status of P. lobata species was identified, and the genome evolutionary history of species and even the whole branch was traced. The study further revealed the pathway and mechanism of puerarin synthesis in P. lobata by a transcriptome and metabolome analysis. This research expected to provide reference data support for the comprehensive utilization and classification of medicinal plant resources of P. lobata and P. thomsonii.

Materials, DNA Extraction and Genome Sequencing
The diploid P. lobata were planted in an experimental medicinal botanical garden at Wuhan Polytechnic University (Longitude: 114.23855 latitude: 30.63535, Wuhan, China) and Luotian (31.021052, 115.595175). The roots of four cultivated varieties of P. lobata with varying puerarin levels, PlobLT13, Plob53, Plob17, Plob19 and Plob25, were utilized for the transcriptome and metabolome analysis, and were marked as A, B, C, D and E, respectively, in this research (Table S1). Root, stem, young leaves, flower and seeds of P. lobata were collected and immediately frozen in liquid nitrogen and stored at −80 • C until use for DNA and RNA extraction.
Genomic DNA was extracted from young leaves of P. lobata and used to construct Illumina DNA libraries according to the standard protocols provided by the Illumina HiSeq company (Novogene Biotech Co., Ltd., Tianjin, China). A PacBio library was constructed using a SMRTbell Template Prep Kit 1.0 (PacBio, Menlo Park, CA, USA) and sequenced on a PacBio Sequel II system.

Genome Assembly and Quality Evaluation
All DNA extraction and sequencing procedures were performed by the Novogene Company (Tianjin, China) (http://www.novogene.com/, accessed on 31 October 2022). The Hi-C library was sequenced on the Illumina NovaSeq PE150 platform. PacBio readings were utilized for de novo assembly, and Pilon v1.22 was used to polish them using Illumina data [16]. The corrected contigs were further scaffolded into chromosomal-level genome via Hi-C. Hi-C technology obtained the interaction information between DNA fragments that were spatially connected, that is, DNA fragments that were physically distant, through special experimental techniques. Different contigs or scaffolds were divided into different chromosomes according to the probability of interaction within chromosomes being significantly higher than the probability of interaction between chromosomes. According to the same chromosome, the probability of interaction decreased with the increase in the interaction distance. Contigs or scaffolds of the same chromosome were sorted and oriented. Hifiasm software (v0.16.1) was used for the rapid construction of haplotype from the scratch assembly program of PacBio Hifi reads [17,18]. We used Samtools (v0.1.19) (http://samtools.sourceforge.net/, accessed on 2 October 2022) and other tools to sort. The BWA (v0.7.8) comparison resulted in chromosome coordinates, removing duplicate reads, etc., conducting SNP calling, and filtering and counting the original results. BUSCO (v5.2.1) was applied to evaluate the completeness of assembly by mapping the genome sequence to the embryophyta_odb10 database [19]. CGEMA (v2.5) was used to evaluate the integrity of the assembled genome. The conservative genes (248 genes) existing in 6 eukaryotic model organisms were selected to form the core gene library, and the assembled genome evaluated with tblastn, genewise and geneid software [20]. Based on the K-mer algorithm, Illumina sequencing data and Merqury software (https://github.com/marbl/merqury, accessed on 25 June 2022) were used to evaluate genome quality, which did not need to reference genome [21]. The genome assembly software utilized in this work is listed in Table S2.

RNA-seq Data
Trinity (v2.1.1) was used to construct transcriptome read assemblies for genome annotation. To optimize the genome annotation, RNA-Seq reads from various tissues were aligned to the fasta genome using Hisat (v2.0.4) and TopHat (v2.0.11) with default parameters to identify exons regions, and splice sites. The alignment results were then utilized as input for genome-based transcript assembly using Stringtie (v1.3.3)/Cufflinks (v2.2.1) with default settings. The non-redundant reference gene collection was created by combining genes predicted by three ways with EvidenceModeler (EVM, v1.1.1), adding masked transposable elements as input into gene prediction and employing PASA (Program to Assemble Spliced Alignment) terminal exon support. Individual families of interest were hand-picked by relevant specialists for further human curation.

Genome Annotation
In our repeat annotation workflow, we used a combination technique based on homology alignment and de novo search to find entire genome repetitions. Tandem Repeat was retrieved using ab initio prediction and TRF (http://tandem.bu.edu/trf/trf.html, accessed on 25 June 2022). The Repbase (http://www.girinst.org/repbase, assessed on 2 October 2022) database was used for homolog prediction, and it employed Repeat-Masker (http://www.repeatmasker.org/, assessed on 2 October 2022) software and its in-house scripts (RepeatProteinMask) with default settings to extract repeat regions. Then, using LTR FINDER (http://tlife.fudan.edu.cn/ltrfinder/, assessed on 2 October 2022), RepeatScout (http://www.repeatmasker.org/, accessed on 2 October 2022) and Repeat-Modeler (http://www.repeatmasker.org/RepeatModeler.html, assessed on 2 October 2022) with default parameters, all repeat sequences with lengths >100 bp and gap 'N' less than 5% comprised the raw transposable element (TE) library. For DNA-level repeat detection, a bespoke library (a mix of Repbase and our Denovo TE library processed by uclust to create a non-redundant library) was submitted to RepeatMasker.
To annotate gene models, structural annotation of the genome was employed, which includes AB initio prediction, homology-based prediction and RNA-Seq aided prediction.
The homologous protein sequences were obtained from NCBI. Protein sequences were matched to the genome using TBLASTN (v2.2.26; E-value 10 5 ), and the matching proteins were aligned to the homologous genome sequences for accurate spliced alignments using GeneWise (v2.4.1) software, which predicted the gene structure present in each protein region.
By matching the protein sequences to the Swiss-Prot database using BLASTP (with a threshold of E-value ≤ 10 −5 ), gene functions were given based on the best match. InterPro Scan 70 (v5.31) was used to annotate the motifs and domains by searching against publicly available databases such as PRINTS, Pfam, ProDom, PANTHER, SMRT and PROSITE. Each gene's Gene Ontology (GO) ID was allocated based on the corresponding InterPro entry. We predicted protein function by transferring annotations from the nearest BLAST hit (E-value 10 −5 ) in the Swiss-Prot 20 database and DI-AMOND (v0.8.22)/BLAST hit (E-value 10 −5 ) in the NR 20 database. We also mapped the gene collection to a Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway and determined which genes were the best matches.
The tRNAs were predicted using the tRNAscan-SE tool (http://lowelab.ucsc.edu/ tRNAscan-SE/, assessed on 2 October 2022). Because rRNAs are highly conserved, we used relative species' rRNA sequences as references and used BLAST to predict rRNA sequences. Other ncRNAs, including as miRNAs and snRNAs, were discovered by searching the Rfam database with the default parameters of the infernal program (http://infernal.janelia.org/, assessed on 2 October 2022).
The longest transcript in the coding region was retained to remove redundancy, and the genes encoding polypeptides shorter than 30 amino acids were also abandoned to exclude putative fragmented genes. The similarity relation between all species' protein sequencings was obtained by All-against-all BLASTP (https://blast.ncbi.nlm.nih.gov/ Blast.cgi/, assessed on 2 October 2022) [22] search with a cut-off (E-value = 10 −5 ). The alignment with high-scoring segment pairs was conjoined for each gene pair by solar [23]. A hierarchical clustering algorithm was applied to group orthologs and paralogs using OrthoMCL software (http://orthomcl.org/orthomcl/, assessed on 2 October 2022) [24] with the inflation parameter 1.5.

Phylogenetic Analysis
Single-copy orthologous genes were aligned with Muscle (http://www.drive5.com/ muscle/, assessed on 2 October 2022) [25]. A super alignment matrix was created by concatenating all alignment findings. RAxML was used to generate ML phylogenetic trees based on multiple sequence alignments [26].

Gene Family Expansion and Contraction
To identify gene family evolution as a random birth and death process model, where gene family either expands or contracts per gene per million years independently along each lineage of the phylogenetic tree, the maximum likelihood model originally implemented in the software package CAFE (http://sourceforge.net/projects/cafehahnlab/, assessed on 2 October 2022) was applied to compare the cluster size differences (gain or loss) between the ancestor and each species [29]. A p-value of 0.05 was used to identify families where the size of the species has changed considerably. To determine the relevance of changes in gene family size in each branch, the phylogenetic tree topology and branch lengths were considered.

Positively Selected Genes
Single-copy orthologous genes were aligned with Muscle (http://www.drive5.com/ muscle/, assessed on 2 October 2022) [25]. Likelihood ratio tests (LRTs) based on the branch-site model of PAML were used to detect positive selection sites with A. thaliana as the foreground branch [30]. The p-values were computed using the χ 2 statistic and corrected for multiple testing by the false discovery rate (FDR) method. This analysis calculated Ka/Ks through a likelihood ratio test to detect the probability of positive selection.

Whole-Genome Duplication Inference
An all-against-all BLASTP (https://blast.ncbi.nlm.nih.gov/Blast.cgi/, assessed on 2 October 2022) was used with a threshold (E-value = 10 −5 ) to identify the putative paralogous genes in, and orthologous genes between, each species. Syntenic blocks were performed based on the detected homologous gene pairs using MCscanX (http: //chibba.pgml.uga.edu/mcscan2/, assessed on 2 October 2022) [22,30]. Each duplicate gene pairs of syntenic block were aligned with Muscle [25], and then back-translated to their coding sequences. The four-fold synonymous third-codon transversion rates (4DTv) of syntenic blocks were calculated and were used to detect WGD events. The distribution of 4DTv values was plotted. The synonymous substitution (Ks) values for pairwise comparisons were estimated using the maximum likelihood (ML) method implemented in the codeml program of the PAML package [27].

WGCNA Analysis
A Weighted Gene Co-expression Network Analysis (WGCNA) was performed by the Novogene online tools to discover critical regulatory genes in the correlation between puerarin production and gene expression in various species (https://magic.novogene.com/ customer/, assessed on 2 October 2022). The module eigengene was defined as the first major component of a specific module and was then used to describe the expression profile of module genes in each sample. The Pearson correlations between the eigengenes of each module and the abundance of flavonoids were performed using R package ggplot2.

Metabolome Analysis
The roots of different P. lobata samples were used for the metabonomic analysis (Table S1). The metabolome analysis was completed by Metware Biotech Co., Ltd. (www. metware.cn. Wuhan, China) according to the methods of Cheng [31]. The freeze-dried roots were crushed using a mixer mill (mm 400, retsch) with a zirconia bead for 1.5 min at 30 Hz. Then, 100 mg powder was weighted and extracted overnight at 4 • C with 1.0 mL 70% aqueous methanol. Following centrifugation at 10,000 g for 10 min, the extracts were absorbed (CNWBOND Carbon-GCB SPE Cartridge, 250 mg, 3 mL; ANPEL, Shanghai, China) and filtrated (SCAA-104, 0.22 µm pore size; ANPEL, Shanghai, China) before the LC-MS analysis.
The sample extracts were analyzed using an LC-ESI-MS/MS system (HPLC, Shimpack UFLC SHIMADZU CBM30A system, www.shimadzu. The ESI-Q TRAP-MS/MS. LIT and triple quadrupole (QQQ) scans were acquired on a triple quadrupole-linear ion trap mass spectrometer (Q TRAP), API 6500 Q TRAP LC/MS/MS System, equipped with an ESI Turbo Ion-Spray interface, operating in a positive ion mode, and controlled by Analyst 1.6.3 software (AB Sciex).

Statistical Analysis
Excel 2021 and SPSS (22.0) were used to process experimental data. Chen's approach was used to process the heat map, Circos, chromosomal collinearity and related network diagram content analysis using the TB-tools software. Duncan's test was performed to see whether there were any significant changes (p ≤ 0.05) [32]. The correlation network was created with the OmicStudio tools, which may be found at https://www.omicstudio. cn/tool (assessed on 2 October 2022). The positive correlation criterion was greater than or equal to 0.5, the negative correlation threshold was less than or equal to −0.5 and the p-value threshold was less than 0.5. R version 3.6.1 and igraph1.2.6 [31]. Every experiment had three biological duplicates.

Genome Assembly and Annotation
The P. lobata genome was sequenced using the Illumine Hiseq and PacBio sequel platforms, and the assembled scaffolds were ordered using the Hi-C technique. The project obtained 84.61 Gb data, of which 57.99 Gb data were obtained from Illumina secondgeneration sequencing and 26.62 Gb data from Pacbio third-generation sequencing. The genome sequencing coverage reached 91.18% (Figure 2a and Table S4). The genome size calculated by the KMER number/depth is about 942.73 Mbp, and the revised genome size is 939.2 Mbp, with GC contents of 33.69%, further clustered into 11 pseudochromosomes ( Figure S1, Table 1). The assembled version was evaluated by various methods, and the results showed that the genome consistency, integrity and accuracy were good.
Biomolecules 2022, 12, x FOR PEER REVIEW 7 of 20 project obtained 84.61Gb data, of which 57.99 Gb data were obtained from Illumina second-generation sequencing and 26.62 Gb data from Pacbio third-generation sequencing. The genome sequencing coverage reached 91.18% (Figure 2a and Table S4). The genome size calculated by the KMER number/depth is about 942.73 Mbp, and the revised genome size is 939.2 Mbp, with GC contents of 33.69%, further clustered into 11 pseudochromosomes( Figure S1, Table 1). The assembled version was evaluated by various methods, and the results showed that the genome consistency, integrity and accuracy were good. (c) Gene set evidence supporting statistics, de novo, EVM integrate genes supported by de novo prediction; homolog, genes supported by homology prediction during EVM integration; RNA, RNA SEQ supported genes during EVM integration; each evidence support is based on the criterion that the gene overlap is greater than 50%; numbers indicate the number of genes. Following the assembly assisted by Hi-C technology, 97.82% of the contigs were anchored into 11 pseudochromosomes, resulting in a chromosome-scale genome with a size of 942.73 Mb in P. lobata ( Figure 2, Table S5). The lengths of contig N50, scaffold N50 and the longest scaffold are 29.51 Mb, 29.51 Mb and 60.01 Mb, respectively, indicating high continuity of the assembled gnome (Table S6). ii-iv represent distribution of gene density, all repeat element and tandem repeat, respectively; and v-viii show the distribution of miRNA, tRNA, rRNA and snRNA, respectively. (b) The abscissa is the divergence degree between transposition element (TE) sequences annotated in P. lobata genome and corresponding sequences in Repbase. The ordinate is the percentage of TE sequence in the genome under the divergence degree, and different TE is marked with different colors. (c) Gene set evidence supporting statistics, de novo, EVM integrate genes supported by de novo prediction; homolog, genes supported by homology prediction during EVM integration; RNA, RNA SEQ supported genes during EVM integration; each evidence support is based on the criterion that the gene overlap is greater than 50%; numbers indicate the number of genes. Following the assembly assisted by Hi-C technology, 97.82% of the contigs were anchored into 11 pseudochromosomes, resulting in a chromosome-scale genome with a size of 942.73 Mb in P. lobata ( Figure 2, Table S5). The lengths of contig N50, scaffold N50 and the longest scaffold are 29.51 Mb, 29.51 Mb and 60.01 Mb, respectively, indicating high continuity of the assembled gnome (Table S6). The BUSCO evaluation results showed that the genome of P. lobata had assembled 99.2% of the complete BUSCO gene, indicating that the assembly result was complete. In addition, the CEGMA evaluation showed that 248 CEGs (Core Eukaryotic Genes) assembled 240 genes, accounting for 96.77%, indicating that the assembly results were complete (Table S6). Sequence consistency evaluation results showed that the alignment rate of small reads to genome was about 98.79%, and the genome coverage was about 99.97%, indicating that reads and the assembled genome had good consistency (Table S7). The heterozygous SNP ratio of the P. lobata genome was 0.468784%, and the homozygous SNP ratio was 6.9 × 10 −5 %. This result shows that the assembly had a high single base correct rate (Table S8). The results of sequence accuracy evaluation showed that the accuracy value of the genome sequence was 46.0206, indicating that the accuracy of this genome was good.
The repeat sequence statistical results showed that the proportion of repeat was 63.5% in the genome of P. lobata. According to the statistics of repeat sequences, the proportions of Denovo + RepBase, TE proteins and Combined TEs were 61.98%, 7.91% and 62.42%, respectively (Tables S9 and S10). Taking RepBase as the library, the TE divergence distribution was obtained by the Repeat Masker annotation. The abscissa was the divergence between the TE sequence annotated in P. lobata genome and the corresponding sequence in the RepBase. The ordinate was the percentage of TE sequences in the genome under this divergence, and different TEs were marked with different colors (Figure 2b).
Vigna angularis, G. max, Medicago truncatula, Arabidopsis thaliana and Arachis duranensis were selected as annotation species, and the genome of P. lobata was annotated with the software Augustus, Glimmerhmm, SNAP, GeneID and GENSCAN. The results showed the P. lobata genome sequence had 27,356, 27,237, 30,958, 24,987 and 35,325 structural genes homologous with V. angularis, A. duranensis, M. truncatula, A. thaliana and G. max, respectively ( Figure 2c and Table S11). Compared with the annotated species, the average transcript length of genomic structural genes of P. lobata was 3785.5 bp, the average CDS length was 1170.28 bp, the average exons of each gene were 4.98 and the average length was 234.81 bp. The average intron length was 656.43 bp (Figure 3a and Table S12).
Compared with the existing protein library, 97.34% of the structural genes in the P. lobata genome can predict their functions. Compared with the existing protein database, 32,190 of 33,171 structural genes in P. lobata genome can predict the function, accounting for 97.34%; 881 structural genes were not annotated, accounting for 2.66% (Figure 3b and Table S13).
The statistical analysis of non-coding RNA in the P. lobata genome showed that miRNA, tRNA, rRNA and snRNA accounted for 0.024961%, 0.019503%, 0.23% and 0.009531% of the genome sequence, respectively ( Figure 3c and Table S14). The protein sequences predicted by gene structure were aligned with known protein libraries (Swiss_Prot, Nr, KEGG, InterPro, GO and Pfam), and finally 97.34% of structural genes were able to predict the function. (c) By comparing with the known ncRNA library, the ncRNA distribution information of P. lobata genome was obtained. miRNA can degrade its target genes or inhibit the translation of target genes into proteins and has the function of silencing genes. tRNA and rRNA are directly involved in protein synthesis. snRNA is mainly involved in the processing of RNA precursors and is the main component of RNA shears.

Phylogenetic Relationships and Comparative Genomic Analysis
Through the cluster analysis of gene families, the results showed a total of 30,367 orthologous groups with representatives in the 14 species, of which 9449 were shared orthologs and 160 were single-copy orthologs (Figure 4a). The corresponding clustering results of the genomes of P. lobata, P. vulgaris, V. unguiculata, G. max, M. truncatula and A. hypogaea were extracted, and the Venn diagram was drawn. The Venn diagram showed that compared with other species, P. lobata comprised 757 species-specific gene families, including 1874 genes. These included 285 unique gene families between P. lobata and G. max (Figure 4b). The protein sequences predicted by gene structure were aligned with known protein libraries (Swiss_Prot, Nr, KEGG, InterPro, GO and Pfam), and finally 97.34% of structural genes were able to predict the function. (c) By comparing with the known ncRNA library, the ncRNA distribution information of P. lobata genome was obtained. miRNA can degrade its target genes or inhibit the translation of target genes into proteins and has the function of silencing genes. tRNA and rRNA are directly involved in protein synthesis. snRNA is mainly involved in the processing of RNA precursors and is the main component of RNA shears.

Phylogenetic Relationships and Comparative Genomic Analysis
Through the cluster analysis of gene families, the results showed a total of 30,367 orthologous groups with representatives in the 14 species, of which 9449 were shared orthologs and 160 were single-copy orthologs (Figure 4a). The corresponding clustering results of the genomes of P. lobata, P. vulgaris, V. unguiculata, G. max, M. truncatula and A. hypogaea were extracted, and the Venn diagram was drawn. The Venn diagram showed that compared with other species, P. lobata comprised 757 species-specific gene families, including 1874 genes. These included 285 unique gene families between P. lobata and G. max (Figure 4b).
Phylogenetic trees of P. lobata and other leguminous plants were constructed by the maximum likelihood method (ML TREE). Among the 14 species selected for comparison, P. lobata and G. max had the most recent evolutionary relationship (Figure 5a).   (Figure 5b).
The expansion and contraction analysis showed that the most recent common ancestor had 30,353 gene families. Compared with their recent ancestors, P. lobata significantly expanded 1415 gene families (including 6641 genes) and significantly contracted 2354 gene families, including 472 genes (Figure 5c). In the isoflavone metabolic gene family of P. lobata, 41 genes were expanded, and 35 genes were enriched in the molecular function pathway; 7 genes contracted and were all enriched in the molecular function pathway ( Figure S2).  Branches: each branch corresponds to a species (blue end point); branch length: the length of each black horizontal line, which represents the speed of species evolution. The longer it is, the faster it evolves. The lowest black ruler can measure the branch length; node: the intersection of two branches represents the ancestor node of two branches, which is represented by red solid points in the figure. The number next to each solid point represents the support rate. The larger the value is, the more reliable the evolutionary relationship is. The heat map on the far right is the heat map of the support rate of all branches in the evolutionary tree; (b) estimation of divergence time. The number of the node position represents the divergence time of species or species ancestors, in millions of years. The number in brackets represents the confidence range of the divergence time; (c) expansion and contraction in gene families. Green numbers indicate the number of gene families that have expanded during the evolution of species, and red numbers indicate the number of gene families that have contracted.

Positive Selection and WGD Analysis
Taking P. lobata as the foreground branch of the positive selection analysis, and other species M.truncatula, G. max, P. vulgaris, V. unguiculata and A. hypogaea as the background branch, 95 candidate genes subject to positive selection were found through the likelihood ratio test (Q-value < 0.05). The Fisher test and FDR correction (Q-value < 0.05) were used to analyze the functional enrichment of these positively selected genes. These were enriched in the GO terms of RNA processing, hydrolase and nuclease activity ( Figure S3a), and in KEGG terms, including cell cycle, Fanconi anemia and ABC transporter ( Figure S3b). In the positive selection event, the flavonol 7-O-beta-glucosyltransferase (Plob17707_Plob) gene related to puerarin synthesis was preserved and expanded in evolution.
Five legumes, including M.truncatula, G. max, C. arietinum, V. unguiculata and A. hypogaea were compared with P. lobata for the Muscle multiple sequence alignment to further analyze the WGD events that occurred in the evolution of P. lobata. Based on the alignment results, the 4DTv value was calculated, and the frequency distribution map of 4DTv and gene pairs was drawn. The smaller the abscissa value, the closer it was to the present, and the larger the value, the older it was. The peak within species indicated the WGD, and the peak between species indicated species differentiation. This distribution map showed that P. lobata had two WGDs, about 0.65 once (a gamma event shared by the core dicotyledons), and about 0.25 once (WGD shared by Leguminosae before differentiation with several Leguminosae species), and about 0.05 separated from the nearest G. max. Since then, its own WGD has not occurred ( Figure 6). hypogaea were compared with P. lobata for the Muscle multiple sequence alignm further analyze the WGD events that occurred in the evolution of P. lobata. Based alignment results, the 4DTv value was calculated, and the frequency distribution 4DTv and gene pairs was drawn. The smaller the abscissa value, the closer it wa present, and the larger the value, the older it was. The peak within species indica WGD, and the peak between species indicated species differentiation. This distr map showed that P. lobata had two WGDs, about 0.65 once (a gamma event shared core dicotyledons), and about 0.25 once (WGD shared by Leguminosae differentiation with several Leguminosae species), and about 0.05 separated fr nearest G. max. Since then, its own WGD has not occurred ( Figure 6).     Figure S4). Compared with the G. max genome, six unique PlUGT43 homologous genes were retrieved from the P. lobata genome ( Figure S5).  Figure  S4). Compared with the G. max genome, six unique PlUGT43 homologous genes were retrieved from the P. lobata genome ( Figure S5).

Combined Transcriptome and Metabolome Analysis Reveals the Metabolic Mechanism of Puerarin
In this experiment, five P. lobata cultivars were selected, and the content of total flavonoids in their roots was significantly different. The content of total flavonoids in samples A, B, C, D and E was 7.13%, 7.96%, 11.57%, 9.34% and 10.21%, respectively (Table  S1). The transcriptome and metabolome analysis of P. lobata was tested from different cultivars to better explore the relationship between the puerarin synthesis and related metabolic genes (Figure 8). The PCA analysis results of metabolome and transcriptome data showed that the correlation of biological duplicate samples within genotypes was much greater than that between genotypes, which proved the reliability of experimental sample data. We selected the group with the largest differential metabolites and differentially expressed genes (DEGs) for the comparative analysis. The heat map of direct correlation between DEGs and differential metabolites showed that isoflavones and flavonoid metabolites had highly correlated with expressed genes. The results showed that there were 934 different metabolites in group A and group C, including 223 flavonoids and 42 isoflavones. Among the differential metabolites, isoliquiritigenin, liquiritigenin, daidzin, daidzein, puerarin and puerarin-4′-o-glucoside which are unique to P. lobata, were detected (Table S15). A total of 348 DEGs were identified in the 2 groups, including GLYMA_01G232400 4-coumarate CoA ligase (4CL), GLYMA_ 18G285800 chalcone reductase (CHR), GLYMA_03G181600 and GLYMA_ 19G182300 (PAL), which

Combined Transcriptome and Metabolome Analysis Reveals the Metabolic Mechanism of Puerarin
In this experiment, five P. lobata cultivars were selected, and the content of total flavonoids in their roots was significantly different. The content of total flavonoids in samples A, B, C, D and E was 7.13%, 7.96%, 11.57%, 9.34% and 10.21%, respectively (Table S1). The transcriptome and metabolome analysis of P. lobata was tested from different cultivars to better explore the relationship between the puerarin synthesis and related metabolic genes ( Figure 8). The PCA analysis results of metabolome and transcriptome data showed that the correlation of biological duplicate samples within genotypes was much greater than that between genotypes, which proved the reliability of experimental sample data. We selected the group with the largest differential metabolites and differentially expressed genes (DEGs) for the comparative analysis. The heat map of direct correlation between DEGs and differential metabolites showed that isoflavones and flavonoid metabolites had highly correlated with expressed genes. The results showed that there were 934 different metabolites in group A and group C, including 223 flavonoids and 42 isoflavones. Among the differential metabolites, isoliquiritigenin, liquiritigenin, daidzin, daidzein, puerarin and puerarin-4 -o-glucoside which are unique to P. lobata, were detected (Table S15). A total of 348 DEGs were identified in the 2 groups, including GLYMA_01G232400 4-coumarate CoA ligase (4CL), GLYMA_ 18G285800 chalcone reductase (CHR), GLYMA_03G181600 and GLYMA_ 19G182300 (PAL), which were positively correlated with the synthesis of liquiritigenin, daidzein and puerarin, and the correlation coefficient was higher than 0.8 (Table S16). were positively correlated with the synthesis of liquiritigenin, daidzein and puerarin, and the correlation coefficient was higher than 0.8 (Table S16). Nine quadrant diagram of differential metabolites and genes. The difference multiples of substances with Pearson's correlation coefficient greater than 0.80 and p-value less than 0.05 in each difference group are shown by the 9 quadrant diagram, which is divided into 1-9 quadrants from left to right and from top to bottom with black dotted lines. The abscissa represents log2FC of genes and the ordinate represents log2FC of metabolites. (c) KEGG enrichment analysis of differential metabolites and expressed genes. The abscissa represents the enrichment factor of this pathway in different omics, and the ordinate represents the name of KEGG pathway; The steady transition from red to yellow to green shows the shift in the significance of enrichment from high to medium to low., expressed by p-value; the shape of bubbles represents different omics, and the size of bubbles represents the number of different metabolites or genes. The larger the number, the larger the dot. (d) Correlation heat map of differential genes and metabolites. Each row is a gene, and each column is a metabolite. Red represents a positive correlation between genes and metabolites, and green represents a negative correlation between genes and metabolites.

WGCNA Analysis of DEGs
The WGCNA was performed on the expressed genes of four P. lobata samples with large differences in the isoflavone content to further identify the gene co-expression modules of isoflavone and puerarin metabolism in P. lobata. After filtering out the lowexpression genes, 19,163 genes were reserved for network construction. Figure 9a,b show the fitting index and average connectivity under different soft thresholds. The analysis set the fitting index as R 2 = 0.85, and the weighting coefficient β = 14 and the scale-free network construction was carried out. Nine quadrant diagram of differential metabolites and genes. The difference multiples of substances with Pearson's correlation coefficient greater than 0.80 and p-value less than 0.05 in each difference group are shown by the 9 quadrant diagram, which is divided into 1-9 quadrants from left to right and from top to bottom with black dotted lines. The abscissa represents log2FC of genes and the ordinate represents log2FC of metabolites. (c) KEGG enrichment analysis of differential metabolites and expressed genes. The abscissa represents the enrichment factor of this pathway in different omics, and the ordinate represents the name of KEGG pathway; The steady transition from red to yellow to green shows the shift in the significance of enrichment from high to medium to low., expressed by p-value; the shape of bubbles represents different omics, and the size of bubbles represents the number of different metabolites or genes. The larger the number, the larger the dot. (d) Correlation heat map of differential genes and metabolites. Each row is a gene, and each column is a metabolite. Red represents a positive correlation between genes and metabolites, and green represents a negative correlation between genes and metabolites.

WGCNA Analysis of DEGs
The WGCNA was performed on the expressed genes of four P. lobata samples with large differences in the isoflavone content to further identify the gene co-expression modules of isoflavone and puerarin metabolism in P. lobata. After filtering out the low-expression genes, 19,163 genes were reserved for network construction. Figure 9a,b show the fitting index and average connectivity under different soft thresholds. The analysis set the fitting index as R 2 = 0.85, and the weighting coefficient β = 14 and the scale-free network construction was carried out. tween the module characteristic genes and five isoflavones (daidzein, 2,7,4′-trihydroxy-isoflavone, isoliquiritigenin, puerarin and liquiritigenin). As shown in the figure, the midnight blue and yellow modules were positively correlated with the content of puerarin, while the blue module negatively correlated with the content of puerarin. The characteristic genes of these three modules highly correlated with the accumulation of puerarin. Figure 9c shows the correlation network diagram between the module characteristic genes and isoflavones and puerarin metabolites. The diagram shows that the turquoise module had a high correlation at various concentrations in various parts. The core genes and transcription factors related to isoflavone metabolism were screened out from the modules, and the Pearson correlation analysis was carried out between these genes and transcription factors and five isoflavones (daidzein, 2,7,4′-trihydroxyisoflavone, isoliquiritigenin, puerarin and liquiritigenin). Figure 10a shows the correlation heat map between the module characteristic genes and isoflavones and puerarin metabolites. As shown in the figure, the turquoise module had a high correlation in all parts and concentrations. Figure 10b shows the network relationship between isoflavone synthesis-related genes, transcription factors (TFs) and puerarin isoflavones. From the midnight blue and yellow modules, we screened 68 TFs positively related to the synthesis of puerarin and isoflavones. The results showed that MYBs (GLYMA_06g160500 and GLYMA _11g215800), WRKYs (GLYMA_08g118200 and GLYMA_09g254800), bHLH (GLYMA_02g174800) and other TFs were involved in the synthesis of puerarin. The clustering of co-expressed genes was dynamically cut, and 16 modules were generated after combining similar clusters. The genes contained in the gray module showed chaotic expression patterns, which were not considered in the subsequent co-expression analysis. Therefore, we finally determined 15 co-expression modules. According to the gene module cluster tree (Figure 9a), all co-expression modules were divided into two categories, and two different modules had different degrees of correlation. In the correlation heat map (Figure 9c) of the module, blue represented negative correlation, and the closer the value was to 0, the greater the degree of negative correlation. Red represented positive correlation, and the closer the value was to 1, the greater the positive correlation.
The WGCNA analysis yielded 15 modules with similar gene expression patterns, each containing 5 to 432 genes (Figure 9a). Figure 9b shows the correlation analysis between the module characteristic genes and five isoflavones (daidzein, 2,7,4 -trihydroxyisoflavone, isoliquiritigenin, puerarin and liquiritigenin). As shown in the figure, the midnight blue and yellow modules were positively correlated with the content of puerarin, while the blue module negatively correlated with the content of puerarin. The characteristic genes of these three modules highly correlated with the accumulation of puerarin. Figure 9c shows the correlation network diagram between the module characteristic genes and isoflavones and puerarin metabolites. The diagram shows that the turquoise module had a high correlation at various concentrations in various parts.
The core genes and transcription factors related to isoflavone metabolism were screened out from the modules, and the Pearson correlation analysis was carried out between these genes and transcription factors and five isoflavones (daidzein, 2,7,4 -trihydroxyisoflavone, isoliquiritigenin, puerarin and liquiritigenin). Figure 10a shows the correlation heat map between the module characteristic genes and isoflavones and puerarin metabolites. As shown in the figure, the turquoise module had a high correlation in all parts and concentrations. Figure 10b shows the network relationship between isoflavone synthesis-related genes, transcription factors (TFs) and puerarin isoflavones. From the midnight blue and yellow modules, we screened 68 TFs positively related to the synthesis of puerarin and isoflavones. The results showed that MYBs (GLYMA_06g160500 and GLYMA _11g215800), WRKYs (GLYMA_08g118200 and GLYMA_09g254800), bHLH (GLYMA_02g174800) and other TFs were involved in the synthesis of puerarin. Figure 10. The correlation heat map and network relationship between gene expression patterns and four different isoflavone forms contents in Blue and Grey60 modules. (a) Heat map of correlation coefficient between different isoflavone forms' contents and module characteristic genes; red and blue blocks represent positive and negative correlation, respectively. (b) Regulating network between the selected genes and four different isoflavone forms' compounds in P. lobata. The positive correlation threshold is set to be greater than or equal to 0.5, the negative correlation threshold is set to be less than or equal to −0.5 and the p-value threshold is less than 0.5, R version 3.6.1, igraph1.2.6.

Discussion
P. lobata is an important medicinal and edible homologous plant that is widely cultivated in Asian countries [33]. The stem skin fiber of P. lobata is often used as a raw material for weaving and papermaking in the industry [3]. P. lobata has been considered as an important traditional Chinese medicine and homologous food for hundreds of years, with economic market potential. Its roots are not only nutritious, but also have many pharmacological properties, including flavonoids and isoflavones, which are widely used to treat and prevent various diseases [10,34]. In the query of specimens, it is found that the identification of P. lobata, P. thomsonii and P. montana in most specimens is confusing, especially in the older specimens, where many of them identify P. thomsonii, and P. montana as P. lobata. At present, only P. lobata and P. thomsonii are used for traditional Chinese medicine and food, respectively [35]. The root tubers of P. lobata show a higher isoflavone content, especially puerarin, which is called kudzu in the Chinese Pharmacopoeia. However, the root tubers of P. thomsonii show a higher starch content but a lower isoflavone content, and therefore it is called starch kudzu, which is generally used for food [36]. The genome sequence, transcriptome and metabolite analyses reported in the study might help understand the biosynthesis of these natural products.
Previously, Shang reported the first high-quality chromosome-scale genome of P. thomsonii [13]. The genome size was ~1.37 GB, with a contig N50 of 593.7 kb. The genome structural annotation resulted in 869.33 Mb repeat regions (62.7% of the genome) and 45,270 protein-coding genes. A total of 572 genes that were upregulated in the puerarin biosynthesis pathway were identified, and 235 candidate genes were further enriched by transcriptome data [13]. As another wild species of Pueraria, the genome size of P. montana Figure 10. The correlation heat map and network relationship between gene expression patterns and four different isoflavone forms contents in Blue and Grey60 modules. (a) Heat map of correlation coefficient between different isoflavone forms' contents and module characteristic genes; red and blue blocks represent positive and negative correlation, respectively. (b) Regulating network between the selected genes and four different isoflavone forms' compounds in P. lobata. The positive correlation threshold is set to be greater than or equal to 0.5, the negative correlation threshold is set to be less than or equal to −0.5 and the p-value threshold is less than 0.5, R version 3.6.1, igraph1.2.6.

Discussion
P. lobata is an important medicinal and edible homologous plant that is widely cultivated in Asian countries [33]. The stem skin fiber of P. lobata is often used as a raw material for weaving and papermaking in the industry [3]. P. lobata has been considered as an important traditional Chinese medicine and homologous food for hundreds of years, with economic market potential. Its roots are not only nutritious, but also have many pharmacological properties, including flavonoids and isoflavones, which are widely used to treat and prevent various diseases [10,34]. In the query of specimens, it is found that the identification of P. lobata, P. thomsonii and P. montana in most specimens is confusing, especially in the older specimens, where many of them identify P. thomsonii, and P. montana as P. lobata. At present, only P. lobata and P. thomsonii are used for traditional Chinese medicine and food, respectively [35]. The root tubers of P. lobata show a higher isoflavone content, especially puerarin, which is called kudzu in the Chinese Pharmacopoeia. However, the root tubers of P. thomsonii show a higher starch content but a lower isoflavone content, and therefore it is called starch kudzu, which is generally used for food [36]. The genome sequence, transcriptome and metabolite analyses reported in the study might help understand the biosynthesis of these natural products.
Previously, Shang reported the first high-quality chromosome-scale genome of P. thomsonii [13]. The genome size was~1.37 GB, with a contig N50 of 593.7 kb. The genome structural annotation resulted in 869.33 Mb repeat regions (62.7% of the genome) and 45,270 protein-coding genes. A total of 572 genes that were upregulated in the puerarin biosynthesis pathway were identified, and 235 candidate genes were further enriched by transcriptome data [13]. As another wild species of Pueraria, the genome size of P. montana was~978.59 Mb, with contig N50 of 80.18 Mb. A comparative genomics analysis showed that the genome size of P. montana was smaller than that of P. thomsonii because of fewer repetitive sequences and duplicated genes [14]. Compared with the previous two species of pueraria, the genome of P. lobata was smaller and had fewer repetitive sequences and duplicated genes: the genome size of P. lobata was only~939 Mb. The genome reported in P. lobata shows that the repetitive sequences accounted for 63.50% of the P. lobata genome, and a total of 33,171 coding genes were predicted, of which 97.34% could predict the function. A total of 224 genes related to flavonoid metabolism were enriched in this study, including 40 genes related to the isoflavone synthesis. P. lobata and P. montana were both wild species of pueraria, with less artificial cultivation and intervention. However, because of the low content of medicinal ingredients, P. montana was not used as traditional Chinese medicine [3,36]. P. thomsonii was more edible than medicinal in evolution, with faster growth speed and stronger edible roots, due to the artificial introduction and the change in the cultivation environment. The growth of P. lobata was slower, and it had higher isoflavone content. These findings may be related to genome duplication and changes in evolution.
Although P. lobata and P. thomsonii both were recorded in the Chinese Pharmacopoeia as having high medicinal value, modern studies have confirmed large differences between the two varieties in material basis and efficacy [37]. Shang's study showed that P. lobata had higher amounts of syringaresinol-4 O-glucoside and disinapoyl glucoside, and P. thomsonii had higher amounts of glycycoumarin and 2-hydroxyadenosine. It was suggested that the specific compounds found in a particular variety could be used to differentiate the Pueraria varieties [36]. We detected 223 flavonoids in P. lobata, including 42 isoflavones. Besides the daidzein and puerarin commonly found in Pueraria plants, more isoflavones were also detected, such as daidzein-7-O-(2 '-benzoyl) rhamnoside, daidzein-7-O-Glucoside-4 -O-Apioside, genistein-7-O-(6 '-malonyl) glucoside, licoisoflavone B, calycosin-7-O-glucoside and other substances. These substances not only enriched isoflavones in P. lobata, but also increased the content of its medicinal active ingredients. These new active substances, which were different from those in P. thomsonii, might be related to the expansion of the glycosidase gene in the P. lobata genome.
As an isoflavonoid, puerarin is biosynthesized via the phenylpropanoid pathway by the hydroxylation of liquiritigenin at the C-2 position to yield its isoflavonoid skeleton. However, the reaction step for C-glucosylation in puerarin biosynthesis remains an enigma [4]. Based on the labeling studies in P. lobata roots, the chalcone substrate (isoliquiritigenin), but not the isoflavone substrate (daidzein), was purported to be an intermediate in the pathway to puerarin [38]. Wang's study revealed that PlUGT43 possessed an activity for the C-glucosylation of daidzein to puerarin, and it showed activity with the isoflavones daidzein and genistein, but displayed no activity towards other potential acceptors, including flavonoids [4]. In this study, six unique PlUGT43 homologous genes were retrieved from the genome of P. lobata, and no 2-hydroxyisoflavanone 8-C-glucoside was found in the metabolites. This also confirmed that puerarin was synthesized mainly from the glycation of daidzein (Figure 1).
Besides the influence of structural genes in the metabolic pathway on puerarin synthesis, TFs are also important switches for regulating isoflavone synthesis. However, reports about the role of TFs in the regulation of puerarin synthesis are rare [4,39]. The TF regulation of the flavonoid pathway has been extensively studied in many plant species, such as Z. mays, A. thaliana, M. domestica and so on [40][41][42]. Among these, TFs of MYB, bHLH and WD40 function individually or collaborate as an MBW complex to control multiple enzymatic steps in the flavonoid pathway [43]. The biosynthesis of flavonoids or isoflavonoids has been extensively studied in model plants, but not in non-model plants due to the lack of genomic and genetic information. Shen found that the transcription levels of PlMYB1, PlHLH3-4 and PlWD40-1 genes were closely correlated with isoflavonoid accumulation profiles in different tissues and cell cultures of kudzu [39]. The over-expression of PlMYB1 in A. thaliana significantly increased the accumulation of anthocyanins in leaves and proanthocyanidins in seeds by activating AtDFR, AtANR and AtANS genes [39]. The combined analysis of the genome and transcriptome of P. thomsonii indicated that 41 PlbHLHs showed root-specific expression patterns, and seven of them exhibited upregulated expression after MeJA treatment. Using qRT-PCR validation, five puerarin synthesis-related genes had a similar expression pattern as the seven PlbHLH genes in response to MeJA within 24 h [13]. In this study, 68 TFs related to isoflavone and puerarin synthesis were obtained in the P. lobata genome, including two bHLH, six MYB and four WRKY. The data suggested that these key TFs might be involved in the expression regulation of the structural genes of the puerarin synthesis pathway.

Conclusions
This study reported a chromosome-level genome assembly for P. lobata and its characteristics. P. lobata has a smaller genome size and fewer annotated genes compared with the cultivated variety P. thomsonii. The genome size is~934 Mb, with a contig N50 of 29.51 Mbp. Approximately 97.82% of the assembled sequences are represented by 11 pseudochromosomes. We identified that the repetitive sequences accounted for 63.50% in the genome, and a total of 33,171 coding genes were predicted, of which 97.34% could predict the function. Compared with other leguminous plants, the genome of P. lobata has undergone significant contraction and expansion. The genome evolution analysis revealed that P. lobata was closely related to G. max and underwent two ancient WGD events. The multi-omics analysis revealed that not only did 4CL, CHR, CHI and other genes participate in the synthesis of puerarin and daidzein, but also WRKY, bHLH, MYB and other TFs participated in the expression regulation of structural genes. In particular, six unique PlUGT43 homologous genes were retrieved from the genome of P. lobata, and no 2-hydroxyisoflavanone 8-C-glucoside was found in the metabolites. We speculated that puerarin was synthesized mainly from the glycation of daidzein, and homologous PlUGT further catalyzed the conversion of puerarin into puerarin-4 -o-glucoside. This study developed the important multi-omics data for exploiting and improving P. lobata as an economic crop for edible traditional Chinese medicine.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/biom12121731/s1, Figure S1. According to the HIC data obtained by sequencing, mount the assembled contings/scaffolds sequence to the near chromosome level with Allhic software, and then manually correct it according to the intensity of chromosome interaction with juicebox software to finally obtain the genome at the chromosome level. Figure S2. Analysis of gene expansion and contraction enrichment in P. lobata. Figure S3. Positively selected genes enrichment analysis in genome of P. lobata. Figure S4. Collinearity analysis of chromosomes between P. lobata and other leguminous plants. Figure S5. Evolutionary analysis of PlUGT43 gene family in puerarin synthesis pathway. Table S1 P. lobata sample collection information. Table S2 P. lobata genome assembly was accomplished using analysis tools.

Informed Consent Statement: Not applicable.
Data Availability Statement: This Whole Genome Shotgun project and Transcription project of Pueraria lobata (wild.) Ohwi has been deposited in DDBJ/ENA/GenBank and SRA under the umbrella of BioProject Accession PRJNA872914. The genome accession described in this paper is SRR21196526. The relevant data of metabolome have been uploaded to the GSA database, with the login number of OMIX002296-01, and the login numbers of transcriptome data are OMIX002296-02 and OMIX002296-03.