Exploring the Phytochemical Landscape of the Early-Diverging Flowering Plant Amborella trichopoda Baill.

Although the evolutionary significance of the early-diverging flowering plant Amborella (Amborella trichopoda Baill.) is widely recognized, its metabolic landscape, particularly its specialized metabolites, remains underexplored. In this work, we analyzed the metabolomes of Amborella tissues using liquid chromatography high-resolution electrospray ionization mass spectrometry (LC-HR-ESI-MS). By matching the mass spectra of Amborella metabolites with those of authentic phytochemical standards in publicly accessible libraries, 63, 39, and 21 compounds were tentatively identified in leaves, stems, and roots, respectively. Free amino acids, organic acids, simple sugars, cofactors, as well as abundant glycosylated and/or methylated phenolic specialized metabolites were observed in Amborella leaves. Diverse metabolites were also detected in stems and roots, including those that were not identified in leaves. To understand the biosynthesis of specialized metabolites with glycosyl and methyl modifications, families of small molecule UDP-dependent glycosyltransferases (UGTs) and O-methyltransferases (OMTs) were identified in the Amborella genome and the InterPro database based on conserved functional domains. Of the 17 phylogenetic groups of plant UGTs (A–Q) defined to date, Amborella UGTs are absent from groups B, N, and P, but they are highly abundant in group L. Among the 25 Amborella OMTs, 7 cluster with caffeoyl-coenzyme A (CCoA) OMTs involved in lignin and phenolic metabolism, whereas 18 form a clade with plant OMTs that methylate hydroxycinnamic acids, flavonoids, or alkaloids. Overall, this first report of metabolomes and candidate metabolic genes in Amborella provides a starting point for a better understanding of specialized metabolites and biosynthetic enzymes in this basal lineage of flowering plants.


Introduction
Amborella (Amborella trichopoda Baill.), a short shrub native to the tropical rainforests of New Caledonia, is the only living species in the Amborellales, the earliest diverging order of flowering plants (angiosperms) [1]. Despite its widely recognized importance in understanding flowering plant phylogeny and land plant evolution, little is known about the metabolomes, in particular specialized metabolites (secondary metabolites), of Amborella. To date, only three specialized metabolites, procyanidin, kaempferol-3-O-glucoside, and kaempferol-3-O-rutinoside, have been tentatively identified from Amborella leaves according to the Rf values of the compounds measured using paper and column chromatography [2].
processing, 63, 39, and 21 compounds were tentatively identified in leaves, stems, and roots of Amborella, respectively, with peak areas ranging from 5.4 × 10^6 to 2.7 × 10^10 (Tables S1-S3). In Amborella leaves, the detectable free amino acids include isoleucine, arginine, aspartic acid, glutamine, and glutamic acid as well as the three aromatic amino acids, phenylalanine, tyrosine, and tryptophan, and their acetylated derivatives acetylphenylalanine and acetyltryptophan (Table S1). A group of organic acids including gluconate, citrate, malic acid, citramalate, 2-isopropylmalic acid, glutaric acid, 2-hydroxyisocaproic acid, and azelaic acid was found. Several cofactors, such as oxidized glutathione, flavin adenine dinucleotide, and riboflavin, as well as the nucleobase adenine and its nucleoside derivative adenosine, were also detected. In addition, Amborella leaves contain simple sugars (sucrose and raffinose), nucleotide sugars (UDP-glucose and UDP-xylose), and phosphate sugars (mannose 6-phosphate) (Table S1).
Phenylalanine serves as the biosynthetic precursor of phenylpropanoids. In addition to phenylalanine (in the free amino acid form), metabolites in the general phenylpropanoid pathway and their derivatives were found in high abundance in Amborella leaves, including coumaric acid, methyl cinnamate, 4-hydroxy-3-methoxycinnamate, 3,5-dimethoxycinnamic acid, 3-phenyllactic

Figure 2.
A simplified scheme of the general phenylpropanoid, flavonoid, anthocyanin, and proanthocyanidin biosynthetic pathways in Amborella trichopoda. These aglycones lead to the formation of diverse glycosidic derivatives (Tables S1-S3). Metabolites present in Amborella leaves are shown in blue, and methylated compounds are shaded in gray. Dotted arrows denote multiple enzymatic steps.

Diverse Metabolites in Amborella Stems and Roots
Amborella stems contain the free amino acids lysine, 5-oxo-proline, and 2-aminoadipic acid, in addition to arginine, aspartic acid, glutamic acid, glutamine, tyrosine, phenylalanine, and acetyltryptophan that are also identified in leaves (Table S2). Three organic acids, malic acid, saccharic acid, and 2-hydroxyisocaproic acid, are present in stems. Besides sucrose, raffinose, mannose 6-phosphate, and UDP-glucose that are detectable in both leaves and stems, stems also accumulate trehalose (Table S2).

Large Families of UDP-Dependent Glycosyltransferases (UGTs) and O-Methyltransferases (OMTs) in the Amborella Genome
To explore the UGTs and OMTs that generate the diverse glycosylated and/or methylated specialized metabolites in Amborella tissues, the fully sequenced Amborella genome [14] and the InterPro database for protein sequence analysis and classification were searched using functional domains common to plant small molecule UGTs or OMTs. There are 87 putative UGTs in Amborella that contain the classic PSPG motif conserved in plant UGTs and are over 340 aa in length (the minimum size of a functionally characterized plant UGT) (Figure S1). To understand the evolutionary relationship among the UGTs, a neighbor-joining tree was built with the Amborella UGTs and selected UGTs representing different plant UGT phylogenetic groups ( Amborella UGTs per group, only one Amborella UGT is present each in groups F, K, and Q. On the other hand, groups G, A, and E are relatively abundant with Amborella UGTs, containing 7, 9, and 13 UGTs, respectively. Most notably, 25% of the Amborella UGTs (22 out of 87) belong to group L (Figure 3). There are also 9 Amborella UGTs in the outgroup clade of plant UGTs that glycosylate sterols and lipids [15]. Interestingly, the Amborella UGT W1PRF4 is not associated with any of the currently defined plant UGT clades (Figure 3).
Twenty-five Amborella proteins (51 to 395 aa, average size 216 ± 111 aa) were predicted to contain the O-methyltransferase, class I-like SAM-dependent O-methyltransferase, O-methyltransferase COMT-type, or plant methyltransferase dimerization domain (Figure 4). The protein sequences of Amborella OMTs and selected functionally characterized plant small-molecule OMTs were aligned (Figure S2). Because some of the pairwise distances could not be estimated from the multiple sequence alignment, a character-based method, the maximum likelihood method, was used for building the OMT phylogeny instead of neighbor-joining, which requires a distance matrix (Figure 4).
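The text does not state why some pairwise distances could not be estimated. One common situation, assumed here purely for illustration, is that two fragmentary sequences share no aligned positions, so a proportion-of-differences distance has nothing to count and a complete distance matrix cannot be filled in. The following minimal Python sketch shows that case with a hypothetical toy alignment; the function names and sequences are illustrative and are not taken from the study.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

GAP = "-"


def p_distance(a: str, b: str) -> Optional[float]:
    """Proportion of differing residues over columns where neither sequence has a gap.

    Returns None when the two sequences share no ungapped column,
    i.e. when the distance cannot be estimated.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    shared = [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]
    if not shared:
        return None
    return sum(x != y for x, y in shared) / len(shared)


def undefined_pairs(alignment: Dict[str, str]) -> List[Tuple[str, str]]:
    """List the sequence pairs whose distance cannot be estimated."""
    return [
        (m, n)
        for m, n in combinations(alignment, 2)
        if p_distance(alignment[m], alignment[n]) is None
    ]


# Hypothetical toy alignment: two short fragments that do not overlap.
toy_alignment = {
    "full_length": "MKTAYLLGDE",
    "n_fragment": "MKTAY-----",
    "c_fragment": "-----LIGDE",
}

print(undefined_pairs(toy_alignment))  # [('n_fragment', 'c_fragment')]
```

A distance-based method such as neighbor-joining needs every entry of the matrix, whereas a character-based method works from the alignment columns directly, which is consistent with the choice described above.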
Eighteen Amborella OMTs (including a tight cluster of 13 OMTs) were associated with plant OMTs functioning towards hydroxycinnamic acids, flavonoids, or alkaloids (Figure 4). Interestingly, within the same clade, W1NLC7 (358 aa) grouped closely (bootstrap value 72) with two monocot OMTs, ZmCOMT (maize) and TaOMT2 (wheat), which utilize caffeic acid as substrate. Seven Amborella OMTs clustered with plant CCoA OMTs, including U5CZ55, U5D229, U5CZN4, U5D0E8, and U5D520 that fall in the same group as SlCCoAOMT and McCCoAOMT, U5CX90 that is associated with other plant CCoA OMTs, and W1PM29 that is more distant from the other Amborella CCoA OMTs (Figure 4). It should be noted that the SABATH OMTs from Arabidopsis thaliana, Antirrhinum majus, and Clarkia breweri were used as the outgroup for the phylogenetic analysis. Thirteen Amborella proteins contain the SAM-dependent carboxyl methyltransferase domain (present in SABATH OMTs) but were not included in the phylogenetic analysis.
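The UGT candidate criteria given earlier in this section (presence of the PSPG motif and a length over 340 aa) and the reported group L share (22 of 87) can be sketched in Python as follows. This is a minimal illustration, not the procedure used in the study: the regular expression is a deliberately simple placeholder rather than the PSPG consensus, and the function names and toy sequences are hypothetical.

```python
import re

MIN_LENGTH_AA = 340  # from the text: candidate UGTs are "over 340 aa"

# ASSUMPTION: stand-in pattern for illustration only; it is NOT the PSPG consensus.
PLACEHOLDER_MOTIF = re.compile(r"W..Q")


def is_candidate_ugt(sequence, motif=PLACEHOLDER_MOTIF, min_length=MIN_LENGTH_AA):
    """Return True if the protein is longer than min_length and contains the motif."""
    return len(sequence) > min_length and motif.search(sequence) is not None


def percent_share(count, total):
    """Percentage of `total` represented by `count`."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count / total


# Hypothetical toy sequences (not real Amborella proteins).
toy_proteins = {
    "long_with_motif": "A" * 350 + "WAAQ",
    "long_without_motif": "A" * 400,
    "short_with_motif": "A" * 100 + "WAAQ",
}
candidates = [name for name, seq in toy_proteins.items() if is_candidate_ugt(seq)]
print(candidates)  # ['long_with_motif']

# Counts reported in the text: 22 of the 87 Amborella UGTs fall in group L.
print(round(percent_share(22, 87), 1))  # 25.3
```

Passing the motif as a parameter keeps the length and motif checks separate, so a curated motif definition could be substituted without changing the filter.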

Discussion
Overall, this nontargeted, metabolite-profiling study revealed the presence of diverse groups of phytochemicals in Amborella tissues. Putative metabolite identification was accomplished by querying the Amborella metabolites against authentic standards in multiple mass spectral libraries followed by manual inspection of the matched spectra. The accumulation of phenolic specialized metabolites in Amborella tissues suggests a major role of these compounds in Amborella interactions with the environment. On the other hand, the low abundance of terpenoids and alkaloids in these tissues could be due to either the lack of biosynthetic genes and enzymes or inducible biosynthesis that only occurs under stress conditions.
An interesting observation was that feruloyl tyramine and coumaroyl tyramine were identified in stems and/or roots, but not leaves (Tables S1-S3). Hydroxycinnamoyl tyramines are reportedly phytoalexins with increased production in response to wounding [16] and inoculation of pathogens [17] in other plant species. This poses the question of whether feruloyl tyramine and coumaroyl tyramine constitute a form of chemical defense against abiotic and biotic stresses in Amborella stems and roots. If this is the case, it remains to be answered whether these compounds in Amborella have coevolved with pathogens in the environment.
Notably, various glycosylated and methylated phenolics are produced by Amborella tissues, particularly leaves (Tables S1-S3). The diverse glycosylated flavonoids, anthocyanins, and proanthocyanidins present in Amborella leaves may protect the plants from UV radiation, as has been demonstrated in other plants [18]. Although the role of methylated flavonoids in foliar tissues has not been well characterized, methylated isoflavonoids were shown to act as phytoalexins in different plants [19], suggesting that methylated flavonoids in Amborella may also be involved in defense against biotic stress. In addition, the phenylpropanoid pathway derivative methyl cinnamate acts as a signaling molecule in plant-insect interactions [20]. Methyl cinnamate and its associated compounds, 3,5-dimethoxycinnamic acid and 4-hydroxy-3-methoxycinnamate, were identified in Amborella leaves, suggesting that they could mediate interactions between Amborella and insects.
Intrigued by the occurrence of multiple glycosylated and methylated specialized metabolites, we explored the fully sequenced Amborella genome [14] to identify candidate genes encoding enzymes that modify these compounds (Figures 3 and 4). Phylogenetic analysis showed that a large number of Amborella UGTs (22 out of 87) belonged to group L (Figure 3). Retention of duplicated genes after whole-genome duplication (WGD) may have led to the large number of group L UGTs in Amborella. On the other hand, recent gene duplications after the divergence of Amborella from other flowering plants may have also contributed to the expansion of group L UGTs, as suggested by W1NFN9 and W1NFL7 that share 97.1% identity and 97.5% similarity (Figure 3). Group L UGTs from different plants have been shown to form glycosides and glucose esters of a wide range of compounds, such as flavonoids, isoflavonoids, benzoic acid derivatives, cinnamic acid derivatives, lignans, hydroxy coumarins, phenylethanoids, hydroquinones, diterpenes, triterpenes, glucosinolates, epoxy sesquiterpenoids, auxins, and xenobiotics [21–27]. Functional characterization of the group L UGTs in Amborella will help determine whether they are responsible for generating the diverse glycosylated specialized metabolites reported here. Amborella UGTs are absent in groups B, N, and P (Figure 3). Although the activity of group N UGTs has not been elucidated, group B UGTs are active towards flavonoids, benzoic acid derivatives, and xenobiotics [23,25], whereas group P UGTs glycosylate monoterpenes and triterpenes [26,28]. The lack of groups B and P UGTs in Amborella suggests that glycosylation of flavonoids and terpenoids relies on UGTs in other phylogenetic groups (e.g., group L).
Seven Amborella OMTs are clustered with plant CCoA OMTs (bootstrap value 89) and may be involved in monolignol biosynthesis (Figure 4). Interestingly, of the seven Amborella CCoA OMTs, five group with two CCoA OMTs from the Caryophyllales, SlCCoAOMT and McCCoAOMT, which exhibited activities towards caffeoyl esters and a broad range of flavonoid substrates [29]. U5CX90 is located in the branch of CCoA OMTs from various plant species for lignin biosynthesis. On the other hand, W1PM29 is more distantly related to the other six Amborella CCoA OMTs within the same clade (Figure 4). Thirteen Amborella OMTs are clustered together and group with plant OMTs that use hydroxycinnamic acids, flavonoids, or alkaloids as substrates. Notably, another OMT of this clade, W1NLC7, is strongly associated (bootstrap value 72) with two monocot COMTs, ZmCOMT and TaOMT2, which are able to methylate caffeic acid (Figure 4). The recombinant TaOMT2 protein also carried out sequential methylations of the flavone tricetin [30]. These phylogenetic associations of Amborella OMTs with plant OMTs of various activities motivate biochemical characterization of the Amborella enzymes as the next step.
Overall, Amborella tissues are rich in phenolic specialized metabolites, and its genome encodes numerous enzymes that may modify the core structures of these compounds. In the future, the role of these specialized metabolites may be investigated within the context of Amborella interacting with the environment. Functional characterization of the candidate UGTs and OMTs in Amborella will allow for comparative analysis with UGTs and OMTs from other plant lineages for convergence or divergence in enzyme evolution. Glycosylation and methylation have been shown to improve the bioavailability and bioactivity of the core molecules (e.g., flavonoids) [31,32]. Elucidating the catalytic features of Amborella enzymes capable of producing specialized metabolites with unique structures (e.g., regiospecific modifications) may enable valuable pharmaceutical biotechnology applications.

Plant Materials
An Amborella plant was obtained from UC Santa Cruz in 2014 by the UC Davis Botanical Conservatory; this plant was propagated vegetatively from an Amborella plant that was originally collected from New Caledonia by Dr. Ray Collett (Amborella is endemic to New Caledonia). A voucher specimen was deposited at the UC Davis herbarium (No. b.2014.123). Leaf and stem tissues were collected from the Amborella plant growing at the UC Davis Botanical Conservatory in April 2017. Cuttings were previously made from this plant in August 2016 for vegetative propagation. Root tissues were obtained from one of the rooted cuttings in April 2017. The plant tissues were harvested using a razor blade and immediately frozen in liquid nitrogen. The leaf, stem, and root tissues were taken and analyzed in triplicate.

Metabolite Analysis
Amborella leaves, stems, and roots were ground into fine powder in liquid nitrogen using a mortar and pestle and then freeze-dried. Fifty milligrams of the lyophilized tissue was extracted using 1 mL of 70% methanol with sonication, followed by centrifugation at 13,000 rpm for 10 min. The supernatant was filtered through a syringe filter (MilliporeSigma, Burlington, MA, USA) and subjected to LC-HR-ESI-MS analysis on an ultra-performance liquid chromatography (UPLC) (Waters, Milford, MA, USA) coupled to a Q Exactive mass spectrometer (Thermo Scientific, Waltham, MA, USA) as previously described [33]. Briefly, metabolite separation was conducted using a BEH C18 column (Acquity UPLC®, 100 mm × 2.1 mm, particle size 1. Mass spectra were obtained in the positive and negative ion modes over the mass range of m/z 120-1500 at an ion spray voltage of 4 and 3 kV, respectively. For both types of analysis, the capillary temperature was kept at 320 °C and source temperature at 200 °C. Sheath gas and auxiliary gas were nitrogen at a flow rate of 35 arbitrary units (arb) and 8 arb, respectively. The stepped normalized collision energy (NCE) for MS/MS analysis was at 15% and 40%.

Metabolite Identification
The raw data obtained from the LC-HR-ESI-MS analysis were analyzed using MS-DIAL version 3.20 [34]. Multiple reference mass spectral libraries were used for querying the unknown metabolites, including MassBank [35], RIKEN tandem mass spectral database for phytochemicals (ReSpect) (http://spectra.psc.riken.jp/), the Global Natural Product Social Molecular Networking system (GNPS) (https://gnps.ucsd.edu/ProteoSAFe/libraries.jsp), the Critical Assessment of Small Molecule Identification (CASMI) 2016 library [36], the Fiehn lab hydrophilic interaction liquid chromatography (HILIC) MS/MS library (available at http://prime.psc.riken.jp/Metabolomics_Software/MS-DIAL/index.html), the Bruker MetaboBASE plant library (https://www.bruker.com/products/massspectrometry-and-separations/ms-oftware/metabolomics-spectral-libraries/overview.html), RIKEN PlaSMA (http://plasma.riken.jp/), and Karolinska Institute and Gunma (GIAR) zic-HILIC deconvoluted MS2 spectra in data independent acquisition [37]. Tentative compound identification was based on the weighted similarity score of accurate mass, isotope ratio, and MS/MS spectra with a cut-off value set at 80. The accurate mass tolerance for MS was set at 0.01 Da and MS/MS at 0.05 Da. The tentatively identified metabolites were further examined by careful manual inspection of the matching MS/MS spectra between the Amborella metabolites and the authentic standards.
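The two numeric criteria stated above (an MS accurate mass tolerance of 0.01 Da and a similarity score cut-off of 80) can be expressed as a simple filter over library hits. The Python sketch below is illustrative only and does not reproduce MS-DIAL's scoring: the score is taken as an input, the `LibraryHit` record and the toy values are hypothetical, and treating both thresholds as inclusive is an assumption.

```python
from dataclasses import dataclass
from typing import List

SCORE_CUTOFF = 80.0       # from the text: cut-off value set at 80
MS1_TOLERANCE_DA = 0.01   # from the text: accurate mass tolerance for MS


@dataclass
class LibraryHit:
    """Hypothetical record for one spectrum-to-library match."""
    name: str
    observed_mz: float
    reference_mz: float
    score: float  # weighted similarity score as reported by the software


def passes_criteria(hit: LibraryHit) -> bool:
    """Keep a hit if it is within the mass tolerance and meets the score cut-off.

    ASSUMPTION: both comparisons are inclusive.
    """
    mass_error = abs(hit.observed_mz - hit.reference_mz)
    return mass_error <= MS1_TOLERANCE_DA and hit.score >= SCORE_CUTOFF


def tentative_identifications(hits: List[LibraryHit]) -> List[LibraryHit]:
    """Return the hits that pass both criteria, in their original order."""
    return [hit for hit in hits if passes_criteria(hit)]


# Hypothetical toy hits (not data from the study).
toy_hits = [
    LibraryHit("hit_a", 300.004, 300.000, 91.0),  # passes both criteria
    LibraryHit("hit_b", 300.020, 300.000, 95.0),  # mass error too large
    LibraryHit("hit_c", 300.001, 300.000, 62.0),  # score below cut-off
]

kept = tentative_identifications(toy_hits)
print([hit.name for hit in kept])  # ['hit_a']
```

Hits that pass such a filter would still be subject to the manual inspection of matching MS/MS spectra described above.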

Phylogenetic Analysis
The Amborella UGT sequences were obtained from the Ensembl Plants database (https://plants.ensembl.org/info/website/ftp/index.html). There are 87 Amborella proteins that contain the conserved PSPG motif and are longer than 340 amino acids, which is the shortest length reported for a functionally