The Combination of Untargeted Metabolomics and Machine Learning Predicts the Biosynthesis of Phenolic Compounds in Bryophyllum Medicinal Plants (Genus Kalanchoe)

Phenolic compounds constitute an important family of natural bioactive compounds responsible for the medicinal properties attributed to Bryophyllum plants (genus Kalanchoe, Crassulaceae), but their production by these medicinal plants has not been characterized to date. In this work, a combinatorial approach including plant tissue culture, untargeted metabolomics, and machine learning is proposed to unravel the critical factors behind the biosynthesis of phenolic compounds in these species. The untargeted metabolomics revealed 485 annotated compounds that were produced by three Bryophyllum species cultured in vitro in a genotype and organ-dependent manner. Neurofuzzy logic (NFL) predictive models assessed the significant influence of genotypes and organs and identified the key nutrients from culture media formulations involved in phenolic compound biosynthesis. Sulfate played a critical role in tyrosol and lignan biosynthesis, copper in phenolic acid biosynthesis, calcium in stilbene biosynthesis, and magnesium in flavanol biosynthesis. Flavonol and anthocyanin biosynthesis was not significantly affected by mineral components. As a result, a predictive biosynthetic model for all the Bryophyllum genotypes was proposed. The combination of untargeted metabolomics with machine learning provided a robust approach to achieve the phytochemical characterization of the previously unexplored species belonging to the Bryophyllum subgenus, facilitating their biotechnological exploitation as a promising source of bioactive compounds.


Introduction
Bryophyllum constitutes a subgenus within the Kalanchoe genus (Crassulaceae family) that contains several plant species commonly known as "Bryophyllum" in Ethnomedicine [1,2]. Accordingly, different Bryophyllum-derived formulations have been traditionally used worldwide for the treatment of diabetes, cardiovascular, and neoplastic diseases [3,4]. Bryophyllum spp. medicinal properties are a consequence of the production of phenolic compounds as recently established [4][5][6]. Thus, different extracts from Bryophyllum have been reported to exhibit valuable health-promoting properties because of the high contents of phenolic compounds, as determined through different in vitro assays. For instance, antioxidant activity was assessed in terms of free radical scavenging activity, inhibition of lipid peroxidation, and prevention of oxidative hemolysis [6,7], together with anti-inflammatory activity, antimicrobial activity towards a wide range of both bacterial and fungal strains, and anticancer activity. Indeed, the anticancer activity of Bryophyllum was determined towards a wide panel of cancer cell lines, such as MCF-7 breast adenocarcinoma, NCI-H460 non-small cell lung carcinoma, HeLa cervical carcinoma, and HepG2 hepatocellular carcinoma cell lines [6]. Consequently, due to the aforementioned health properties associated with Bryophyllum extracts, novel strategies should be proposed to ensure the large-scale production of phenolic compounds from these underexplored medicinal plants.
Nevertheless, little information is available on the biosynthesis of phenolic compounds in Bryophyllum plants, making difficult their exploitation as a valuable source of bioactive compounds. Concerning the biosynthetic pathway of phenolic compounds, in brief, the major precursor is cinnamic acid, synthesized after the action of phenylalanine ammonia-lyase on the amino acid phenylalanine [8]. Afterwards, cinnamic acid may either be incorporated into the biosynthesis of phenolic acids (C6-C1 and C6-C3 compounds) or undergo an enzymatic transformation to produce coumaroyl-CoA that is assumed to be the common basic structure for the biosynthesis of other subfamilies, namely: lignans ((C6-C3)2), stilbenes (C6-C2-C6), and flavonoids (C6-C3-C6) [9]. Finally, these subfamilies are later subjected to condensation to give rise to polymeric phenolic compounds, as is the case for lignin and tannins [9]. In particular, phenolic acids (protocatechuic acid, caffeic acid, and ferulic acid), flavonoids (myricetin, quercetin, and kaempferol glycosides), and anthocyanins (malvidin glycosides) have been identified as the main phenolic compounds in Bryophyllum plants [5,6,10]. In this sense, to gain insight into the biosynthesis of phenolic compounds, the application of untargeted metabolomics (UM) becomes an essential high-throughput approach: it confers a fast, reliable, and detailed perspective of the metabolite pool found in plants, thus contributing to the identification of their wide array of compounds present in cells and tissues [11,12] but also facilitating the rapid industrial exploitation of those promising bioactive compounds.
Additionally, the establishment of plant tissue culture (PTC) has emerged as a solid methodology to characterize phenolic compound biosynthesis. PTC constitutes a reliable and controlled biotechnological system able to achieve homogeneous and standard bioactive compound production [13][14][15]. For this purpose, the design of optimized culture media formulations is crucial. Its development must be carefully carried out to achieve a correct balance between bioactive compounds accumulation and the preservation of tissue culture integrity [15,16]. Additional factors, such as the genotype or the type of explant, also play important roles in the biosynthesis of phenolic compounds [17,18]. Thus, the great number of factors affecting this process limits the interpretation of results and the achievement of general conclusions [19][20][21].
In the last decade, the application of machine learning (ML) technology has been replacing the traditional statistical methods to easily reveal exhaustive information about multivariate processes in which occluded patterns and complex interactions occur [22][23][24]. Among the different ML algorithms available, the combination of artificial neural networks (ANNs) with fuzzy logic (neurofuzzy logic, NFL) has already been successfully applied in the field of PTC for the characterization and optimization of diverse multifactorial processes, including seed germination [25], micropropagation [26], and the identification of physiological disorders [27]. The application of ANNs provides the establishment of predictive mathematical models obtained after training the empirical data, including the independent variables or factors as inputs and the dependent variables or parameters as outputs, to predict the key factors involved in each parameter, as well as their potential interactions [28]. To enhance the interpretation of the resulting ANN model, the application of fuzzy logic facilitates this task by the formulation of 'IF-THEN' rules, which confers an understandable linguistic definition of the model results [29]. In this way, NFL contributes to the characterization and understanding of complex processes and, simultaneously, it may be regarded as a useful decision-making tool for optimization, as it has been previously used for maximizing the production of phenolic compounds [18].
In this work, a combinatorial approach including UM and ML is proposed in order to decipher the critical factors affecting the biosynthesis of phenolic compounds in three Bryophyllum species cultured in vitro. We hypothesize that the combination of both cuttingedge methodologies will assist in the phytochemical valorization of these unexplored medicinal plants and will confer a novel approach to contribute to their biotechnological exploitation in different sectors, including the food, cosmeceutical, and pharmaceutical industries. Furthermore, due to the vast information provided by both UM and ML, and thanks to their plasticity, their combination will be priceless to increase the knowledge of novel sources of bioactive compounds with beneficial properties on human health, from unexplored plant sources, thus conferring a multidisciplinary workflow regarding the large-scale production of those phytochemicals.
The resulting profile showed a total of 485 putatively annotated compounds. The full list of annotated compounds is provided accompanied by their retention time and composite mass spectrum (Table S1). Flavonoids were the most abundant subfamily of phenolic compounds, mainly characterized by anthocyanins, flavonols, and flavones, followed by phenolic acids, low-molecular-weight phenolics, and other subfamilies. Among the flavonoids, malvidin, and pelargonidin presenting 3-O-or 3,5-O-glycoside bonds were the most abundant anthocyanins, followed by myricetin 3-O-, kaempferol 3-O-, and quercetin 3-O-glycosides as the most representative members of the flavonol subfamily, and apigenin 7-O-, apigenin 6,8-C-, luteolin 6-O-, and luteolin 6-C belonging to the flavone subfamily (Table S1). Concerning the phenolic acid subfamily, hydroxybenzoic acids and hydroxycinnamic acids were the most prevalent compounds, with protocatechuic acid 4-O-glucoside, caffeoylquinic acid mono-and di-glycoside, and cinnamic acid as the most abundant ones. In addition, a significant number of low-molecular-weight phenolics, i.e., alkylphenols, hydroxybenzaldehydes, hydroxycoumarins, hydroxyphenylpropenes, tyrosols, and other simple phenylpropanoids, were also detected (Table S1).
Afterward, a semi-quantification of phenolic compounds was performed, using reference compounds for each subfamily. The results are shown in Figure 1, for both aerial parts ( Figure 1A-H) and roots ( Figure 1I-P). As it can be seen by the statistical analysis performed by factorial analysis of variance (ANOVA), all the factors tested: genotypes, plant organs, and culture media composition, influenced the accumulation of phenolic compounds, as well as their potential interactions (p < 0.001; Table S2). Due to a large number of factors and data collected, the information provided by such analysis for the identification of simple patterns and/or identification of interactions between variables was limited. Consequently, a multivariate statistical approach was carried out to determine the influence of the genotypes and the organs on the biosynthesis of phenolic compounds. An Orthogonal Projection in Latent Structures-Discriminant Analysis (OPLS-DA) was used, revealing the discriminant compounds between species (BD, BH, and BT) and their organs (aerial parts and roots) ( Figure 2). Additionally, the Variable Importance in Projection (VIP) selection method was used to identify the VIP markers implicated in the discrimination between genotypes (Table S3) and plant organs (Table S4).  The three different genotypes, BD, BH, and BT, presented a differential composition in terms of phenolic compound production (Figure 2A), as assessed by the high quality of the generated model in terms of linearity and predictability (R 2 Y = 0.907 and Q 2 Y = 0.856, respectively). Regarding the compounds with the highest contribution to the discrimination between genotypes, anthocyanins, flavonols, and phenolic acids were the most prevalent subfamilies ( Figure 3A), accounting for > 60% of total discriminant compounds. Anthocyanins were mainly represented by cyanidin, malvidin, and pelargonidin glycosides; flavonols were mainly represented by quercetin, kaempferol, and myricetin glycosides; and phenolic acids showed a great heterogeneity, but caffeic, ferulic, and gallic acid derivatives had a higher prevalence (Table S3). Thus, such compounds are predicted to be present in a genotype-dependent manner. Regarding the discrimination of the phenolic profiles of plant organs ( Figure 2B), there was a clear differential pattern between aerial parts and roots, as reflected by the OPLS-DA model with linearity and predictability parameters of high quality (R 2 Y = 0.960 and Q 2 Y = 0.926, respectively). Similar to the genotype-based discrimination, anthocyanins, flavonols, and phenolic acids were predicted as the most relevant subfamilies contributing to such differences, together with low-molecular-weight phenolics (LMW), which included more than 70% of total annotated compounds ( Figure 3B). In this case, anthocyanins were mainly represented by malvidin and cyanidin glycosides, flavonols were represented by quercetin, kaempferol, and myricetin glycosides, phenolic acids were represented by cinnamic acids derivatives, and LMW compounds were mainly represented by catechols and coumarins (Table S4). Accordingly, these compounds are suggested to present an organ-restricted distribution in Bryophyllum.

Machine Learning Prediction of the Biosynthesis of Phenolic Compounds
Once the influence of genotypes and organs on the biosynthesis of phenolic compounds was assessed, the composition of culture media formulations was the remaining factor whose influence had to be determined, as well as the interactions between them. For that purpose, ML modeling was carried out to identify the critical factors affecting the biosynthesis of phenolic compounds in Bryophyllum and deciphering the potential multivariate interactions that may occur. The NFL results are shown in Table 1. The model efficiently predicted seven out of the eight outputs evaluated, showing a high predictability with training set R 2 values > 70% (Table 1). MSE values also assessed the model quality, together with the ANOVA performed, which showed no statistical differences between the experimental and predicted values (F ratio > f critical; Table 1). Only one output could not be predicted by the model: the flavone production (training set R 2 < 70%), probably due to the heterogeneous composition of this subfamily employed for its semi-quantification, which included flavones, flavanones, and related compounds. To interpret how each output was affected by the predicted inputs, the model was accompanied by the generation of 'IF-THEN' rules together with their membership degree, which are shown in Table 2.
The ranked values provided for the inputs are displayed in Figure S1.   LMW biosynthesis was mainly predicted as a function of the interaction organ × sulfate (Table 1). However, additional submodels indicated that the interaction genotype × copper and phosphate concentration played a secondary role in LMW biosynthesis. In fact, LMW concentration was the output with the highest number of submodels, which can be explained by the great heterogeneity of compounds that make up this subfamily: tyrosols, coumarins, and catechols (Table S1). Due to their different biosynthetic origins, it is reasonable to find many factors causing the production of LMW compounds. According to the model, the interaction of roots with high sulfate concentrations (>1.43 mM) caused a high LMW content with the highest membership degree ( Table 2; rule 8). Generally, a high LMW content was observed in both aerial parts and roots under mid-to high sulfate concentrations (0.94 mM; Table 2; rules 3-4, 7-8). In contrast, the combination of aerial parts and low sulfate concentrations (<0.61 mM) caused a low LMW content with the highest membership degree ( Table 2; rule 1). In the case of copper, high LMW content values were obtained by low concentrations in the case of BD and BT (<0.03 mM; Table 2; rules 9 and 15, respectively) and low and mid concentrations in the case of BH (<0.08 mM; Table 2, rules 12-13). In the same way, high LMW contents were caused by low phosphate concentrations (<0.71 mM), thus suggesting an inhibitory role ( Table 2; rule 18).
Phenolic acid biosynthesis was mainly predicted by the interaction genotype × copper and, secondarily, by the organ ( Table 1). The rules for phenolic acid content indicated that high values were primarily due to BD and high copper concentrations (> 0.08 µM; Table 2; rule 24) and, to a lower extent, aerial parts (Table 2; rule 20). In contrast, a low phenolic acid content was determined for the rest of the conditions, being the combination of BD with moderate copper concentrations (0.03-0.08 µM), the condition showing the low value with the highest membership degree ( Table 2; rule 23).
In the case of lignans, only one model was generated, predicted by the interaction genotype × sulfate × organ (Table 1). In this way, high levels were only observed in the case of BH combined with high sulfate concentrations (>1.11 mM; Table 2; rules 37 and 38), but aerial parts presented the highest membership degree (0.84; Table 2; rule 37). On the contrary, a low lignan content was observed for the rest of the conditions, being the combination of BT, roots, and low sulfate concentrations (<1.11 mM) that showed the highest membership degree (0.90; Table 2; rule 40).
Stilbene biosynthesis showed the most complex prediction since it was mainly predicted by the combination of calcium × organ × genotype and, secondarily, by the interaction genotype × phosphate × organ (Table 1). Thus, a high stilbene content was predominantly caused by the combination of BH, aerial parts, and low calcium concentrations (<1.68 mM; Table 2; rule 44), whereas a low stilbene content was obtained also by the combination of BH and aerial parts but with a high calcium concentration (>1.68 mM; Table 2; rule 50). In the second submodel, high stilbene concentrations were caused by low phosphate concentrations (<0.43 mM) in BD and BT (Table 2; rules 55-56 and 67-68, respectively), but high phosphate concentrations (>0.98 mM) were required for BH to achieve high levels ( Table 2; rules 65 and 66).
For flavonol and anthocyanin contents, only one model was generated in both cases, represented by the interaction genotype × organ, with the independence of culture media formulation (Table 1). In the case of flavonols, high values were only obtained in the case of aerial parts from BD (  Table 2; rules 79 and 80). Similar to flavonols, the low anthocyanin content was predominantly caused by the combination of roots from BT (Table 2; rule 84).
Finally, the flavanol content was mainly caused by the combination of genotype × organ and, secondarily by magnesium × organ (Table 1). BT was the most critical genotype associated with flavanol biosynthesis showing the highest contribution to the high flavanol content in roots (Table 2; rule 94) and a low content in aerial parts (Table 2; rule 93). Concerning the influence of magnesium, only high levels were observed under low magnesium concentrations (<0.85 mM) in roots (Table 2; rule 86).

Proposed Mechanism of Phenolic Compound Biosynthesis of Bryophyllum Plants Cultured In Vitro
The wide variety of interactions predicted between all the factors involved in the biosynthesis of phenolic compounds in Bryophyllum makes the interpretation of the obtained results difficult. As a solution, the generation of the NFL model provided valuable knowledge on this complex process, which is in accordance with the previously performed by ANOVA and OPLS-DA models, since all the outputs predicted by NFL modeling showed a significant influence of genotype, organ, and culture media composition, thus conferring strong evidence: the biosynthesis of phenolic compounds in these plants followed a genotype-and organ-dependent behavior, which was affected by mineral nutrition. Due to the complexity associated with the great number of rules given by the NFL, a graphical representation better represents all of the integrative information obtained. Thus, a proposed biosynthetic pathway, reflecting all the factors involved in the production of phenolic compounds for each species included in this study, is shown in Figure 4.
Similar patterns are shown for the biosynthesis of LMW compounds in all three species, whereas a differential behavior was observed for the rest of the phenolic subfamilies ( Figure 4). In the case of phenolic acids, they mostly accumulated in aerial parts, but copper played a pivotal role depending on the species, causing a positive effect on BD, whereas it was shown as an inhibitor on BH and BT. Concerning lignans and stilbenes, BH followed a differential pattern with respect to BD and BT, since lignans mainly accumulated in BH, whereas the mineral requirements for stilbene biosynthesis were contrary to those found for BD and BT. Flavonols mainly accumulated in the aerial parts of BD, together with anthocyanins, the latter being also present in the aerial parts of BH, whereas both subfamilies were present in low concentrations in BT. Finally, flavanols showed a characteristic pattern, since they mostly accumulated in the roots of BT, with magnesium playing a positive role, whereas in the case of BD, they accumulated in aerial parts and were inhibited by magnesium.
In consequence, the combination of UM with NFL emerged as a promising approach to characterize highly complex processes by providing exhaustive information that is easy to interpret. Our results clearly show that, although these three Bryophyllum species are closely related, a genotype and organ-dependent pattern was observed for the biosynthesis of phenolic compounds in Bryophyllum cultured in vitro, depending on the composition of culture media. Such results are the consequence of ML modeling of the experimental results (Figure 1), which displayed cryptic information that did not show clear patterns and, therefore, conferred scarce information, thus limiting the enormous potential offered by the phenolic profile obtained by the untargeted metabolomics approach.

Discussion
Phenolic compounds play an essential role in the therapeutic properties of Bryophyllum plants since they have been seen to be efficient antioxidant, cytotoxic, anti-inflammatory, and antimicrobial agents [6,30]. In recent years, these metabolites have attracted much attention from a biotechnological point of view due to their pleiotropic beneficial effects on human health [31,32]. The establishment of PTC constitutes a reliable biological platform for the perpetual production of industrially important bioactive compounds, largely exploited in the field of plant biotechnology [33,34]. PTC has been recently assessed as an efficient approach for achieving the phytochemical valorization of multiple Bryophyllum species, including B. daigremontianum (BD), B. × houghtonii (BH), and B. tubiflorum (BT) [6,35]. In fact, the establishment of PTC promoted the production of several phenolic compounds, such as anthocyanins, which have not been described in plants based on conventional breeding [6].
In this work, the above-mentioned species were selected because of their wide application in the traditional medicine for the treatment of several prevalent diseases, ranging from wound healing and cough alleviation to chronic diseases, such as diabetes, neoplastic, cardiovascular, and neurological diseases, among others [1,2]. Nonetheless, most investigations have been exclusively focused on the study of Bryophyllum pinnatum (Lam.) Oken [2], resulting in a gap of knowledge on the phytochemical valorization of BD, BH, and BT. Moreover, to date, the study of phenolic compounds of Bryophyllum has been limited to their identification and description, with phenolic acids and flavonols as the main polyphenols found in this subgenus [5,36,37]. Little is known about the biosynthesis of phenolic compounds in Bryophyllum plants, and untargeted approaches are required for rapid and robust metabolic profiling of unexplored plant species [38]. The application of UM revealed new subfamilies of phenolic compounds in Bryophyllum plants, such as tyrosols, coumarins, catechols, lignans, stilbenes, and flavanols, together with the previously described phenolic acids, flavonols, and anthocyanins, although all of these subfamilies were already identified in elicited plant cell suspension cultures (PCSCs) from BD and BH [39]. According to our results (Figure 1), all three species have been determined as a potent source of phenolic compounds from different subfamilies.
Regarding the OPLS discriminant analyses performed, a genotype-dependent biosynthesis of phenolic compounds was revealed (Figure 2A). Although the three species are genetically close [40], our results indicated a differential phenolic profile for each one, in agreement with previous results, where these species showed different patterns related to important physiological processes, such as organogenesis [41], mineral nutrition [35], and the production of phenolic compounds [18].
The biosynthesis of phenolic compounds also followed an organ-dependent pattern ( Figure 2B): anthocyanins, flavonols, and phenolic acids were the metabolites with the highest contribution to such discrimination. This compartmentalization may be a consequence of the physiological features associated with polyphenols since anthocyanins and flavonols usually accumulate in the aerial parts due to their role as protectants of the oxidative burst upon environmental stresses, UV-light absorbers, and pollinator attractants [42][43][44][45]. Our results are in agreement with previous reports, which determined that flavonols and anthocyanins predominantly accumulate in the aerial parts compared to roots [6,18]. Furthermore, the existence of specialized cell types within leaf tissues devoted to the storage of anthocyanins and other flavonoids, known as idioblasts, has been described in BD and BT [36,46].
The influence of genotype and organ on the biosynthesis of phenolic compounds, together with their interaction with nutrients, was assessed by the NFL predictive model ( Table 1). The accuracy of the NFL-based prediction was assessed by the coefficient of determination of the training dataset (training set R 2 ), together with the ANOVA parameters (F ratio > f critical), as described by Shao and co-workers [47]. Due this high predictability, the ML application enabled the identification of critical factors in the biosynthesis of each phenolic subfamily, thus conferring useful information that is easily understandable through the generation of the model rules (Table 2). Such a computer-based tool was successfully applied to predict the critical factors affecting the total phenolic and flavonoid contents of Bryophyllum cultured in vitro, revealing a significant influence of genotype and organs [16]. Concerning mineral nutrients, among the 18 different ions present in the universal Murashige and Skoog medium formulation [48], the NFL model only selected five as critical in the biosynthesis of phenolic compounds: sulfate, phosphate, calcium, magnesium, and copper (Table 1). This efficiency in the selection of mineral factors was previously demonstrated for other physiological processes, thus contributing to the optimization of PTC protocols [24,49,50].
With respect to the experimental design proposed, a reduction in both macronutrients and micronutrients from the universal Murashige and Skoog medium [48] was established (Section 4.2). Such a nutrient decrease was previously determined to exert a positive impact on the growth and multiplication of Bryophyllum cultured in vitro, motivated by their enhanced adaptation to arid regions where they are naturalized, with poor mineral accessibility that leads to low mineral requirements [35,51,52]. Furthermore, the reduction in mineral concentrations was already reported to promote an elicitor effect on Bryophyllum cultured in vitro [6,18], thus promoting an efficient strategy to assess the viability of this biotechnological system in the production of bioactive compounds. It must be considered that the modification of mineral concentrations may eventually have a significant effect on the buffer capacity, osmolarity, etc., thus promoting a possible effect on the modulation of phenolic compound biosynthesis that cannot be excluded.
Among the different subfamilies of phenolic compounds obtained by UM-mediated annotation, flavonols and anthocyanins were the only subfamilies that did not show a significant dependence on the mineral composition of the media employed, being predicted as a function of the interaction of genotype and organ ( Table 1). The same behavior was previously reported for hydroethanolic Bryophyllum extracts, in which a genotype-dependent content of both flavonol and anthocyanin glycosides was observed [6]. Due to the high plasticity that flavonols and anthocyanins exhibit on plant physiology depending on mineral nutrition [7], our results suggested that the biosynthesis of these phenolic compounds could be stimulated by other mineral compositions different from those performed in this work for Bryophyllum spp. cultured in vitro.
The NFL model predicted that the biosynthesis of LMW compounds, which includes mainly tyrosols, coumarins, and catechols, was mainly influenced by sulfate concentration and, secondarily, by copper and phosphate. Sulfate was required in high concentrations (>0.94 mM) to promote LMW biosynthesis, probably due to its role in alleviating the autotoxicity caused by the prooxidant effects associated with the overaccumulation of tyrosols, as was demonstrated for hydroxytyrosol [53]. In addition, copper sulfate was reported as an elicitor of the biosynthesis of the coumarin scopoletin in PCSCs of Angelica archangelica [54], thus revealing the role of sulfate in plant stress tolerance [55]. The effect of copper and phosphate identifed by the NFL model also agreed with previous observations: a minimal copper concentration is required for the biosynthesis of tyrosols since this metal ion constitutes part of the active center of copper amine oxidase, which catalyzes the generation of hydroxytyrosol from dopamine [56]. In contrast, low phosphate requirements were predicted to enhance LMW biosynthesis, thus suggesting an inhibitory role of this salt, in agreement with the results reported for the roots of Arabidopsis thaliana, where coumarin biosynthesis is controlled by phosphate deficiency [57].
In the case of phenolic acids, the predictive model identified copper, in combination with genotype and organs, as the only nutrient involved in their biosynthesis. In this sense, phenolic acids predominantly accumulated in the aerial parts of the three Bryophyllum species, but copper was suggested to play a positive role in BD, causing an inhibitory effect on BH and BT (Figure 4). Thus, the results found for BD agree with those found for other medicinal plants, such as Catharanthus roseus [58], and Raphanus sativus [59], in which copper promoted the accumulation of phenolic acids. Such influence is driven by the copper-mediated stimulation of nitric oxide production, which acts as an inductor of phenylalanine ammonia lyase, driving the transformation of phenylalanine into cinnamic acid [58,60].
Lignan biosynthesis was predominantly found in BH, and it was enhanced by high sulfate concentrations, according to the predictive NFL model (>1.11 mM; Figure 4). The impact of sulfate, as a sulfur-containing ion, on lignan biosynthesis was already studied in Linum album hairy roots [61]. The authors proved that sulfur-containing signaling molecules, such as hydrogen sulfide, regulate the shift between lignan and flavonoid biosynthesis. This shifting behavior of copper might also play a role as a master regulator of lignan biosynthesis in Bryophyllum plants, due to the differential effects found on BH against its parental species, BD and BT.
Stilbene biosynthesis was predicted to be mainly affected by calcium concentration together with genotype and organ (Table 1). In this case, the genotypes showed the same pattern for lignans, since the mineral requirements for BD and BT were the same, being opposite to those predicted for BH ( Figure 4). Thus, in the case of BD and BT, high calcium concentrations (>1.68 mM) were shown to enhance stilbene biosynthesis. Such an observation could be explained on the basis of the role of stilbenes as calcium complexing agents, thus providing evidence of the role of this subfamily of phenolic compounds as metal ion scavengers [62]. Moreover, this intraspecific differential role of calcium in stilbene biosynthesis was reported in Vitis spp., since calcium promoted stilbene biosynthesis in PCSCs of Vitis amurensis by stimulating stilbene synthase (STS) expression, via induction of calcium-dependent kinases [63], whereas calcium did not affect STS in PCSCs of Vitis vinifera [64].
Finally, flavanols constituted the last subfamily of phenolic compounds potentially affected by mineral nutrients in Bryophyllum spp., exhibiting a genotype-dependent accumulation of these compounds, as predicted by the NFL model (Table 1): high values for the flavanol content were observed in the roots of BT, mainly, and in the aerial parts of BD (Table 2). In addition, magnesium was found to exert a positive role on the flavanol accumulation in roots, showing an inhibitory effect on aerial parts (Figure 4). Since flavanols are the major phytoconstituents found in tea, the influence of magnesium on their production has been analyzed. The beneficial effects of magnesium on flavanol biosynthesis have been thoroughly investigated, being considered a relevant factor [65]. Thus, the exogenous soil addition of magnesium in open field experiments promoted the production of flavanols in black tea, via amino acid transferase induction [66]. In addition, molecular studies indicated that the metal complexing properties of catechins efficiently promoted the formation of stable complexes with magnesium [67], providing evidence of the role of magnesium as a regulator of catechin biosynthesis. On the other hand, the improved flavanol biosynthesis predicted for BT roots may be supported by the observations found for Centaurea maculosa, in which (-)-catechin is present in roots exudates developing an allelochemical effect responsible for the invasiveness of this species [44,68]. Since BT, together with other Bryophyllum species, is considered an invasive species [69], this enhanced production of flavanols by roots may contribute to the enhancement of such invasive mechanism. Greenhouse-grown plants were selected as the source of epiphyllous plantlets. Thus, plantlets from the three species were collected, subjected to surface sterilization, and transferred to in vitro conditions, following the previously described protocol [70]. After disinfection, plantlets were cultured in pairs in glass culture vessels containing 25 mL of previously autoclaved Murashige and Skoog medium [48], supplemented with 3% (w/v) sucrose and solidified with 0.8% (w/v) agar at pH = 5.8. Then, plant cultures were randomly placed in growth chambers under a photoperiod of 16 h light and 8 h dark at 25 ± 1 • C. Periodical subcultures were performed every 12 weeks, by transferring new epiphyllous plantlets to fresh culture media.

Experimental Design
An experimental design for three variables at three, two, and seven levels was established: genotype (BD, BH, BT species), organ (aerial parts or roots), and culture media (7 formulations), resulting in a total of 42 treatments.
Seven culture media formulations, derived from Murashige and Skoog medium [48] were used for the nutrition experiment (Table 3). Murashige and Skoog-derived formulations contained a reduced content of both macronutrients (M) and micronutrients (µ) [35]. Treatments consisted of a half-strength medium (1/2MSM and 1/2MSµ), a quarter-strength medium (1/4MSM and 1/4MSµ), and an eighth-strength medium (1/8MSM and 1/8MSµ). Full-strength Murashige and Skoog medium was used as a control. To prevent additional interactions, EDTA-chelated iron, vitamin, and organic molecule concentrations were maintained for all the treatments as in the original formulation. All media were supplemented with 3% (w/v) sucrose and solidified with 0.8% (w/v) agar at pH = 5.8. Epiphyllous plantlets with their own root system were selected from 12-week-old Murashige and Skoog-grown plants. The growth conditions were the same as previously described. Cultures were maintained in the same culture media formulation for successive four subcultures every 12 weeks. Plantlets were cultured in pairs, using 10 glass culture vessels per media formulation, accounting for a total of 20 replicates per treatment and species. After each subculture, plants were divided into aerial parts and roots and were separately stored at −20 • C until use.

Sample Preparation and Extraction
Collected plant materials were frozen-dried and powdered to obtain fine particles that were stored at −20 • C until extraction.
Sample extraction was performed using the solvent mixture MeOH:HCOOH:H 2 O (80:0.1:19.99) at a final concentration of 50 mg mL −1 . The mixture was homogenized by a high-speed rotor (Polytron PT 1600-E) for 2 min and centrifuged at 8000 g for 10 min at 4 • C (Eppendorf 5810R, Hamburg, Germany). Supernatants were collected and filtered throughout syringe filters (pore size: 0.22 µm). Finally, extracts were transferred to vials and subsequently analyzed or stored at −20 • C until use.

Phenolic Profiling Using Untargeted Metabolomics
Phenolic compounds were profiled through an UM approach based on UHPLC-QTOF/MS, as previously reported [71,72]. Briefly, reverse phase chromatographic separation was achieved using a water-acetonitrile gradient, and then compounds were detected in SCAN mode (100-1200 m/z) at a nominal resolution of 40,000 FWHM. Quality controls were prepared by pooling each sample and were analyzed under the same chromatographic conditions, with acquisition using data-dependent tandem mass spectrometry [73].
The annotation of phenolic compounds was carried out using the Profinder B.07 software tool (Agilent Technologies), following mass (5-ppm tolerance) and retention time (0.05 min) alignment, as previously reported [71,74]. For this aim, the database exported from Phenol-Explorer 3.6 [75] was used, and annotation used the whole isotopic pattern of aligned features (namely, the monoisotopic accurate mass, isotopic ratio, and isotopic accurate spacing) [71,76]. Compounds were filtered by abundance (signal-to-noise >8) and by frequency (only the features annotated in 75% of replications within a treatment were used). A further step of annotation was then carried out in MS-DIAL 4.48 from tandem MS information, using publicly available MS/MS experimental spectra (Mass Bank of North America) and MS-Finder 3.50 for in-silico fragmentation (using Lipid Maps, FoodDB, and PlantCyc). The list of MS/MS compounds annotated is provided in the Supplementary (Table S1). Overall, compound annotation was done under Level-2 identification (putatively annotated compounds, COSMOS Metabolomics Standard Initiative) [77]. Total ion chromatograms were included in Figures S2 and S3.
Finally, identified phenolic compounds were classified into different subclasses and quantified using appropriate calibration curves for a reference standard per class. Results were expressed as equivalents of the reference compounds in mg/g of sample: cyanidin was selected for anthocyanins; catechin for flavanols; quercetin for flavonols; luteolin for flavones and other related flavonoids (flavanones and chalcones); sesamin for lignans; tyrosol for low-molecular-weight phenolics (LMW, including tyrosols, phenolic terpenes, quinones, coumarins, alkylphenols, and other phenylpropanoids); ferulic acid for phenolic acids; and resveratrol for stilbenes. Results were expressed as tyrosol equivalents (TE) for the LMW content, ferulic acid equivalents (FE) for the phenolic acid content, sesamin equivalents (SE) for the lignan content, resveratrol equivalents (RE) for the stilbene content, luteolin equivalents (LE) for the flavone content, quercetin equivalents (QE) for the flavonol content, cyanidin equivalents (CyE) for the anthocyanin content, and catechin equivalents (CaE) for the flavanol equivalents.

Statistical Analysis
Metabolomic profiling was performed with raw data using the software Agilent Mass Profiler Professional B.12.06. The data normalization was performed as previously indicated [78]: compounds were filtered according to their abundance and frequency, normalized at the 75th percentile, and baselined to the median of all samples. Bonferroni multiple testing correction was adopted in multivariate analyses. The obtained dataset was then exported to the software SIMCA 16 (Umetrics, Malmo, Sweden) for orthogonal projection to latent structures discriminant analysis (OPLS-DA). The cross-validation (CV) of the OPLS-DA model generated was developed using CV-ANOVA (α = 0.05), and its fitness and prediction ability were evaluated by the goodness-of-fit R 2 Y and goodnessof-prediction Q 2 Y parameters, respectively. Finally, to determine the most discriminant compounds, a variable importance in projection (VIP) analysis was performed setting a threshold VIP score >1.
Moreover, in order to assess the influence of genotypes, organs, and culture media formulations and their interactions on the production of phenolic compounds, a factorial ANOVA was performed, using the software STATISTICA v. 12 (StatSoft). The significance level was adjusted at α = 0.05.

Modeling Tools
After data collection, all experimental values were included in a database (Table S5) in which salts from culture media formulations were split into their containing ions to avoid ion confounding [79]. Consequently, 18 factors were selected as inputs for modeling: genotype, organ, and 16 ion concentrations from all media formulations tested. The parameters derived from phenolic quantification, including eight subclasses, were selected as outputs.
Data modeling was carried out using FormRules ® commercial software (v. 4.03; Intelligensys LTD, Cheshire, UK), as previously described [33]. The training parameters used for model establishment were described as follows: the Adaptive Spline Modeling of Data (ASMOD) was selected for the parameter minimization, as it improves model accuracy by reducing its complexity [80], with a ridge regression factor of 1 × 10 −6 . FormRules ® software includes several fitness criteria, such as Cross Validation (CV), Leave One Out Cross Validation (LOOCV), Bayesian Information Criterion (BIC), Minimal Description Length (MDL), and Structural Risk Minimization (SRM). All were tested in this study and the best fitted result, which provided the simplest and most intelligible rules with minimum generalization error, was SRM [35]. The rest of the training parameters were: C1 = 0.884, C2 = 4.8; number of set densities: 2; set densities: 2, 3; adapt nodes: TRUE; maximum inputs per submodel: 4; maximum nodes per input: 15. Thus, the model was divided into submodels in order to achieve an easier interpretation of results by the generation of "IF-THEN" rules [50,81]. Independent models were developed for each output, and a model assessment criterion was selected to avoid data over-fitting [22,82,83]. The application of NFL confers an advantage as a knowledge-obtaining tool since the predicted values for the inputs were expressed by words, ranging from low to high, combined with a corresponding membership degree, which takes values between 0 and 1 [29]. Furthermore, the predictive models for each output were quality-assessed in terms of the coefficient of determination of the training set (training set R 2 ), expressed as a percentage given by Equation (1) [47], and mean square error, MSE, given by Equation (2).
where y i is the experimental value from the dataset, y i is the value predicted by the model, and y i is the mean value of the dependent variable. Acceptable predicted values given by training set R 2 range between 70-99.9% since higher values indicate model overfitting [22,84]. MSE represents the random error component associated with the built model, providing insight into model prediction due to a smaller incidence of random error [21,35]. Finally, to assess model accuracy, ANOVA was performed to check statistical differences between experimental and predicted data.

Conclusions
In this work, a combinatorial approach including three cutting-edge technologies, plant tissue culture, untargeted metabolomics, and machine learning, was established to gain insight into the biosynthesis of phenolic compounds of medicinal plants responsible for their associated therapeutic properties. The results indicate that Bryophyllum plants can be considered a promising source of phenolic compounds including the previously identified, flavonols, phenolic acids, and anthocyanins, together with new subfamilies reported for the first time in these species: tyrosols, catechols, lignans, stilbenes, flavones, flavanones, and flavanols. The knowledge derived from this investigation contributes to the phytochemical valorization of these unexplored medicinal plants. At the same, it may facilitate their exploitation as a natural source of bioactive compounds, promoting the large-scale application of Bryophyllum by-products to different biotechnological sectors with limitless purposes in food, cosmeceutical, and pharmacological industries. In addition, thanks to the robustness and plasticity of this multidisciplinary approach, the workflow proposed here can be applied to a plethora of poorly characterized plant species with medicinal potential, thus conferring a rapid and reliable methodology to provide insight into their biosynthetic capacity. In fact, the robustness and high performance associated with the combination of UM and ML presents limitless applications, thus opening new perspectives in the field of natural products research, facing the introduction of uncharacterized plant sources as efficient biofactories of health-promoting compounds of natural origin at an industrial level. The use of NFL as a predictive ML tool confers useful information about the key factors involved in complex processes, as was demonstrated here for the biosynthesis of phenolic compounds. This predicted information should be further validated in order to assess the knowledge conferred by this deep learning approach. Additionally, the application of other ML tools, such as genetic algorithms, will contribute to the computer-based optimization of such a multifactorial process. In this sense, this multidisciplinary strategy has proven extremely useful to improve the current paradigm of plant biotechnology, facilitating the knowledge on the production of phenolic compounds by medicinal plants, with the ability of being easily applied to economically important sectors, as is the case for agricultural, food, and related industries.  Table S1: sheet 1, Dataset of annotated compounds; sheet 2, list of compounds annotated by MS/MS; Table S2: Statistical parameters associated with the factorial ANOVA performed for each subfamily involved in the semiquantification of phenolic compounds. df, degrees of freedom; SS, the sum of squares; MS, mean squares.; Table S3: Metabolites with the highest contribution to discrimination between Bryophyllum genotypes, according to the OPLS-DA predictive model, followed by VIP selection method. Metabolites are grouped into their subfamilies and are accompanied by their VIP score and standard error; Table S4: Metabolites with the highest contribution to discrimination between organs: aerial parts and roots, according by the OPLS-DA predictive model, followed by VIP selection method. Metabolites are grouped into their subfamilies and are accompanied by their VIP score and standard error; Table S5: Dataset subjected to NFL modeling.