1. Introduction
Syzygium samarangense, commonly known as wax apple, jambu air, water apple, or bell fruit, is a nonclimacteric tropical fruit plant cultivated in Malaysia and neighboring countries such as Thailand, the Philippines, Vietnam, and Taiwan [1]. The three major S. samarangense cultivars are Giant Green, Masam Manis Pink, and Jambu Madu Red [2]. Traditionally, it is used as a medicinal plant: various parts are used to treat health problems such as edema, cracked tongue, asthma, diarrhea, bronchitis, fever, ulcers, and sore throat, and to reduce blood pressure [3,4]. Additionally, the Giant Green cultivar contains an abundance of valuable phytochemicals such as phenolic acids, flavonoids, anthocyanins, and carotenes [5], which show antioxidant, antibacterial, antidiabetic, anticancer, and anti-inflammatory activities [6,7,8].
Metabolomics is the comprehensive analysis of a metabolite profile, applied in either a targeted or a global manner in drug discovery, phytomedicine, toxicology, and disease development [9]. Various spectroscopy and chromatography techniques are applied to detect and characterize the metabolites in experimental samples [10]. Two analytical techniques for this task are Fourier transform infrared spectroscopy (FTIR) and gas chromatography–mass spectrometry (GC-MS). FTIR is an important technique used to identify the types of functional groups present in a compound. It is also a useful spectroscopic tool for profiling and fingerprinting molecular structures because it is non-destructive, simple to use, quick, and accurate [11]. The absorption range commonly used in plant studies is the mid-infrared (mid-IR) range, in which infrared radiation with wavenumbers from 4000 cm⁻¹ to 400 cm⁻¹ is passed through a sample [12]. Not all of the infrared radiation is absorbed by the sample: some of it passes through and is transmitted to a detector. The resulting spectrum represents the molecular absorption and transmission, creating a molecular fingerprint of the sample. GC-MS is among the most commonly used techniques for the separation and identification of compounds, especially in drug discovery, pharmacology, and the food industry [13]. The advantages reported for GC-MS include low viscosity, higher sensitivity, rapid mass transfer velocity, and high resistance, so it has been widely used in chemical fingerprinting [14].
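The relationship between the transmitted and absorbed radiation described above can be illustrated with a minimal sketch. It assumes the standard definition of absorbance, A = −log10(T), where T is the fractional transmittance; the function names and the spectrum values are hypothetical and are not taken from this study.

```python
import math

def transmittance_to_absorbance(percent_t):
    """Convert percent transmittance (0 < %T <= 100) to absorbance."""
    if not 0 < percent_t <= 100:
        raise ValueError("percent transmittance must be in (0, 100]")
    return -math.log10(percent_t / 100.0)

def absorbance_to_transmittance(absorbance):
    """Convert absorbance back to percent transmittance."""
    return 100.0 * 10 ** (-absorbance)

# Hypothetical (wavenumber in cm^-1, %T) pairs; not measured values.
spectrum = [(3300, 40.0), (2928, 55.0), (1710, 10.0), (924, 100.0)]
absorbance_spectrum = [(wn, transmittance_to_absorbance(t)) for wn, t in spectrum]

for wavenumber, absorbance in absorbance_spectrum:
    print(f"{wavenumber} cm^-1: A = {absorbance:.3f}")
```

A band that transmits only 10% of the incident radiation therefore has an absorbance of 1, while a fully transmitted band has an absorbance of 0.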
However, the abundance of metabolites present in plants poses a challenge: analyzing them precisely without a comprehensive method is difficult. Therefore, for several decades, many researchers have coupled chemometric analyses with spectroscopy and chromatography techniques to analyze the metabolites present in medicinal plants. Multivariate data analysis (MVDA) is required to handle the huge datasets of whole spectra recorded from plant samples; it includes unsupervised methods, such as principal component analysis (PCA) and hierarchical cluster analysis (HCA), and supervised methods, such as partial least squares discriminant analysis (PLS-DA) and partial least squares (PLS). For example, Wijayanti et al. [15] successfully classified and discriminated Curcuma xanthorrhiza from different regions using PCA and PLS-DA. Basyirah et al. [16] used PCA and HCA to classify and discriminate Heterotrigona itama propolis obtained using different extraction methods (maceration, sonication, and Soxhlet). Additionally, PLS was used to correlate the antioxidant activity and chemical contents of five varieties of Pegaga (Centella) extract [17]. These studies indicate that chemometric analysis coupled with spectroscopy or chromatography techniques is a reliable tool for metabolomics.
Judging from the literature, the metabolites in plants can be identified using spectroscopy and chromatography techniques. Additionally, the discrimination between experimental samples and the relationship between metabolites and biological activity can be determined using multivariate data analysis (MVDA). Hence, this work aimed to discriminate the leaves of Syzygium samarangense cv. Giant Green at different stages of maturity and to correlate these maturity stages with their antioxidant and alpha-glucosidase inhibitory activities using FTIR- and GC-MS-based metabolomics coupled with chemometrics. The findings help identify the most promising maturity stages of Giant Green leaves for pharmaceutical use.
4. Discussion
In the present study, the FTIR fingerprints of old, mature, and young leaves were found to be similar, differing only slightly in peak intensity. The similarity might be due to the presence of the same types or quantities of metabolites. The similarity of the absorbance peaks is also consistent with the lack of significant differences among leaf stages in the antioxidant and antibacterial activities reported in our previous study. The peak at 1710 cm⁻¹, which is assigned to C=O in the phenolic group, is believed to have contributed to these bioactivities. This agrees with the finding of Easmin et al. [24] that the FTIR spectra of ethanol and water extracts of Phaleria macrocarpa fruit look similar because of the similarity in their chemical composition. The variation in the peaks detected between the leaf samples might also be related to different levels of enzyme activity at each maturity stage [21]. In addition, Kharbach et al. [29] reported that the resulting compound fingerprint is mostly related to plant maturity, seasonal variation, and geographic location. However, a comparison of the spectra alone cannot provide a final conclusion about the specific fingerprint that contributed to the variation between ‘Giant Green’ leaves at the three maturity stages. Therefore, the FTIR data were further subjected to chemometric analysis.
Principal component analysis (PCA) is an unsupervised multivariate data analysis (MVDA) method used to reduce the dimensionality of a large dataset while preserving the important information. The largest share of the variance in the dataset is explained by PC1 and the second largest by PC2. The score plot is used to differentiate among the samples, and the loading plot is used to determine the variables that contribute to the sample clusters. In the present study, PCA was performed to find the relationship between ‘Giant Green’ leaves at three maturity stages and to identify the functional groups that contributed to the sample separation. The PCA results showed that the young leaves were clearly separated from the mature and old leaves. However, one biological replicate of the mature leaves (ML1) was located near the old leaf samples. This might be because some metabolites in ML1 are also present in the old leaf samples, or because their quantities are similar. Some previous studies also found no defined clusters between Ficus deltoidea syconia varieties [30], cabbage cultivars [31], and Ipomoea aquatica [32] because of the similarity of their chemical contents. In contrast, Eugenia uniflora leaves showed clearly separated clusters between the different fruit color biotypes due to their distinctive volatile compounds [33].
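The way PCA produces scores and loadings can be sketched in a few lines. This is a minimal illustration only: it uses the Python standard library and power iteration to extract PC1, whereas chemometric software typically uses NIPALS or singular value decomposition, and the absorbance values, sample labels, and function names below are hypothetical rather than data from this study.

```python
import math

def mean_center(rows):
    """Subtract each column mean so that every variable has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of mean-centered data."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def first_component(cov, iterations=200):
    """Power iteration for the dominant eigenvector (PC1 loadings) and eigenvalue."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
    return v, eigenvalue

# Hypothetical absorbances at three wavenumbers for young (YL), mature (ML),
# and old (OL) leaf replicates; illustrative values only.
labels = ["YL1", "YL2", "ML1", "ML2", "OL1", "OL2"]
spectra = [
    [0.90, 0.40, 0.30],
    [0.95, 0.42, 0.28],
    [0.50, 0.41, 0.31],
    [0.55, 0.39, 0.33],
    [0.45, 0.40, 0.30],
    [0.48, 0.43, 0.29],
]

centered = mean_center(spectra)
cov = covariance(centered)
loadings, eigenvalue = first_component(cov)

# Scores place each sample on PC1; loadings show which variable drives PC1.
scores = [sum(x * l for x, l in zip(row, loadings)) for row in centered]
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))

for label, score in zip(labels, scores):
    print(f"{label}: PC1 score = {score:+.3f}")
print(f"PC1 explains {100 * explained:.1f}% of the variance")
```

In this toy dataset the first variable carries almost all of the variance, so it dominates the PC1 loadings, and the young leaf replicates fall on the opposite side of PC1 from the mature and old replicates.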
Hierarchical cluster analysis (HCA) is an unsupervised method that is complementary to PCA. HCA was applied to determine the similarities and dissimilarities between the individual experimental samples. Samples that are similar in the investigated variables are grouped in the same cluster, whereas samples that show high dissimilarity are placed in other clusters. In the dendrogram, the distance between clusters reflects the degree of dissimilarity between the individual samples, with the most distant clusters being the most dissimilar [34]. The results of this study demonstrate that the leaf extracts at three maturity stages were arranged in three different clusters, which might be influenced by metabolite biosynthesis. Lee et al. [35] reported that metabolite presence varies at the young, mature, and old stages of S. samarangense cv. pink leaves. The HCA model also revealed that most leaf extracts from the same maturity stage formed a homogeneous cluster. This was expected, because samples at the same maturity stage should contain metabolites of similar quality and quantity. However, one of the mature leaf samples (ML1) deviated from the other mature leaf samples and was placed in cluster 1 with the old leaf samples, indicating that metabolites can also differ among leaves at the same maturity stage. This might also be affected by environmental factors, cultivation practices, plant age, and soil factors. Environmental factors such as temperature, light intensity, and climatic change influence the metabolites in plants [36].
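The clustering behaviour described above, including a mature replicate that groups with the old leaves, can be reproduced with a small agglomerative clustering sketch. The linkage method and distance metric used in this study are not stated in this section, so average linkage with Euclidean distance is assumed here, and the two-variable sample values are hypothetical.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two samples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(c1, c2, data):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(data[i], data[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def agglomerate(data, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: average_linkage(clusters[ab[0]], clusters[ab[1]], data),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return sorted(sorted(c) for c in clusters)

# Hypothetical two-variable profiles; ML1 is deliberately placed near the old leaves.
labels = ["YL1", "YL2", "ML1", "ML2", "ML3", "OL1", "OL2"]
samples = [
    (5.0, 1.0), (5.2, 1.1),              # young leaves
    (1.2, 3.1), (3.0, 3.0), (3.1, 2.9),  # mature leaves
    (1.0, 3.0), (1.1, 3.2),              # old leaves
]

clusters = agglomerate(samples, n_clusters=3)
for cluster in clusters:
    print([labels[i] for i in cluster])
```

Cutting the hierarchy at three clusters groups ML1 with OL1 and OL2 because its profile is closer to theirs than to ML2 and ML3, which mirrors the kind of deviation discussed for the mature replicate.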
Partial least squares discriminant analysis (PLS-DA) is a supervised multivariate data analysis (MVDA) tool that is gaining interest, especially in the analysis of metabolomics data. PLS-DA can improve the classification of experimental data beyond what can be achieved using PCA. Unlike PCA, PLS-DA uses the class labels to maximize the separation between the sample classes. In this study, PLS-DA was performed to discriminate the ‘Giant Green’ leaves at three maturity stages based on their FTIR dataset. In agreement with the PCA and HCA results, one of the mature leaf samples (ML1) could not be clearly separated from the old leaf samples, which might be due to the similarity of the metabolites in both. However, the other young, mature, and old leaf samples were better separated than in PCA. The predictive ability of this PLS-DA model in classifying the leaf samples by maturity stage was supported by a classification accuracy of 93.33% in the confusion matrix. Because PLS-DA maximizes the variation between classes (maturity stages), the discriminating variables in this model may differ from those in PCA. Most of the discriminative variables detected in the FTIR spectra contributed to the classification of young, mature, and old leaves according to their variable importance in projection (VIP) scores. The higher the VIP score, the stronger the contribution of the variable to the clustering of the samples. The peaks at 2857, 2928, 1709, and 924 cm⁻¹ were the most strongly related to the differences between the maturity stages of the leaf samples. The previous literature reported that the asymmetrical (2970 cm⁻¹) and symmetrical (2856 cm⁻¹) stretching of the methylene group (CH₂) and the C–H bond (980 cm⁻¹) in the FTIR spectrum are commonly related to flavonoid structures [37]. The peak around 1718 cm⁻¹ could be attributed to the presence of ester compounds [26]. This finding suggests that flavonoid and ester compounds could be the largest contributors to the separation between maturity stages in the leaf samples. Thus, the variation of metabolites could account for the discrimination between the three maturity stages of ‘Giant Green’ leaves. This agrees with Lee et al. [35], who reported that chemical compounds such as terpenes and terpenoids in wax apple cv. pink leaves change during maturation. In addition, Gouvinhas et al. [38] reported that oil from three stages of olive fruit was successfully discriminated using supervised MVDA; they found that biochemical changes occurred with the ripening stages of the olive fruit. Considering the discrimination explained by the PLS-DA model, the ‘Giant Green’ leaves were classified according to their maturity stages better than with PCA.
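To make the confusion matrix score concrete, the sketch below computes overall accuracy and per-class recall from a 3 × 3 matrix. The counts are hypothetical: the number of replicates per maturity stage is not stated in this section, and five per stage with a single misclassified mature sample is assumed only because 14 correct out of 15 reproduces a value of 93.33%.

```python
def accuracy_from_confusion(matrix):
    """Overall classification accuracy (%) from a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * correct / total

def per_class_recall(matrix):
    """Fraction of each true class that was assigned to the correct class."""
    return [matrix[i][i] / sum(matrix[i]) for i in range(len(matrix))]

# Rows are true classes, columns are predicted classes (young, mature, old).
# Hypothetical counts: one mature replicate is predicted as old.
confusion = [
    [5, 0, 0],
    [0, 4, 1],
    [0, 0, 5],
]

print(f"accuracy = {accuracy_from_confusion(confusion):.2f}%")
print("recall per class =", per_class_recall(confusion))
```

Under these assumed counts, the young and old classes are fully recovered and the single mature replicate assigned to the old class accounts for the whole shortfall from 100%.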
Partial least squares (PLS) is a supervised MVDA method in which Y represents the dependent (response) variables and X represents the independent (predictor) variables. PLS is used to find the correlation between the two sets of variables, here the dataset from spectroscopic or chromatographic analysis and the bioactivity. The goodness of fit and predictive ability of the model are evaluated based on R²X (variance explained in the predictor variables), R²Y (variance explained in the response variables), and Q²Y (predictive ability of the model according to cross-validation). A cross-validated coefficient (Q²) value higher than 0.5 indicates a good PLS model. In the current study, the relationship between FTIR absorbance at each wavenumber and the biological activities, namely antioxidant (DPPH, NO, and ABTS) and alpha-glucosidase inhibitory activities, was investigated. However, the information obtained from FTIR is limited because it only provides a clue about the class of metabolite; the specific metabolites responsible for the biological activities remain unknown. The relationship between biological activity (Y) and wavenumber (X) of leaf samples at three maturity stages is illustrated in the bi-plot, which combines a score plot and a loading plot. The bi-plot of leaf extracts revealed that the Y-variables (DPPH, NO, ABTS, and alpha-glucosidase) were located near the young leaf samples, indicating that the young leaf samples were highly correlated with the biological activities. The strongest peaks were at 2857, 2928, and 3300 cm⁻¹, which had the highest values in the loading plot and VIP scores and may be responsible for the antioxidant and alpha-glucosidase inhibitory activities of young leaves. The peaks at 2857 cm⁻¹ and 2928 cm⁻¹ may be due to symmetrical and asymmetrical methylene stretching vibrations, respectively, in methoxyl derivatives and aldehyde groups, and the peak at 3300 cm⁻¹ may be assigned to intermolecular hydrogen bonding in alcohols, phenols, or carboxylic acids. These peaks suggest that primary metabolites such as carbohydrates, proteins, lipids, and polysaccharides, and secondary metabolites such as phenolic acids, flavonoids, terpenes, and terpenoids may be present in the leaf samples. This agrees with Christou et al. [39], who found that the most important peaks in the FTIR spectrum were in the 4000–2500 cm⁻¹ region, which indicates the presence of carbohydrate, protein, lipid, and polysaccharide groups. Saidan et al. [40] also reported that sharp peaks in the range of 1760–1600 cm⁻¹ may be characteristic of flavonoid and terpenoid groups. In addition, the leaves of S. samarangense have been reported to contain an abundance of valuable metabolites such as quercetin, ellagic acid, myricetin, lupeol, sitosterol, triterpenes, betulin, p-cymene, α-pinene, β-pinene, and limonene [41]. These metabolites have been reported to have a significant effect on bioactivities such as antioxidant and alpha-glucosidase inhibitory activities [41,42,43].
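The difference between a fitted R² and a cross-validated Q² can be shown with a small worked example. To keep it short, the sketch swaps in a one-variable least-squares line in place of a full PLS model and uses leave-one-out cross-validation; the cross-validation scheme of this study is not stated here, and the absorbance and inhibition values are hypothetical. Both statistics use the form 1 − (residual sum of squares)/(total sum of squares), with Q² computed from predictions for samples left out of the fit.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for one predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def explained_fraction(observed, predicted):
    """1 - SS_residual / SS_total; gives R2 or Q2 depending on the predictions."""
    mean_y = sum(observed) / len(observed)
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    ss_residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_residual / ss_total

def leave_one_out_predictions(xs, ys):
    """Predict each sample from a model fitted without that sample."""
    predictions = []
    for i in range(len(xs)):
        intercept, slope = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        predictions.append(intercept + slope * xs[i])
    return predictions

# Hypothetical absorbance at one band (X) and percent inhibition (Y).
absorbance = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
inhibition = [12.0, 19.0, 33.0, 41.0, 48.0, 62.0]

intercept, slope = fit_line(absorbance, inhibition)
fitted = [intercept + slope * x for x in absorbance]
r2 = explained_fraction(inhibition, fitted)
q2 = explained_fraction(inhibition, leave_one_out_predictions(absorbance, inhibition))

print(f"R2 = {r2:.3f}, Q2 = {q2:.3f}")
```

Q² is lower than R² because each prediction comes from a model that has not seen the sample, which is why it is the statistic compared against the 0.5 threshold.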
Gas chromatography (GC) is one of the most intensively used techniques for the separation of compounds in a mixture [44]. It has become a crucial tool in the identification of compounds, especially in drug discovery, pharmacology, and the food industry. In this study, 37 compounds were detected in the ‘Giant Green’ leaf extracts at three maturity stages. However, only six major compounds were identified and present at the same retention time in all of the leaf extracts. The variation of metabolites in the samples may influence their potency in biological activities. Thus, the strongest antioxidant, antibacterial, and alpha-glucosidase inhibitory activities of ‘Giant Green’ leaves in our previous study [18] could be related to the greater number of metabolites present in each of the leaf extracts. Metabolites from the phenolic, triterpene, ester, alkane, and carbohydrate classes have received particular attention from researchers because they can exhibit various pharmacological activities [45,46,47]. Previous literature reported that alkane-based compounds such as tetradecane, hexadecane [48], and nonadecane [49] showed antibacterial and antifungal effects. Metabolites such as methyl benzoate, methyl (9Z,15Z)-9,15-octadecadienoate [50,51], diethylene glycol dibenzoate [52], and 9-eicosene [53] have also been reported to have potent antibacterial activity. In addition, Saleh et al. [46] reported that metabolites based on fatty acids, organic acids, phenolic acids, carbohydrates, alkanes, and sterols may possess alpha-glucosidase inhibitory activity. Fatty acids such as palmitic acid and stearic acid are known to exhibit potent alpha-glucosidase inhibitory activity [54,55] as well as strong antioxidant, antibacterial, antitumor, anticholesteremic, immunostimulant, and anti-inflammatory properties [56,57]. Another metabolite with strong alpha-glucosidase inhibitory activity is phytol [54,58]. Phytol is an acyclic diterpene alcohol and is commonly produced through the degradation process of the plant cell wall [54]. The same metabolite was also reported by other researchers to exhibit strong antimicrobial, antioxidant, antinociceptive, and anticancer activities [59,60]. However, the other metabolites found in this analysis might not yet have been described in detail in previous literature. Hence, this study showed that GC-MS is an efficient tool for untargeted profiling of ‘Giant Green’ wax apple leaf samples. However, the huge dataset obtained from the hundreds of peaks in the GC-MS analysis made it difficult to draw a significant conclusion about the specific metabolites that contribute to the discrimination between ‘Giant Green’ leaves at three different maturity stages. Thus, the GC-MS data were reduced to a more manageable size by chemometric analysis using multivariate data analysis (MVDA).
PCA is performed to reduce the dataset, with the aim of structuring the data and clustering the experimental samples. PCA detects the similarities between the samples and groups similar samples into clusters. In this study, the PCA of the GC-MS data did not provide good separation between the maturity stages of the ‘Giant Green’ S. samarangense leaves. The grouping in PCA is based on the strength of the variables in the loading plot on the PC1 and PC2 axes. Our findings agree with the work of Steingass et al. [21], in which one of the green-ripe pineapple fruits did not group with the other samples of the same maturity stage; the authors ascribed this variation to differences in metabolite development among individual fruits even at the same maturity stage. However, our result contradicts a previous study by Maamoun et al. [61], in which there was clear discrimination between two ripening stages of Luffa egyptiaca Mill fruit. They noted that young fruit exhibited a negative score along PC1 and old mature fruit exhibited a positive score along PC2. Zhang et al. [62] also found a good separation between the three growth stages of tobacco leaves: the rosette and vigorous growth stages were located along PC1, whereas the mature leaves were located along PC2. The accumulation of certain compounds such as nicotine, sucrose, D-glucose, L-proline, D-fructose, quinic acid, glyceric acid, L-threonic acid, inositol, and DL-malic acid in different quantities across the tobacco growth stages was suggested to contribute to this separation. Hence, the variation and quantity of metabolites in each of the experimental samples play a significant role in the discrimination between them.
Complex chemical reactions occur at each maturity stage of a plant to support its growth and cell development, and these processes change the composition of metabolites in the plant. Hierarchical cluster analysis (HCA) is an unsupervised MVDA method used to identify the natural grouping of plant samples characterized by the values of a set of measured properties [63]. The similarity and dissimilarity of the entire set of samples are displayed in the HCA dendrogram. The results revealed that the ‘Giant Green’ leaves were grouped into three clusters, similar to the results of the PCA. However, the clusters did not correspond to the different maturity stages of the leaf samples as expected. This showed that the GC-MS data were not able to discriminate well between ‘Giant Green’ leaves at three maturity stages. Many factors could influence this result, such as the similarity of metabolites in the samples, the sampling location, and the biological replication of samples [30,64,65]. In this study, all of the samples and their biological replicates were collected at the same location, which may explain why the differences between their metabolites were slight compared with those of samples collected from different locations.
Partial least squares discriminant analysis (PLS-DA) is a supervised classification method that complements the unsupervised PCA. The PLS-DA model was applied to investigate the metabolites that contributed to the discrimination between ‘Giant Green’ leaves at three maturity stages. In contrast to the PCA and HCA results, all leaf stages were better separated from each other in the PLS-DA model. The model was also validated by a classification accuracy of 100% in the confusion matrix. The metabolites involved in this separation were identified from their variable importance in the projection (VIP) values: twenty-four metabolites in the leaf extracts had VIP values greater than 0.85. The results showed that variation in metabolites from the alcohol, ester, alkene, alkane, and ketone groups was involved in the discrimination of the young, mature, and old leaf samples. However, the factors that influence the discrimination among samples are complex. Some researchers have reported that climate, soil, temperature, maturity stage, irrigation, and fertilizer alter the composition of metabolites in plants [31,36,66]. According to Yunusa [30], two possible factors responsible for the separation between samples are metabolites that are present in all samples but at different concentrations, and metabolites that are undetected in certain samples. These factors also affected the results of the PLS-DA analysis. In addition, the PLS-DA model was expected to perform better than PCA in classifying ‘Giant Green’ leaves at three maturity stages, because PLS-DA uses the class information to discriminate the samples based on the similarities and dissimilarities of their metabolite profiles.
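The VIP-based selection described above can be sketched as follows. The code assumes the commonly used VIP definition, in which the squared normalized weight of each variable is weighted by the Y variance explained by each component; the weights, explained sums of squares, and variable count are hypothetical, and only the 0.85 cut-off is taken from the text.

```python
import math

def vip_scores(weights, ssy):
    """VIP per variable.

    weights[a][j]: PLS weight of variable j in component a.
    ssy[a]: sum of squares of Y explained by component a.
    """
    n_variables = len(weights[0])
    total_ssy = sum(ssy)
    vips = []
    for j in range(n_variables):
        weighted = 0.0
        for w_a, ssy_a in zip(weights, ssy):
            norm_sq = sum(w * w for w in w_a)
            weighted += ssy_a * (w_a[j] ** 2) / norm_sq
        vips.append(math.sqrt(n_variables * weighted / total_ssy))
    return vips

# Hypothetical two-component model with four variables (e.g. four GC-MS peaks).
weights = [
    [0.8, 0.5, 0.3, 0.1],   # component 1
    [0.2, -0.6, 0.7, 0.3],  # component 2
]
ssy = [6.0, 2.0]

vips = vip_scores(weights, ssy)
threshold = 0.85  # cut-off reported in the text
selected = [j for j, v in enumerate(vips) if v > threshold]

for j, v in enumerate(vips):
    print(f"variable {j}: VIP = {v:.3f}")
print("selected variables:", selected)
```

With this definition the squared VIP values average to 1 across variables, which is why cut-offs close to 1, such as 0.85 here, are used to flag the variables that contribute most.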
Partial least squares (PLS) was applied to find the correlation between the biological activities (antioxidant and alpha-glucosidase inhibitory activities) and the metabolites in ‘Giant Green’ leaves at three maturity stages. In the PLS bi-plot, the young leaves were located near the bioactivities. This finding is consistent with the biological activity results, which showed that the young leaf samples had the highest activity compared with samples from the other stages. Based on the VIP scores, the metabolites contributing most to these activities in the leaf extract along PC1 were a fatty acid methyl ester (methyl palmitate, 18) and alkane-based compounds (tetradecane, 4; decane, 1; cyclotetradecane, 3). This result is consistent with previous literature showing that fatty acid and alkane derivatives possess antioxidant and alpha-glucosidase inhibitory activities [46,54,67]. Anh et al. [68] found that methyl palmitate present in Clausena indica fruit possesses potent antioxidant and antidiabetic activities. Murugesu et al. [54] reported that methyl palmitate and pentadecanoic acid present in Clinacanthus nutans Lindau leaves exhibited strong alpha-glucosidase inhibitory activity. The alkanes, fatty acids, methyl esters, and aromatic compounds in the essential oil of Daphne mucronata Royle leaves and stems showed good antioxidant and antibacterial activities [69]. Considering the relationship described by the PLS model, the metabolites and biological activities in the ‘Giant Green’ leaves were well correlated, and young leaves showed a stronger relationship with antioxidant and alpha-glucosidase inhibitory activities than mature and old leaves.