Metabolome Mining of Curcuma longa L. Using HPLC-MS/MS and Molecular Networking

Turmeric, Curcuma longa L., is a perennial, rhizomatous medicinal plant of the Zingiberaceae family. It is distributed across the world's tropical and subtropical climates, especially in South Asia, where its rhizomes have been valued since ancient times as a food supplement, spice, flavoring agent, and yellow dye. It exhibits a diverse array of therapeutic properties, including antidiabetic, anti-inflammatory, antioxidant, anticancer, and anti-aging effects. In this study, organic extracts of C. longa rhizomes were subjected to HPLC separation followed by ESI-MS and low-energy tandem mass spectrometry analyses. The Global Natural Product Social Molecular Networking (GNPS) approach was applied for the first time to this ethnobotanically important species to analyze its metabolome in depth on the basis of fragmentation patterns. In total, 30 metabolites were identified, including 16 diarylheptanoids, 1 diarylpentanoid, 3 bisabolocurcumin ethers, 4 sesquiterpenoids, 4 cinnamic acid derivatives, and 2 fatty acid derivatives. Of the 16 diarylheptanoids identified, 5 are reported for the first time in this species.


Introduction
Curcuma longa L., a member of the family Zingiberaceae, is commonly referred to as turmeric and is a perennial rhizomatous herbaceous plant, native to and cultivated in the tropical regions of Southeast Asia. It has been a part of South Asian culture for more than 4000 years, used as a colorant, a preservative, and a spice. It is a popular ingredient in traditional medical practices, such as Siddha, Ayurveda, and Unani, where it is commonly used as a natural remedy for various health conditions [1]. C. longa L. is rich in secondary metabolites and is known to have antidiabetic, anticancer, antioxidant, anti-inflammatory, antibacterial, antifungal, antiviral, cardiovascular, and neuroprotective activities [2][3][4].
Diarylheptanoids and sesquiterpenoids are the major metabolites found in C. longa L. Diarylheptanoids are a distinct group of natural products comprising a heptane core structure with two phenyl rings at the 1- and 7-positions. Because of these distinctive characteristics, various researchers have thoroughly investigated their therapeutic potential. The pharmacological activity of diarylheptanoids may be attributed to the high degree of flexibility in their core chemical structure and the presence of few hydroxyl or ketone functionalities, thus making them tolerant to biological molecules [5].
Curcuminoids, a subclass of diarylheptanoids, include curcumin and its derivatives such as bisdemethoxycurcumin and demethoxycurcumin, which are natural phenols with therapeutic potential [6,7]. Curcumin, one of the most abundant curcuminoids with a long history of medicinal importance, is present in the Curcuma species. It is found in high concentration in the rhizomes, making up about 3-6% of the dry weight [8].
Metabolites 2023, 13, 898
The metabolic composition of plants may change in response to various physiological and environmental factors, and may also be influenced by their genetic makeup [9]. To analyze and compare all biological metabolites with a molecular mass up to 1500 Da, metabolomics is an appealing tool. Metabolomics, which is a rapidly growing research field, comprises methods and techniques to analyze metabolites in biosynthetic pathways, thereby providing insights into the biochemical conditions of biological systems.
Targeted and untargeted approaches are the two strategies used in metabolomics. In targeted metabolomics, preselected specific metabolites are identified, whereas untargeted metabolomics involves the detection and identification of all metabolites, including unknown chemicals [10]. In the field of metabolomics, a combination of chromatography with mass spectrometry is regarded as the fundamental and essential analytical technique, and it is frequently utilized due to its ability to analyze complex biological samples, as well as its large dynamic range and reproducibility [11,12]. Moreover, advancements in metabolomics have ramped up its development as a crucial tool in the medical field, particularly in the investigation of biomarkers associated with diseases and toxic chemicals, as well as in the exploration of molecular mechanisms to deliver thorough insight into human biochemistry [13].
The complex MS/MS data acquired in metabolomics experiments can be visualized and analyzed with a computer-based approach, molecular networking, which builds a network-shaped map based on the similarity in CID-MS/MS fragmentation patterns of two or more molecules. Global Natural Products Social Molecular Networking (GNPS) is an online bioinformatics tool that is currently used to perform molecular networking; it can detect possible resemblance among all MS² datasets, which further aids in the annotation of unknown but closely related metabolites [14].
Curcuma longa has been extensively investigated in the past for its metabolites [15,16]. Sesquiterpenoids and terpecurcumins extracted from C. longa L. have been studied for their anti-inflammatory, anti-atherosclerotic, and cytotoxic properties [17][18][19]. Recently, the antioxidant potential of diarylheptanoids has been explored [20,21]. Additionally, recent studies have analyzed metabolite differences between five Curcuma species using UPLC-MS/MS and reported that the quantity of curcuminoids in C. longa L. is higher than that in the other Curcuma species [22].
The main goal of this research is to explore secondary metabolites present in the rhizomes of Curcuma longa L. through the application of high-performance liquid chromatography coupled with tandem mass spectrometry (HPLC-MS/MS) and molecular networking techniques.

Plant Collection and Extract Preparation
Fresh rhizomes (400 g) were collected by harvesting 10 Curcuma longa L. plants from Bardiya district (GPS coordinates: 28°14′25.9″ N, 81°31′22.3″ E) of Lumbini Province, Nepal. They were thoroughly washed, cut into slices, and left to dry in the sunlight for a week. After drying, the sliced rhizomes were milled into a fine powder and stored in air-tight plastic bags. The powder (46.564 g) was macerated in 100 mL of methanol for 24 h and then filtered. This procedure was repeated thrice, and each time the dilute extract was concentrated on a rotary evaporator set at 40 °C until a solid mass was obtained. The crude extract was fractionated by dissolving it in distilled water and subsequently extracting it with ethyl acetate and hexane.

Mass Spectrometry and Compound Annotation
The high-performance liquid chromatography-high resolution-mass spectrometry (HPLC-HR-MS/MS)-based metabolic profiling of ethyl acetate and hexane fractions of C. longa rhizome was carried out on a Bruker maXis II LC-ESI-QTOF mass spectrometer at Gross lab, Department of Pharmaceutical Biology, University of Tübingen, Germany. The LC method was as follows: 0.1% formic acid in H2O as solvent A and acetonitrile as solvent B; a linear gradient of 10% to 100% B over 35 min, followed by 100% B for an additional 10 min; a flow rate of 0.5 mL/min; a 3 µL injection volume; and UV/Vis detection at 190-800 nm. The separation was performed on a Luna Omega Polar C18 column (3 µm, 250 × 4.6 mm). The following MS parameters were used: a capillary voltage of 4500 V, nebulizer gas pressure (nitrogen) of 2 (1.6) bar, ion source temperature of 200 °C, and a dry gas flow rate of 7-9 L per minute at source temperature. The MS data were acquired in the range of m/z 100-1800. Both modes of ionization were employed to measure HRMS data, and spectral acquisition rates were set to 3 Hz for MS¹ and 10 Hz for MS², as described by Aryal et al. [23]. To obtain MS/MS fragmentation data, the 10 most intense ions per MS¹ scan were selected for collision-induced dissociation (CID) with stepped CID energy. The tandem MS parameters followed the method previously described by Garg et al. [24]. Sodium formate and hexakis(2,2-difluoroethoxy)phosphazene (Apollo Scientific Ltd., UK) were used as the internal calibrant and the lock mass, respectively.
The raw data were manually inspected for quality and then analyzed in Bruker Compass Data Analysis (Version 4.4, Bruker Daltonics GmbH, Billerica, MA, USA). Subsequently, raw data files were converted into .mzXML format and further annotated using CSI:FingerID (accessed through the SIRIUS graphical user interface) [25]. The calculated mass, absolute error, RDBE, and molecular formulae were generated with the Bruker Data Analysis software and compared with the formulae generated by SIRIUS. Furthermore, the annotated compounds were validated via the SIRIUS score, a literature survey, and natural products-based servers and databases, such as PubChem [26], LOTUS [27], and ChemSpider [28]. The higher the SIRIUS score, the higher the confidence of the molecular annotation.
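The quantities compared during annotation (calculated mass, mass error, and RDBE) can be reproduced by hand. The following is a minimal sketch, not the Bruker or SIRIUS implementation. It assumes the molecular formula C19H16O4 for bisdemethoxycurcumin (compound 16), which is not stated in this section, and uses the m/z values reported for that compound in both ionization modes. All function names are illustrative.

```python
import re

# Monoisotopic masses (Da) of the elements needed for this example.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
PROTON = 1.00727646688  # mass of a proton (Da)


def parse_formula(formula):
    """Turn a simple formula such as 'C19H16O4' into {'C': 19, 'H': 16, 'O': 4}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + int(number or 1)
    return counts


def monoisotopic_mass(formula):
    """Monoisotopic mass of the neutral molecule."""
    return sum(MONO[el] * n for el, n in parse_formula(formula).items())


def rdbe(formula):
    """Ring and double bond equivalents for a neutral CHNO formula."""
    c = parse_formula(formula)
    return c.get("C", 0) - c.get("H", 0) / 2 + c.get("N", 0) / 2 + 1


def ion_mz(formula, mode):
    """Calculated m/z of [M+H]+ (mode='pos') or [M-H]- (mode='neg')."""
    shift = PROTON if mode == "pos" else -PROTON
    return monoisotopic_mass(formula) + shift


def ppm_error(observed, calculated):
    """Relative mass error in parts per million."""
    return (observed - calculated) / calculated * 1e6


formula = "C19H16O4"  # assumed formula of bisdemethoxycurcumin
for mode, observed in (("pos", 309.1127), ("neg", 307.0979)):
    calc = ion_mz(formula, mode)
    print(f"{mode}: calc {calc:.4f}, obs {observed:.4f}, "
          f"error {ppm_error(observed, calc):+.2f} ppm")
print("RDBE:", rdbe(formula))
```

Under these assumptions, both observed values fall within a few ppm of the calculated m/z, and the RDBE of 12 is consistent with two aromatic rings, two C=C bonds, and two carbonyl groups.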

GNPS-Based Molecular Networking
The GNPS platform (https://gnps.ucsd.edu/) (accessed on 12 May 2023) leverages complex MS/MS data from metabolomics experiments for the visualization and further annotation of metabolites based on similarity in fragmentation patterns [29]. The raw data files (.d format) acquired in positive ionization mode for the ethyl acetate and hexane fractions were first converted to .mzXML format using the open-source MSConvert software (Version: 3.0). The converted files were uploaded to the Mass Spectrometry Interactive Virtual Environment (MassIVE) repository (https://massive.ucsd.edu/) (Accession number: MSV000092243, accessed on 12 May 2023) using the FTP client CoffeeCup. The precursor ion and fragment ion mass tolerances were set at 2.0 Da and 0.5 Da, respectively. A molecular network was then constructed in GNPS with a cosine score greater than 0.7. The generated molecular networks were exported to Cytoscape software (Version: 3.10.0) in '.graphml' format for visualization.
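To illustrate what the cosine score and the fragment ion tolerance mean in practice, the sketch below computes a simplified cosine similarity between two centroided MS/MS spectra. It is not the GNPS implementation: GNPS uses a modified cosine that also pairs fragment peaks shifted by the precursor mass difference, which is omitted here. The function name and the two example spectra are hypothetical.

```python
import math


def cosine_score(spec_a, spec_b, tol=0.5):
    """Greedy cosine similarity between two centroided MS/MS spectra.

    Each spectrum is a list of (m/z, intensity) pairs. Two peaks can be
    paired when their m/z values differ by at most `tol` Da, and each peak
    is used in at most one pair.
    """
    norm_a = math.sqrt(sum(i * i for _, i in spec_a))
    norm_b = math.sqrt(sum(i * i for _, i in spec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0

    # All candidate peak pairs within tolerance, largest intensity product first.
    candidates = sorted(
        ((ia * ib, x, y)
         for x, (mza, ia) in enumerate(spec_a)
         for y, (mzb, ib) in enumerate(spec_b)
         if abs(mza - mzb) <= tol),
        reverse=True,
    )
    used_a, used_b, dot = set(), set(), 0.0
    for product, x, y in candidates:
        if x not in used_a and y not in used_b:
            used_a.add(x)
            used_b.add(y)
            dot += product
    return dot / (norm_a * norm_b)


# Hypothetical spectra (m/z, relative intensity); values are made up.
spectrum_1 = [(119.05, 40.0), (147.04, 100.0), (225.09, 30.0)]
spectrum_2 = [(119.05, 35.0), (147.04, 100.0), (211.07, 25.0)]

score = cosine_score(spectrum_1, spectrum_2, tol=0.5)
print(f"cosine = {score:.3f}, edge kept: {score > 0.7}")
```

In a network built this way, two nodes are joined by an edge only when their score exceeds the chosen threshold (0.7 in this study).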

Metabolite Profiling Using HPLC-MS/MS
The LC-HR-ESI-MS/MS-based metabolite profiling of the rhizomes of C. longa L. displayed a significant abundance of therapeutically active compounds belonging to various classes, including phenolic compounds, cinnamic acid derivatives, sesquiterpenoids, and fatty acids. The base peak chromatograms of ethyl acetate fraction for positive and negative modes of ionization are shown in Figures 1 and 2, respectively.
A total of 30 metabolites annotated from the HR-MS data of the ethyl acetate and hexane fractions, ionized in positive and/or negative modes, are listed in Table 1. The MS¹ and MS² profiles of the observed metabolites are displayed in Supplementary Figures S1-S30. The structures of the annotated metabolites are displayed in Figure 3.
Compound 16 had a molecular ion at m/z 309.1127 in (+)-ESI ionization and m/z 307.0979 in (-)-ESI ionization, and was identified as bisdemethoxycurcumin, which was previously observed in C. longa [35].
[...] attributed to the removal of a methyl radical that either eliminated a CO moiety to give a peak at m/z 108 or lost a CO2 molecule to give a peak at m/z 92. Thus, compound 23 was annotated as vanillin, previously reported in the rhizome of C. longa L. [37]. [...] [42].
Compound 27 was eluted at 28.0 min and displayed a molecular ion with m/z 293.2125. Its MS² spectrum exhibited fragment ions at m/z 275 (base peak), m/z 235, m/z 231, m/z 232, m/z 171, and m/z 121. Therefore, compound 27 was putatively identified as 9-hydroxy-10,12,15-octadecatrienoic acid, which was already reported in the leaf of Isatis tinctoria [43]. Compound 29 was eluted at 29.6 min and exhibited a molecular ion with m/z 295.2282 [M-H]⁻. Its MS² spectrum revealed product ions with m/z 277 (base peak), m/z 195, m/z 183, and m/z 171. Hence, compound 29 was tentatively annotated as coriolic acid, which was previously reported in Deprea subtriflora [44].

GNPS-based Molecular Networking
Molecular networking analysis is an analytical method to analyze and visualize metabolites from HR-MS/MS data within the molecular network, where each metabolite is depicted as a node with its corresponding m/z value. This network consists of multiple clusters based on the resemblance of molecular fragmentation patterns, which indicates that they share similar core chemical structures [45]. A total of 476 individual ions were observed as nodes and 576 as edges in the molecular network, in which three clusters A, B, and C were formed, as shown in Figure 6.
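A cluster in such a network is a set of nodes that are linked to one another, directly or indirectly, by edges. The following sketch shows one way to recover clusters from an edge list as connected components. The function name is illustrative and the edge list is hypothetical: the node labels reuse m/z values mentioned in this section, but apart from the pairs the text describes as spectrally similar, the pairings are assumptions made for the example.

```python
from collections import defaultdict


def clusters_from_edges(edges):
    """Group nodes into clusters (connected components) of an undirected graph."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, clusters = set(), []
    for start in neighbours:
        if start in seen:
            continue
        # Depth-first walk that collects every node reachable from `start`.
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        clusters.append(component)
    # Largest cluster first.
    return sorted(clusters, key=len, reverse=True)


# Hypothetical edge list; each tuple links two precursor ions (m/z labels).
edges = [
    ("293.118", "295.135"),
    ("293.118", "309.127"),
    ("309.127", "311.132"),
    ("333.171", "297.150"),
    ("333.171", "313.182"),
]

for index, cluster in enumerate(clusters_from_edges(edges), start=1):
    print(f"cluster {index}: {sorted(cluster)}")
```

Nodes without any edge above the cosine threshold never appear in the edge list, so they remain outside every cluster in this sketch.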
Cluster A, a large cluster in the molecular network, was characterized by precursor ions at m/z 309.127, m/z 311.132, m/z 267.103, m/z 313.145, and m/z 293.118, which corresponded to compounds 16, 17, 12, 11, and 13, respectively; these had already been identified by manual annotation. In cluster A, a precursor ion at m/z 295.135 showed MS² spectral similarity to m/z 293.118 and differed from it by only 2 m/z units, which indicates that the two ions differ by one double bond. Thus, the precursor ion at m/z 295.135 was identified as 1,7-bis(4-hydroxyphenyl)hepta-4,6-dien-3-one, previously isolated from the rhizome of C. kwangsiensis [46].
Moreover, a small cluster, B, consisted of three precursor ions (m/z 333.171, m/z 297.150, and m/z 313.182), of which the ion at m/z 333.171 was identified as 3,5-dihydroxy-1-(3,4-dihydroxyphenyl)-7-(4-hydroxyphenyl)heptane. The neutral loss of 36.021 Da from the precursor ion at m/z 333.171 and the cosine value of 0.8827 suggest that the precursor ion at m/z 297.150 shares MS² spectral features with m/z 333.171. Thus, the precursor ion at m/z 297.150 was putatively identified as 1,7-bis(4-hydroxyphenyl)hept-6-ene-3-one.
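The mass differences between connected nodes discussed above can be checked against a short table of common structural differences. The sketch below is illustrative only: the table of shifts, the tolerance, and the function name are assumptions chosen for this example. Reading the 36.021 Da difference as two water molecules is likewise an interpretation that is consistent with the numbers rather than a statement taken from the text.

```python
# Monoisotopic masses (Da)
H = 1.00782503207
C = 12.0
O = 15.99491461956

# A few common mass differences between related metabolites (assumed table).
KNOWN_SHIFTS = {
    "H2 (one double bond)": 2 * H,
    "CH2": C + 2 * H,
    "H2O": 2 * H + O,
    "CH2O (methoxy vs. H)": C + 2 * H + O,
    "2 H2O": 2 * (2 * H + O),
}


def explain_shift(mz_a, mz_b, tol=0.005):
    """Return the names of known shifts matching |mz_a - mz_b| within `tol` Da."""
    delta = abs(mz_a - mz_b)
    return [name for name, mass in KNOWN_SHIFTS.items()
            if abs(delta - mass) <= tol]


for pair in ((295.135, 293.118), (333.171, 297.150)):
    delta = abs(pair[0] - pair[1])
    print(f"{pair[0]} vs {pair[1]}: delta = {delta:.3f} Da ->",
          explain_shift(*pair))
```

With three-decimal precursor values, a tolerance of a few millidaltons is about as tight as the data allow; a shift that matches nothing in the table returns an empty list and needs manual inspection.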
In this research, most of the metabolites detected were non-volatile, polar molecules, since the LC-HR-ESI-MS/MS-based analysis was limited to the detection of compounds with heteroatoms. As a result, therapeutically valuable volatile compounds found in C. longa rhizomes may be missed by this approach, and GC-MS-based analysis may be a better choice for detecting such compounds. Some of the metabolites were detected in the hexane fraction despite their polarity, which may be due to incomplete fractionation of the rhizome extract. Similarly, the precursor ions eluted at 19.3 min (m/z 353.1024) and at 19.5 min (m/z 383.1132) in the positive ionization mode of the ethyl acetate fraction were not analyzed further because they did not undergo fragmentation. Further, because of their low abundance in the ethyl acetate fraction, some metabolites did not produce intense peaks in the base peak chromatogram (Figure 1).
Moreover, most of the diarylheptanoids were detected in the negative ion mode. Diarylheptanoids are well suited to detection in the negative ion mode because they contain multiple hydroxyl groups, which facilitate ionization in this mode. However, low-abundance diarylheptanoids were easily detected in the positive ionization mode but not in the negative ionization mode. This observation indicates that, for these compounds, the negative mode is less sensitive than the positive mode. Additionally, it was found that the absence of a keto group in the heptyl chain affected the protonation of low-abundance diarylheptanoids in the positive ion mode and made fragmentation difficult in the negative ion mode, as mentioned previously [30].
The molecular networking strategy enables the simultaneous analysis of multiple mass spectra by creating clusters based on similarity in the spectral data of molecules, thus simplifying the interpretation and visualization of complex datasets. Moreover, it gives information about the structural relationships among the compounds belonging to a particular cluster, thereby facilitating the identification of known and unknown metabolites and derivatives [55]. However, manual annotation of MS spectral data is tedious and time-consuming, and it may sometimes lead to erroneous interpretation of complex datasets. The majority of the metabolites detected in this research were already reported in C. longa L.; therefore, additional research is required to explore the unidentified nodes and edges present in the molecular network.

Conclusions
Turmeric has been widely used in food as a spice and in herbal medication and is a rich source of therapeutically active compounds. We chose liquid chromatography coupled with mass spectrometry owing to its high sensitivity and selectivity. This hyphenated technique has gained popularity in metabolomics studies over the past two decades as a way to explore, identify, and validate naturally occurring bioactive compounds as well as biomarkers in the medicinal field. We used an HPLC-HR-ESI-MS/MS-based metabolomics approach along with molecular networking to study the metabolites in the turmeric extracts. The metabolic profiling of ethyl acetate and hexane fractions in both ionization modes showed the presence of 30 annotated metabolites, including 16 diarylheptanoids, 1 diarylpentanoid, 4 sesquiterpenoids, 3 bisabolocurcumin derivatives, 4 cinnamic acid derivatives, and 2 fatty acid derivatives. Five diarylheptanoids were identified for the first time in C. longa L. rhizomes. This project began with an analysis of the overall metabolome of C. longa L. rhizomes. In the future, we plan to work with several other traditionally important species to discover the differences in metabolite profiles and evaluate their bioactivities. Additional research is recommended to isolate and validate the newly identified diarylheptanoid compounds, explore compounds in different Curcuma species, and assess their bioactivities through in silico, in vitro, and in vivo experiments to develop potential drug candidates and food supplements.