Comparative Investigation of the Differences in Chemical Compounds between Raw and Processed Mume Fructus Using Plant Metabolomics Combined with Chemometrics Methods

Mume Fructus is a well-known herbal medicine and food with a long history of processing and application. Different processing methods affect the intrinsic quality of Mume Fructus. Thus, it is of great significance to investigate the changes in chemical components during processing (i.e., raw compared with the pulp and charcoal forms). In this study, plant metabolomics methods based on mass spectrometry detection were established to analyze the chemical ingredients of Mume Fructus comprehensively. Chemometric strategies were combined to analyze the profile differences of Mume Fructus after different processing methods. The established strategy identified 98 volatile and 89 non-volatile compounds of Mume Fructus by gas chromatography-mass spectrometry (GC-MS) and ultra-high performance liquid chromatography coupled with quadrupole time-of-flight mass spectrometry (UHPLC-Q-TOF-MS/MS), respectively. Moreover, the orthogonal partial least squares discriminant analysis (OPLS-DA) indicated that raw Mume Fructus and the Mume Fructus pulp and charcoal were distributed in three regions. Subsequently, 19 volatile and 16 non-volatile components were selected as potential chemical component markers using variable importance in the projection (VIP) > 1 as the criterion, and the accuracy was verified by a Back Propagation Neural Network (BP-NN). To further understand the difference in the content of Mume Fructus before and after processing, the 16 non-volatile chemical component markers were quantitatively determined by ultra-high performance liquid chromatography-mass spectrometry (UHPLC-MS/MS). The results revealed that, compared with raw Mume Fructus, the total content of the 16 components increased in the pulp of Mume Fructus and decreased in the charcoal. Therefore, this study used modern GC-MS, UHPLC-Q-TOF-MS/MS, and UHPLC-MS/MS techniques to analyze the differences in chemical components before and after the processing of Mume Fructus and provided a material basis for further research on the quality evaluation and efficacy of Mume Fructus.


Introduction
Mume Fructus (MF) is derived from the immature fruit of Prunus mume Sieb. et Zucc. [1]. It is also known as wumei in China, Japanese apricot or ume in Japan, and maesil or oumae in Korea. The plant is native to Japan and South Korea and is widely planted in Yunnan, Sichuan, Xinjiang, and other regions in China [2]. As a common commercial food, it is also used to prepare plum sauce, plum juice, and plum wine, which can be consumed as snacks, condiments, or food additives. Phytochemical studies have shown that MF contains various chemical components, including non-volatile and volatile components. The research on non-volatile components mainly focuses on organic acids, flavonoids, terpenoids, amino acids, polysaccharides, and nucleotides. The organic acids are one of the main active components, and the citric acid content was used as a detection index for quality control of MF in the ...

Results and Discussion

Volatile Ingredients Analysis

Method Validation
The GC-MS method was verified in terms of precision, stability, and repeatability. The RSDs of the retention time and peak area were less than 0.65% and 9.12%, respectively, as shown in Supplementary Table S1, suggesting that the GC-MS method was precise for analyzing MF samples.

Volatile Ingredient Identification
The total ion chromatogram (TIC) of the MF samples is provided in Supplementary Figure S1. Based on the automatic peak identification procedures, the volatile compounds were identified against the GC-MS NIST08 and NIST08s databases. Compounds were accepted when the match similarity was higher than 75%; the peak area data were obtained by peak area integration and expressed as relative content using the area normalization method. A total of 98 compounds (Table 1) were detected in the differently processed MF samples, mainly aldehydes and ketones, phenols, carboxylic acids, and esters. The MF pulp contained the most diverse set of compounds (68), whereas raw MF and MF charcoal contained 53 and 44 components, respectively. After processing, the relative content of volatile compounds differed among raw MF, MF pulp, and MF charcoal, as shown in Figure 1. The aldehydes and ketones and the carboxylic acids had a high content in all samples, while the ester content was low. The carboxylic acids, phenols, and esters showed an increasing trend after the core was removed from raw MF. The aldehydes and ketones increased after raw MF was processed into charcoal. In general, the number of types of volatile components increased after the core was removed and decreased in MF charcoal, which was related to the chemical and physical changes during the charcoal frying process, such as reduction and oxidation reactions.
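The area normalization step mentioned above can be sketched as follows. This is a minimal illustration, not the instrument software's procedure: the function name, compound labels, and peak areas are hypothetical, and it simply assumes that relative content is each integrated peak area divided by the total integrated area.

```python
def relative_content(peak_areas):
    """Return each peak's share of the total integrated area, in percent."""
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {name: 100.0 * area / total for name, area in peak_areas.items()}


# Hypothetical peak areas (arbitrary units), for illustration only.
example_areas = {"compound_a": 1200.0, "compound_b": 300.0, "compound_c": 500.0}
shares = relative_content(example_areas)
```

By construction the relative contents of one sample sum to 100%, so they describe the composition of the detected volatiles rather than absolute amounts.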

Plant Metabolomics Analysis and Identification of Volatile Chemical Markers
Plant metabolomics analysis has been used to compare chemical compositions under different conditions. Using the XCMS online data analysis platform, all mass spectrum data obtained by GC-MS were converted to a three-dimensional matrix containing Rt, m/z, and peak intensity information. A total of 487 variables were acquired and imported into SIMCA-P 14.1 for multivariate statistical analysis. Orthogonal partial least squares discriminant analysis (OPLS-DA), a supervised multivariate analysis method, can filter out variation that is unrelated to group membership, making it easier to separate systematic information from noise. The OPLS-DA results (Figure 2A) showed that raw MF, MF pulp, and MF charcoal were clearly distributed in different regions using the 487 variables, and the volatile ingredients of the three groups had clear differences. However, routinely using all 487 variables to differentiate the three groups of MF samples is impractical. Therefore, the contribution of each variable was assessed by its variable importance in projection (VIP) value from the OPLS-DA model. The components with VIP > 1 were used for the subsequent analysis. In this way, 99 of the 487 variables were retained, as shown in Figure 2B, and the raw and processed MF samples were well distinguished using these 99 variables.
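For reference, the VIP score is commonly defined for a PLS-type model as shown below. This is the textbook definition, offered as an assumption about how the values were obtained; the exact variant that SIMCA-P reports for OPLS-DA models may differ. Here p is the number of variables (487 in this case), A is the number of model components, w_ja is the weight of variable j on component a, and SS_a is the sum of squares of the response explained by component a.

```latex
\[
\mathrm{VIP}_j \;=\;
\sqrt{\frac{p \sum_{a=1}^{A} SS_a \left( w_{ja} / \lVert \mathbf{w}_a \rVert \right)^{2}}
           {\sum_{a=1}^{A} SS_a}}
\]
```

Under this definition the squared VIP values average to 1 across all variables, which is why VIP > 1 is a conventional cutoff for variables with above-average contribution.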
The identification of the compounds was mainly based on accurate molecular mass, retention time, and MS/MS information. The potential markers were determined against the NIST08 and NIST08s databases. Finally, 19 differential volatile chemical markers were identified from the 99 variables. They were compounds 1, 4, 7, 8, 10, 11, 13, 17, 19, 26, ... The results (Figure 2C) demonstrated that the three groups of MF samples could be distinguished using the 19 potential markers. The compounds with loadings distant from the origin on the OPLS-DA loading plots (Supplementary Figure S2) were inferred to make the greatest contribution to class separation; the 19 differential volatile chemical markers were major contributors to the separation among the raw and processed MF samples. The accuracy of the selected differential compounds still needed to be verified. BP-NN, a supervised learning model, was used to determine the classification accuracy of the variables retained at each step. Batches R1-11, P1-11, and C1-11 were set as the training sets for raw MF, MF pulp, and MF charcoal, respectively. Batches R12-14 of raw MF, P12-14 of MF pulp, and C12-14 of MF charcoal were used as the validation sets. The remaining batches (R15-17, P15-17, and C15-17) were defined as the testing sets. The results (Supplementary Table S2) showed that the accuracy of all variable sets exceeded 85%, indicating that the 19 differential components could represent the volatile compounds in MF to distinguish the raw MF, MF pulp, and MF charcoal samples.

For the UHPLC-Q-TOF-MS/MS method, the retention time and peak area of twenty selected chromatographic peaks were used to calculate the RSD values, which were considered an important evaluation index for precision, repeatability, and stability. The RSDs of precision were all below 6.75%, indicating that the method has high precision. The RSDs of repeatability ranged from 0.05% to 6.88%, demonstrating the consistency of the results of the method. The RSDs of stability were within 0.01-6.99%, which illustrated that the sample solution was stable over 24 h. All the above results (Supplementary Table S3) showed that the UHPLC-Q-TOF-MS/MS method was reliable for acquiring the plant metabolomics data.
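The RSD values used throughout the validation can be computed as in the sketch below. It assumes the usual definition (sample standard deviation divided by the mean, as a percentage); the function name and the replicate peak areas are hypothetical and are not taken from the supplementary tables.

```python
from statistics import mean, stdev


def rsd_percent(values):
    """Relative standard deviation in percent (sample SD with n - 1)."""
    if len(values) < 2:
        raise ValueError("at least two replicates are required")
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; RSD is undefined")
    return 100.0 * stdev(values) / abs(m)


# Hypothetical replicate peak areas for one chromatographic peak.
replicate_areas = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0]
peak_rsd = rsd_percent(replicate_areas)
```

The same function applies to retention times or peak areas, whether the replicates come from the precision, repeatability, or stability experiments.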

Compound Identification in Mume Fructus
The identification of compounds was crucial for screening candidate markers in the subsequent studies. The plant metabolomics data of raw and processed MF samples were acquired in both positive and negative ESI modes, and the TICs are illustrated in Supplementary Figure S3. The obtained mass spectra were verified by: (a) matching with the molecular formula generated by the instrument; (b) analyzing the compound information acquired from the Metlin database (http://metlin.scripps.edu, accessed on 29 June 2022); (c) comparing with the fragment information of the reference standards; (d) referring to the compound information in previous reports. The required criterion was an exact mass-to-charge ratio of the precursor ion within an error of 10 ppm; the chemical composition was then inferred from the fragment ions and the structural formula of the compound. Using these data acquisition and mining strategies, 89 compounds, mainly organic acids, amino acids, flavonoids, and triterpenes, were tentatively identified. The detailed information on the composition is shown in Table 2.
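The 10 ppm criterion on the precursor ion can be checked as in the following sketch. The function names and the measured values are hypothetical; the theoretical m/z of 191.0197 is the commonly cited [M−H]− value for citric acid and is used here only as an example.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(measured_mz, theoretical_mz, tol_ppm=10.0):
    """True if the absolute mass error does not exceed tol_ppm."""
    return abs(ppm_error(measured_mz, theoretical_mz)) <= tol_ppm


# Hypothetical measured precursor ions compared with citric acid [M-H]-.
CITRIC_ACID_M_MINUS_H = 191.0197
accepted = within_tolerance(191.0205, CITRIC_ACID_M_MINUS_H)  # about 4 ppm
rejected = within_tolerance(191.0230, CITRIC_ACID_M_MINUS_H)  # about 17 ppm
```

A candidate formula passing this filter is only a starting point; the fragment ions and reference standards described above are still needed for a tentative identification.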

Plant Metabolomics Data Analysis and Verification of Differential Markers
R software and SIMCA software were used to analyze the UHPLC-Q-TOF-MS/MS results. Using the R software, all the mass spectrometry data of the MF samples obtained from UHPLC-Q-TOF-MS/MS were converted into a three-dimensional matrix, including retention time (Rt), m/z value, and peak intensity. Then, 2986 and 3605 variables were obtained in the negative and positive ion modes, respectively, and were used in OPLS-DA analysis in the SIMCA software. The OPLS-DA diagrams (Figure 3A,E) showed that the raw MF, MF pulp, and MF charcoal samples were distributed in three different regions using the 2986 and 3605 variables, suggesting that the processing methods affect the chemical composition of MF. However, using this volume of variables to distinguish the MF samples is impractical. Thus, the substances with VIP > 1 were used as potential difference markers for subsequent analysis of the MF samples. A total of 420 and 674 variables were filtered from the 2986 and 3605 variables, respectively. The raw MF, MF pulp, and MF charcoal samples were distinguished well using the 420 and 674 variables (Figure 3B,F).
The 420 and 674 variables with VIP > 1 were then identified based on Rt, m/z, and fragment information. From these, 26 and 12 compounds, respectively, were identified (Table 2). In addition, the OPLS-DA results (Figure 3C,G) indicated that the 38 components have the potential to distinguish the raw MF, MF pulp, and MF charcoal samples. To easily quantify and quickly distinguish the three types of MF samples, 16 compounds (succinic acid, L-malic acid, 3,4-dihydroxybenzaldehyde, protocatechuic acid, caffeic acid, D-quinic acid, citric acid, ferulic acid, syringic acid, cryptochlorogenic acid, neochlorogenic acid, chlorogenic acid, amygdalin, maslinic acid, corosolic acid, and rutin) were selected as potential differential markers. The OPLS-DA plot (Figure 3D) suggested that the 16 markers could separate the raw MF, MF pulp, and MF charcoal samples. The OPLS-DA loading plot showed the variables that contributed to the separation of the MF samples (Supplementary Figure S4). However, the accuracy of the selected variables was unknown. Therefore, BP-NN was used to evaluate the classification accuracy of the variable set obtained at each step. The training, validation, and testing sets were defined as in the GC-MS analysis. The results (Supplementary Table S2) showed that the accuracies of all variable sets in the positive and negative ion modes were higher than 79%. Interestingly, the accuracy obtained with the 16 variables was equivalent to that obtained with the 420 variables and even close to that obtained with the 2986 variables. In conclusion, the 16 compounds could be used for the quality evaluation of MF samples.
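The batch-wise partition used for the BP-NN check (batches 1-11 for training, 12-14 for validation, and 15-17 for testing within each of the R, P, and C groups) can be written out as below. This is only a sketch of the bookkeeping; the function name is hypothetical, and the network itself (trained in Matlab in this study) is not reproduced.

```python
def split_batches(prefix, n_train=11, n_valid=3, n_test=3):
    """Split batch labels such as R1..R17 into train/validation/test lists."""
    total = n_train + n_valid + n_test
    labels = [f"{prefix}{i}" for i in range(1, total + 1)]
    return {
        "train": labels[:n_train],
        "valid": labels[n_train:n_train + n_valid],
        "test": labels[n_train + n_valid:],
    }


# R = raw MF, P = MF pulp, C = MF charcoal (17 batches each, 51 in total).
splits = {group: split_batches(group) for group in ("R", "P", "C")}
n_samples = sum(len(part) for s in splits.values() for part in s.values())
```

Splitting by batch number within each group keeps the three classes balanced in every subset (11, 3, and 3 samples per class).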

UHPLC-MS/MS Quantitative Method Validation
The established UHPLC-MS/MS quantitative method was validated for linearity, lower limits of detection (LLODs), lower limits of quantification (LLOQs), intra- and inter-day precision, repeatability, stability, recovery, and the dilution effect. The results are shown in Supplementary Tables S4-S6. The correlation coefficients (r ≥ 0.9991) for the 16 constituents indicated good linearity within the concentration range. The LLOQs and LLODs ranged from 0.13 to 40.19 ng/mL and from 0.04 to 12.06 ng/mL, respectively. The RSDs of the intra- and inter-day precision of the 16 analytes were within 0.63-6.77% and 1.06-6.07%, respectively. The RSDs of repeatability were less than 5.90%, indicating that the method gives consistent results across independently prepared samples. As for stability, the RSDs were lower than 6.90%. These results indicated that the quantitative method could accurately determine the samples over several days. The developed method also had acceptable accuracy, with recoveries of 88.81-110.45% for all compounds. The RSD values of the dilution effect were less than 6.94%, and the RE ranged from −7.31% to 5.69%, indicating that the measured content was accurate when the samples were diluted within a certain range. In general, the established UHPLC-MS/MS method was suitable for analyzing the 16 components in the MF samples. The multiple reaction monitoring (MRM) diagrams of the analytes are shown in Supplementary Figure S5.

Analysis of Mume Fructus Samples from Different Processing Methods
Six batches of raw MF, MF pulp, and MF charcoal samples from Sichuan province were analyzed using the same OPLS-DA analysis with the 16 differential markers to eliminate the impact of origin on the quality markers. The results (Figure 4A) showed that the 16 compounds could divide the MF samples into three groups: raw MF, MF pulp, and MF charcoal. Moreover, a ROC curve was generated to verify the classification capability of the model. As shown in Figure 4B, the ROC curve passed through the upper left corner and the area under the ROC curve (AUC) was close to 1, suggesting that the 16 markers could accurately classify these three groups of MF samples. Therefore, processing could alter the content of the 16 compounds in the raw MF samples, leading to differences between the raw MF samples and the MF pulp and charcoal samples. The validated UHPLC-MS/MS method was used to simultaneously determine the 16 active compounds (succinic acid, L-malic acid, 3,4-dihydroxybenzaldehyde, protocatechuic acid, caffeic acid, D-quinic acid, citric acid, ferulic acid, syringic acid, cryptochlorogenic acid, neochlorogenic acid, chlorogenic acid, amygdalin, maslinic acid, corosolic acid, and rutin) in the raw MF, MF pulp, and MF charcoal samples. The contents of the 16 components in the MF samples are presented in Supplementary Table S7.
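The AUC reported for the ROC curve can be understood through the rank-based sketch below. It is an illustration under stated assumptions: the source does not say how the three-class ROC was constructed, so a one-vs-rest setting is assumed, and the function name and classifier scores are hypothetical.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the probability that a positive sample outscores a negative one.

    Ties count as half a win (equivalent to the Mann-Whitney U statistic).
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical one-vs-rest scores: one class against the other two.
auc_overlap = roc_auc([0.9, 0.8, 0.7], [0.1, 0.2, 0.75])
auc_perfect = roc_auc([0.9, 0.8, 0.7], [0.1, 0.2, 0.3])
```

An AUC of 1 means every sample of the target class scores above every sample of the other classes, which corresponds to a curve passing through the upper left corner.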
As shown in Figure 5, the total content of the 16 components after each of the two processing methods differed from that of raw MF. Compared with raw MF, the total content of organic acids in the MF pulp was higher, showing that the organic acids are mainly located in the pulp. The pharmacological effects of raw MF and MF pulp are similar, but the efficacy of MF pulp is stronger, which may be related to the higher content of organic acids in MF pulp. The organic acid content in MF charcoal was the lowest, indicating that heating and drying during charcoal production can reduce the acidity of raw MF, which is consistent with the statement that "the damage to the teeth can be avoided after MF charcoal" [32]. In terms of individual components, the citric acid content of the MF charcoal was significantly lower than that of the raw MF (p < 0.01), indicating that citric acid can be broken down into other products under high-temperature conditions. Compared with raw MF, the amygdalin content of the MF pulp was lower, which may be attributed to the presence of amygdalin mainly in the core shell and kernel. Raw MF has an obvious antitussive effect, but the MF pulp has no antitussive effect, which may be related to the low content of amygdalin in the pulp [16,17]. Moreover, the content of amygdalin in MF charcoal was significantly reduced (p < 0.01), suggesting that heating may accelerate the isomerization and decomposition of amygdalin.

Discriminant Analysis
Discriminant analysis was used to predict the classification of unknown samples as raw MF, MF pulp, or MF charcoal. The raw MF (R1-R12), MF pulp (P1-P12), and MF charcoal (C1-C12) were marked as group 1, group 2, and group 3 (Table 3), respectively. The contents of the 16 components of these samples were used as modeling data to construct a discriminant analysis model using SPSS software. The discriminant function equations of the MF samples were expressed in terms of the following variables (S1: succinic acid, S2: L-malic acid, S3: 3,4-dihydroxybenzaldehyde, S4: protocatechuic acid, S5: caffeic acid, S6: D-quinic acid, S7: citric acid, S8: ferulic acid, S9: syringic acid, S10: cryptochlorogenic acid, S11: neochlorogenic acid, S12: chlorogenic acid, S13: amygdalin, S14: maslinic acid, S15: corosolic acid, S16: rutin). The content of each component in the different batches of MF samples was substituted into the functional equations to obtain the Y values. We tested 15 batches of MF samples of known origin (R13-R17, P13-P17, and C13-C17) using the obtained discriminant functions, and the discriminant analysis results were compared with the actual sources, as shown in Table 3. Most MF samples were correctly classified; only one sample (C15) was incorrectly predicted, and the accuracy of the classification model was 93%. This demonstrated that simultaneous determination of the 16 components combined with discriminant analysis could accurately predict the classification of raw MF, MF pulp, and MF charcoal samples.
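The reported 93% accuracy follows directly from 14 correct predictions out of 15 test samples, as the short sketch below shows. The source states only that C15 was misclassified, so the label assigned to C15 here is a hypothetical placeholder; the variable names are likewise illustrative.

```python
# True groups of the 15 test batches: R13-R17, P13-P17, C13-C17.
actual = {f"{group}{i}": group for group in "RPC" for i in range(13, 18)}

# Start from perfect predictions, then introduce the single reported error.
predicted = dict(actual)
predicted["C15"] = "R"  # hypothetical wrong label; the source does not say which

n_correct = sum(predicted[sample] == actual[sample] for sample in actual)
accuracy_percent = 100.0 * n_correct / len(actual)
```

With 14 of 15 samples correct, the accuracy is about 93.3%, which rounds to the reported 93%.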

Sample Collection
A total of 51 batches of raw and processed MF samples were used in this study. Among them, 17 batches of raw MF (Supplementary Table S8) were collected from May to July 2020 in four provinces of China (Yunnan, Sichuan, Xinjiang, and Anhui). Moreover, according to the Chinese Pharmacopoeia (2020 edition), 17 batches each of MF pulp (P1-P17) and MF charcoal (C1-C17) were prepared from the raw MF (R1-R17).

Processing Methods of Mume Fructus
MF Pulp: the raw MF samples were pressed, and the pulp was taken out and dried in a heating-air drying oven at approximately 50 °C. MF Charcoal: the raw MF samples were put in a metallic pan, heated over a strong fire, fried until black outside and brown inside, taken out, and dried in a heating-air drying oven at approximately 50 °C.

Apparatus
The volatile components were analyzed by a QP 2010 GC-MS (Shimadzu, Kyoto, Japan), equipped with an HSS 86.50 headspace sampler and AOC-20i autosampler.

Sample Preparation and Measurement
All batches of raw MF, MF pulp, and MF charcoal were dried and pulverized to finer than 60 mesh; a 2.0 g sample was then sealed in a headspace bottle (20 mL) for analysis. The heating box, quantitative ring, and transmission line temperatures were 100 °C, 120 °C, and 140 °C, respectively. The equilibrium time was 20 min, and the injection time was 1 min.
Chromatographic separation was achieved on a DB-17 column (0.25 mm × 30 m × 0.25 µm). The initial oven temperature was set at 80 °C, ramped to 200 °C at 10 °C/min, to 210 °C at 2 °C/min, and to 260 °C at 6 °C/min, and then maintained for 10 min. The injector temperature was 250 °C. High-purity helium was used as the carrier gas at a flow rate of 1 mL/min. The injection volume was 1 mL with a 20:1 split ratio. MS detection was performed with an electron impact (EI) ion source in full scan mode at m/z 20-700. The ion source and interface temperatures were 230 °C and 250 °C, respectively. The detector voltage was 1.3 kV.
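The total run time implied by the oven program above can be worked out as in the following sketch. It assumes no hold at the initial temperature or between ramps, since none is stated; the function name is hypothetical.

```python
def program_runtime(initial_temp, ramps, final_hold_min):
    """Total oven program time in minutes.

    ramps is a list of (target_temp_C, rate_C_per_min) steps applied in order.
    """
    total_min = 0.0
    temp = initial_temp
    for target, rate in ramps:
        if rate <= 0:
            raise ValueError("ramp rate must be positive")
        total_min += abs(target - temp) / rate
        temp = target
    return total_min + final_hold_min


# 80 -> 200 at 10 C/min, 200 -> 210 at 2 C/min, 210 -> 260 at 6 C/min, hold 10 min.
runtime_min = program_runtime(80.0, [(200.0, 10.0), (210.0, 2.0), (260.0, 6.0)], 10.0)
```

Under these assumptions the three ramps take 12 min, 5 min, and about 8.3 min, so each injection occupies roughly 35.3 min of oven time.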

Data Pre-Processing
The collected data were converted into MZ data by GC-MS Postrun Analysis (Shimadzu, Kyoto, Japan). The data of all batches of raw and processed MF samples were imported into R 2.7.2 software (R Foundation for Statistical Computing, Vienna, Austria) to obtain a three-dimensional matrix including retention time (Rt), mass/charge ratio (m/z), and peak intensities. The resulting data were imported into SIMCA-P 14.1 statistical software (Umetrics AB, Umeå, Sweden) for multivariate statistical analysis to screen differential markers. The accuracy of the selected differential components was calculated by the BP-NN algorithm using Matlab R2014a (Mathworks, Natick, MA, USA).

Sample Preparation and Measurement
The standards (syringic acid, caffeic acid, chlorogenic acid, cryptochlorogenic acid, neochlorogenic acid, corosolic acid, maslinic acid, rutin, ferulic acid, 3,4-dihydroxybenzaldehyde, and amygdalin) were accurately weighed and dissolved in methanol at a concentration of 1 mg/mL. Citric acid, L-malic acid, succinic acid, D-quinic acid, and protocatechuic acid were prepared in water at a concentration of 1 mg/mL. The individual standard solutions were mixed as a stock solution and further diluted with methanol to a working standard.
All batches of MF samples were dried, powdered, and passed through a 60-mesh sieve (0.3 mm aperture). Pulverized samples (1 g) were accurately weighed and then extracted in an ultrasonic bath (40 kHz, 180 W) for 40 min at 25 ± 2 °C with 25 mL of 80% methanol in water. All sample solutions were centrifuged at 14,000 rpm for 10 min, and the supernatant was filtered through a 0.22 µm membrane.

Data Pre-Processing
The UHPLC-Q-TOF-MS/MS plant metabolomics data were converted into MZ data using Agilent Masshunter Qualitative Workstation Analysis B.07.00 (Agilent Technologies Inc., Santa Clara, CA, USA). The data were then imported into R software and SIMCA-P 14.1 software for further analysis, following the same processing as in the GC-MS analysis.

Chemicals and Apparatus
The quantitative analysis was carried out on an Agilent 1290 UHPLC instrument (Agilent Technologies, Waldbronn, Germany) coupled with an Agilent 6470 series triple quadrupole mass spectrometer (Agilent Technologies, Singapore, Singapore). The same chemicals prepared for the UHPLC-Q-TOF-MS/MS analysis were used.

Sample Preparation and Measurement
The standard solution preparation was the same as described in Section 3.4.2. The quantitative sample powder was accurately weighed (50 mg), and the subsequent ultrasound step was the same as for the UHPLC-Q-TOF-MS/MS analysis. The sample solution was diluted 50 times to determine citric acid.

Method Validation
Stock solutions containing the 16 standard compounds were prepared and diluted to a series of appropriate concentrations to construct the calibration curves. The linearity for each compound was determined by weighted (1/X) least-squares linear regression of the standard peak areas (Y) against the normalized standard concentrations (X). Under the present chromatographic conditions, the lower limits of detection (LLODs) and quantification (LLOQs) were determined by diluting the standard solution until the signal-to-noise ratios (S/N) were approximately 3 and 10, respectively. The raw MF sample (batch 8) was used to validate the method, including precision, repeatability, stability, and recovery. The dilution effect was verified using a standard solution of known concentration. The intra- and inter-day precisions were determined by analyzing six replicates on three consecutive days. Six independent samples of the raw MF (batch 8) were extracted and analyzed to determine the repeatability. Stability was evaluated using one sample solution stored at 25 ± 2 °C and analyzed at 0, 2, 4, 6, 12, and 24 h. The recovery test was used to evaluate the accuracy of this method. A certain amount of the mixture of 16 standards was added to six accurately weighed (25 mg) samples of raw MF (batch 8), which were then extracted using the methods mentioned above. The recovery was calculated according to the following equation: recovery (%) = (determined amount − original amount)/spiked amount × 100%.
All of the above variations were assessed by RSDs.
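The weighted (1/X) calibration fit and the recovery equation described above can be sketched as follows. This is a minimal closed-form implementation under the assumption that the weights are simply 1/X for each calibration level; the function names and all numerical values are hypothetical and do not come from the supplementary tables.

```python
def weighted_linear_fit(x, y):
    """Least-squares line y = intercept + slope * x with weights w = 1/x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired points")
    if any(xi <= 0 for xi in x):
        raise ValueError("1/x weighting requires positive concentrations")
    w = [1.0 / xi for xi in x]
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    intercept = (swy - slope * swx) / sw
    return slope, intercept


def recovery_percent(determined, original, spiked):
    """Recovery (%) = (determined amount - original amount) / spiked amount x 100."""
    if spiked <= 0:
        raise ValueError("spiked amount must be positive")
    return (determined - original) / spiked * 100.0


# Hypothetical calibration levels (X) and peak areas (Y) lying on y = 2x + 1.
conc = [1.0, 2.0, 5.0, 10.0, 50.0]
area = [3.0, 5.0, 11.0, 21.0, 101.0]
slope, intercept = weighted_linear_fit(conc, area)

# Hypothetical spiking experiment (same amount units throughout).
example_recovery = recovery_percent(determined=19.6, original=10.0, spiked=10.0)
```

The 1/X weighting gives low-concentration standards more influence on the fit than an unweighted regression would, which is a common choice when the calibration range spans several orders of magnitude.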

Data Analysis
The UHPLC-MS/MS method was employed to simultaneously determine the content of the selected differential markers. The compound content data were imported into SPSS 21.0 (IBM, San Diego, CA, USA) for discriminant analysis to predict the classification of unknown samples.

Conclusions
In this study, GC-MS and UHPLC-Q-TOF-MS/MS plant metabolomics methods were applied to reflect the general characteristics of MF. A chemometrics strategy was used to distinguish the MF samples from different processing methods. According to the OPLS-DA diagrams of volatile and non-volatile components, the raw MF, MF pulp, and MF charcoal samples were classified into three groups, indicating that the processing method greatly influenced the MF samples. A total of 98 volatile compounds were identified, and 19 constituents with VIP > 1 were selected as potential markers in the GC-MS analysis. Through UHPLC-Q-TOF-MS/MS analysis, 89 compounds were identified, and 16 were selected as quality control markers to distinguish the MF samples. Furthermore, UHPLC-MS/MS analysis was used for quantitative analysis of the 16 differential chemical components, and the discriminant analysis showed that quantification of these components can accurately distinguish MF samples from different processing methods. In conclusion, the developed plant metabolomics method coupled with a chemometrics strategy was helpful for screening quality markers to distinguish the raw MF, MF pulp, and MF charcoal samples, and it would provide a reliable reference for the development of TCM or other related foods and drugs.