Geographical Authentication of Macrohyporia cocos by a Data Fusion Method Combining Ultra-Fast Liquid Chromatography and Fourier Transform Infrared Spectroscopy

Macrohyporia cocos is a widely consumed medicinal and edible fungus. The epidermis and inner part of its sclerotium are used separately. Because the quality of M. cocos is influenced by geographical origin, an effective and accurate method of geographical authentication is required. Liquid chromatograms at 242 nm and 210 nm (LC242 and LC210) and Fourier transform infrared (FTIR) spectra of the two parts were combined with low- and mid-level data fusion strategies and partial least squares discriminant analysis to authenticate the geographical origin of cultivated M. cocos. Data pretreatment involved correlation optimized warping and second derivative. The results showed that the chromatographic fingerprint had greater potential than the contents of five triterpene acids. LC242-FTIR low-level fusion took full advantage of information synergy and showed good performance. Further, the predictive ability of the FTIR low-level fusion model of the two parts was satisfactory. The low-level fusion strategy outperformed both the single techniques and the mid-level fusion strategy. The inner parts were more suitable for origin identification than the epidermis. This study demonstrated the feasibility of fusing chromatograms with spectra, and of fusing data from different parts, for the accurate authentication of geographical origin. This method is meaningful for the quality control of food and the protection of geographical indication products.


Introduction
The dried sclerotium of Macrohyporia cocos, belonging to Polyporaceae, is a herbal medicine (called Poria) that can also be used as food, and has been approved by the National Health Commission of the People's Republic of China. It plays an indispensable role in numerous drugs, such as the liquid oral formulation of Poria cocos polysaccharides, Sijunzi Tang, Liuwei Dihuang Wan and Chuanbei Pipa Gao. Various Poria-based foods and skin cosmetics, such as sleep-friendly tea, Tuckahoe pie, Guiling jelly (drinks made from turtle shell and medicinal herbs), Guiling jelly soft candy and the Poria facial mask, are popular. Phytochemical investigations to date suggest that this fungus contains terpenes and polysaccharides with beneficial biological properties, such as a prebiotic effect through the modulation of gut microbiota composition [1], anti-hyperlipidemic [2], anti-cancer [3] and hepatoprotective [4] effects, and effects on adipocyte and osteoblast differentiation [5].
Generally, the sclerotium of M. cocos is peeled and processed into two products, the epidermis and the inner part. The epidermis is called Poriae Cutis in Chinese, and the inner part is still called Poria. The epidermis and inner part contain similar types of compounds but different contents of secondary metabolites [6], so the two parts are often used and studied separately. Both Poria and Poriae Cutis are officially recorded in the Chinese Pharmacopoeia.
M. cocos is mainly produced in the Dabie Mountains area and Yunnan Province of China. Yunnan is regarded as the most suitable habitat because the quality of Yunnan M. cocos has long been highly recommended. Owing to the large demand for it, and because its cultivation is easily mastered, this fungus is cultivated in large quantities. Even for M. cocos cultivated in Yunnan, the chemical profiles that influence biological activities may be uneven owing to different cultivation sites and management techniques. A previous study reported that the pachymic acid content of M. cocos varied significantly among different regions of Yunnan [7]. Consequently, customers increasingly demand proof of geographical origin. To meet this demand, research on the authentication of geographical origin is necessary; it can also provide basic technology for the protection of specific geographical indication products [8].
To date, various analytical technologies that respond to different chemical information in samples have been implemented for the origin identification of M. cocos [9][10][11]. Although these methods proved promising for the discrimination of provenance, they were applied separately. Data fusion is now applied in the fields of food and medicine [12,13]. For example, Ni et al. [14] found that, based on the fusion of high-performance liquid chromatography (HPLC) and Fourier transform infrared spectroscopy (FTIR) data, the type and geographical origin of Radix Paeoniae samples could be successfully discriminated, and the fused data matrix gave better results than either technique alone.
Data fusion strategies, which fuse the outputs of multiple complementary sources of information to provide rich knowledge about a sample, are expected to achieve a more accurate characterization than single pieces of information [15]. In addition to the fusion of several datasets regarding one sample, the fusion of information regarding different parts has been reported. For instance, Casale et al. [16] combined the near-infrared information obtained from three parts (pileipellis, flesh and hymenium) of each individual to check the authenticity of dried porcini mushrooms. The studies mentioned above demonstrated that, although collecting multiple complementary datasets takes time and effort, data fusion is an alternative strategy for accurate characterization.
Infrared spectroscopy provides information on the molecular functional groups of metabolites. Liquid chromatography can reveal which compounds are present and determine specific compounds. The two techniques thus provide different and complementary information, which was used for data fusion in this study. To the best of our knowledge, infrared spectroscopy has been widely used for geographical classification because of its simplicity and rapidity [17,18], whereas liquid chromatography has mostly been used for determining the contents of compounds [19,20]. Fusion of multiple chromatographic datasets has been reported only for the authentication of the geographical origin of palm oil [21], the prediction of the antioxidant activity of Turnera diffusa [22], the authentication of Valeriana species [23] and a comparison of Salvia miltiorrhiza and its variety [24]. In fact, chromatographic data contain a wealth of information and, owing to extensive automation, stable and reliable results can be obtained.
In this study, two data fusion strategies including low and mid-level fusion as well as two data combinations including the fusion of complementary information regarding a single part, and the fusion of information regarding two medicinal parts from one sclerotium were performed for the geographical authentication of M. cocos. Liquid chromatograms at two wavelengths (242 nm and 210 nm) and FTIR spectra of two medicinal parts (Poria and Poriae Cutis) of M. cocos were analyzed.
Contents of five triterpene acids were measured. Chromatographic data fusion, spectral data fusion as well as chromatography and spectroscopy data fusion were implemented, combined with partial least squares discriminant analysis (PLS-DA).

Spectral Analysis
FTIR is an auxiliary method in the structural elucidation of organic compounds and is also employed to assess the quality attributes of a product and to authenticate its geographical origin [17]. Because it is easy to operate and rapid, it was applied to identify the cultivation location of M. cocos. The second derivative spectra of samples from each geographical origin are given in Figure 1, in which absorption peaks appear as negative peaks. Because the signal at 2600-1750 cm⁻¹ was caused by the ATR crystal material [25], it was discarded and is not shown in the figure. Absorption bands at 2964 and 1704 cm⁻¹ were observed only in Poriae Cutis samples. Absorption intensity differed among samples from different cultivation locations. Relatively high absorbance values occurred at around 1200-950 cm⁻¹, which were mainly caused by the C-O stretching, C-C stretching and C-OH bending vibrations of polysaccharides [26,27]. Peaks located at 2964 and 2873 cm⁻¹ correspond to the C-H antisymmetric and symmetric stretching vibrations of the methyl group, respectively, while the peak at 2927 cm⁻¹ is assigned to the C-H antisymmetric stretching vibration of methylene. The absorptions at 1452 cm⁻¹ and 1373 cm⁻¹ belong to the C-H antisymmetric and symmetric bending vibrations of methyl [11]. The peak at 1643 cm⁻¹ was assigned to the C=O antisymmetric stretching vibration, which is related to triterpenes [28]. The band at 1704 cm⁻¹ was associated with the C=O group of esters [29,30]. The band at 891 cm⁻¹ was assigned to the bending vibration of the C=CH₂ functional group [28]. The peak at 1259 cm⁻¹ may be related to the amide III band [31]. Overall, the FTIR spectrum reflected comprehensive structural information on the components of M. cocos samples, such as triterpenes and polysaccharides.

Quantitative Analysis of Five Triterpene Acids
The content of each triterpene acid was calculated from its calibration curve, and the results of the validation of the quantitative method are presented in Tables S1 and S2. The calibration curves of the five compounds showed good linearity (R² ≥ 0.99). Recovery rates calculated by the standard addition method varied from 96.32% to 106.4%. The relative standard deviations (RSDs) of intra-day and inter-day precision were less than 1.24% and 5.68%, respectively. RSDs of repeatability did not exceed 5.95% after analyzing six solutions from the same sample in parallel. RSDs of stability were less than 0.71% after analyzing a sample solution at 0, 6, 12, 17, 21 and 24 h. The method validation above indicated that the quantitative analysis was feasible. In particular, because the contents of poricoic acid A differed markedly between Poria and Poriae Cutis samples, calibration curves were prepared separately for two concentration ranges.
The contents of the five triterpene acids are displayed as box plots in Figure 2. One-way analysis of variance was computed with SPSS 21.0 software (IBM Corporation, Armonk, NY, USA) to test the differences among the eight cultivation locations. A value of p < 0.05 was considered significant. Poricoic acid A contents of Mengmeng were significantly different from those of Beicheng, Tuodian and Zhanhe in inner parts, and from those of Yongping in cutis samples. Contents of dehydropachymic acid and pachymic acid in inner parts from Caodian were higher than those of other geographical origins except for Baliu. Inner parts from Baliu showed higher contents of dehydropachymic acid than those from Beicheng, Dawen and Mengmeng, and higher contents of pachymic acid than those from Tuodian, Yongping, Beicheng and Mengmeng. Inner parts from Dawen contained fairly low contents of dehydrotrametenolic acid compared with those from the other origins, with the exception of Baliu. Compared with epidermis samples from Dawen, those from Beicheng and Yongping showed higher contents of dehydrotumulosic acid, and those from Caodian and Baliu contained more pachymic acid. These results show that it was difficult to distinguish M. cocos samples from the eight cultivation origins solely in terms of the contents of several target compounds. Therefore, it was necessary to take full advantage of the chromatographic fingerprint, namely, the intensity data for each retention time, to extract more information related to cultivation location.
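The calibration step can be illustrated with a minimal sketch. It assumes an ordinary least-squares straight line relating peak area to standard concentration, which the text does not specify beyond "calibration curves"; the function names and the example numbers are hypothetical and are not taken from Tables S1 and S2.

```python
def fit_calibration(concentrations, peak_areas):
    """Fit area = slope * concentration + intercept by ordinary least squares.

    Returns (slope, intercept, r_squared). Assumes at least two distinct
    concentrations and non-constant peak areas.
    """
    n = len(concentrations)
    mean_x = sum(concentrations) / n
    mean_y = sum(peak_areas) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    syy = sum((y - mean_y) ** 2 for y in peak_areas)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(concentrations, peak_areas))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


def concentration_from_area(area, slope, intercept):
    """Invert the calibration line to estimate a sample concentration."""
    return (area - intercept) / slope


# Hypothetical standards (concentration, peak area), for illustration only.
slope, intercept, r2 = fit_calibration([1.0, 2.0, 4.0, 8.0],
                                       [3.0, 5.0, 9.0, 17.0])
estimate = concentration_from_area(11.0, slope, intercept)
```

Preparing two curves for poricoic acid A, as described above, would simply mean calling `fit_calibration` once per concentration range and using the curve whose range brackets the measured area.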

Chromatographic Data Preprocessing
The chromatograms recorded at 242 nm in Figure S1 were obtained by analyzing the solution from the same sample five times successively within a day and on two consecutive days. The retention time of each peak clearly shifted between the two days, which would compromise the results of chemometric analyses. Hence, all of the chromatographic data needed to be aligned prior to further analysis.
The correlation optimized warping algorithm proposed by Skov et al. [32] was used to correct the retention time shifts among samples. The chromatogram that was most similar to all others was selected as the reference chromatogram for alignment. The global search space was set to a combination of segment lengths from 10 to 200 and slack sizes from 1 to 20, according to the observed peak widths and shifts in the chromatograms. The optimal combination of segment length and slack size was then selected automatically according to the quality of alignment while also considering the preservation of peak shape and area. The theory of the algorithms for the automated alignment of chromatographic data can be consulted in [32].
As a result, suitable combinations of segment length and slack size were obtained for the chromatographic data at 242 nm of Poria (197 and 11), 210 nm of Poria (105 and 16), 242 nm of Poriae Cutis (105 and 11) and 210 nm of Poriae Cutis (198 and 16). Figure 3 presents the M. cocos chromatographic fingerprints aligned with these warping parameters, and shows that the retention time shifts were properly corrected. Moreover, chromatograms of the same medicinal part recorded at 242 nm and 210 nm showed complementary information, i.e., some peaks were prominent in the liquid chromatograms at 242 nm (LC242) and some compounds appeared only in the liquid chromatograms at 210 nm (LC210). Further, the chromatograms of the two parts were appreciably different. In other words, the multiple chromatographic profiles presented abundant chemical information on M. cocos that could help to confirm cultivation areas.
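The choice of a reference chromatogram can be sketched as follows. This is only an illustration of the "most similar to all others" idea, assuming that similarity is measured by the mean Pearson correlation with every other chromatogram; the criterion actually used in [32] may differ, and the function names are hypothetical. The warping itself is not reproduced here.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equally long, non-constant signals."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def pick_reference(chromatograms):
    """Return the index of the chromatogram with the highest mean
    correlation to all other chromatograms (assumed similarity criterion)."""
    best_index, best_score = 0, float("-inf")
    for i, candidate in enumerate(chromatograms):
        others = [pearson(candidate, other)
                  for j, other in enumerate(chromatograms) if j != i]
        score = sum(others) / len(others)
        if score > best_score:
            best_index, best_score = i, score
    return best_index


# Three toy chromatograms with one peak drifting to later retention times.
toy = [
    [0, 1, 5, 1, 0, 0],
    [0, 0, 1, 5, 1, 0],
    [0, 0, 0, 1, 5, 1],
]
reference_index = pick_reference(toy)  # the middle trace overlaps both others
```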
The chromatographic data of one Poria sample and one Poriae Cutis sample comprised 7201 and 7801 data points, respectively. To save computation time, the number of data points in the retention time dimension of the matrix was reduced by taking one in every three points, which did not affect the chromatographic features. After this reduction, 2401 and 2601 data points remained, respectively. A comparison of the PLS-DA results confirmed that this reduction was acceptable, since it had little influence on the identification of the different groups (Table S3). Additionally, the first 11 min of each chromatogram mainly comprised unseparated peaks and baseline shift (Figure 3), so these data were discarded to obtain representative fingerprints and accurate results. In this way, the final numbers of data points were 1960 and 2160, respectively.
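The reduction described above can be sketched in a few lines. The sketch assumes evenly spaced points covering the full run (60 min for Poria, 65 min for Poriae Cutis, as in the gradient programme) and assumes that the point at exactly 11 min is discarded along with the earlier ones; under these assumptions it reproduces the point counts quoted in the text. The function name is hypothetical.

```python
def reduce_chromatogram(intensities, run_minutes, step=3, discard_minutes=11.0):
    """Keep every `step`-th point, then drop the first `discard_minutes`.

    Assumes the points are evenly spaced from 0 to `run_minutes` inclusive.
    """
    kept = intensities[::step]
    points_per_minute = (len(kept) - 1) / run_minutes
    # Index of the first retained point, just after the discarded window.
    first = round(discard_minutes * points_per_minute) + 1
    return kept[first:]


poria = reduce_chromatogram([0.0] * 7201, run_minutes=60.0)
cutis = reduce_chromatogram([0.0] * 7801, run_minutes=65.0)
print(len(poria), len(cutis))  # 1960 2160
```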

PLS-DA Using Chromatograms and FTIR Spectra
Partial least squares discriminant analysis is a widely used linear classification method [33][34][35][36]. The selection of the optimal number of latent variables is an essential question for a PLS-DA model, and in the present study it was based on a 7-fold cross-validation procedure. Unit variance scaling, which gives all variables of the same or different measurements equal importance, was performed by default when developing each PLS-DA model. The parameters of the classification models are shown in detail in Table 1 and Tables S4-S6.
After preprocessing of the chromatograms and FTIR spectra, PLS-DA models were established using each single dataset (Table 1 and Table S4). A model could not be built from the LC210 dataset of Poriae Cutis samples, so its classification results are not listed. The FTIR and LC242 datasets performed better than the LC210 dataset, with higher accuracy in both the calibration and validation sets. The sensitivity values of class 2 and class 8 in the validation set were 1 for the Poria LC242 model and smaller for the Poria FTIR model, which indicated that the LC242 model had a stronger ability to correctly recognize samples of class 2 and class 8. In contrast, the sensitivity of classes 1 and 7 in the calibration set was 0.8571 for the Poria LC242 model, smaller than that of the Poria FTIR model, indicating that the FTIR model had a stronger ability to correctly recognize samples of class 1 and class 7. Moreover, the LC models of Poriae Cutis samples gave poorer results than those of Poria samples, which reflected the difference between the two medicinal parts of M. cocos.
The variable importance for the projection (VIP) plot [37] was used to assess the significance of each variable; a VIP score greater than one at a given retention time means that the compound eluting at that time was important for distinguishing the cultivation origins. Taking the Poria LC242 model as an example, many variables had VIP scores higher than one, including the retention times corresponding to poricoic acid A and dehydrotrametenolic acid (Figure 4). This indicated that the chromatographic fingerprint had greater potential for origin identification than the contents of several compounds. However, none of the single-technique models achieved perfect performance, so a data fusion strategy was carried out with the expectation of enhancing the classification and prediction ability of the model.
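For readers unfamiliar with VIP, the following sketch shows the commonly used definition, in which each variable's squared, normalized PLS weights are averaged over the latent variables in proportion to the Y variance each latent variable explains. It is an illustration under that assumption and not the routine of the software used in this study; the input layout and all numbers are hypothetical.

```python
from math import sqrt


def vip_scores(weights, explained_y):
    """Compute VIP scores from PLS weights.

    weights[a][j]  : weight of variable j on latent variable a
    explained_y[a] : Y sum of squares explained by latent variable a
    """
    n_vars = len(weights[0])
    total = sum(explained_y)
    # Squared norm of each weight vector, used to normalize the weights.
    norms = [sum(w * w for w in component) for component in weights]
    scores = []
    for j in range(n_vars):
        acc = sum(ss * weights[a][j] ** 2 / norms[a]
                  for a, ss in enumerate(explained_y))
        scores.append(sqrt(n_vars * acc / total))
    return scores


# Hypothetical model with two latent variables and three variables.
vip = vip_scores([[0.9, 0.1, 0.1], [0.2, 0.8, 0.1]], [3.0, 1.0])
important = [j for j, score in enumerate(vip) if score > 1.0]
```

With this definition the squared VIP scores average to one over all variables, which is why one serves as the usual cut-off for importance.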

Low-level Data Fusion
PLS-DA of Poria
Figure 5 shows the workflow of geographical authentication using data fusion and clarifies how the data were combined. As shown in Table 1, the accuracy rates of the low-level data fusion models of Poria samples were 100% and higher than those of the single-technique models, except for the model using LC242-210 data, which implied that these models had strong classification performance. The highest R²(cum) (0.9599) and Q²(cum) (0.7917) were observed for the FTIR-LC242 model, indicating a high goodness of fit and good predictive ability. Therefore, the combination of the FTIR and LC242 datasets was deemed a suitable strategy, and the fusion of all three datasets was unnecessary. Furthermore, the accuracy of the FTIR-LC210 model was higher than that of the LC242-210 model in both the calibration and validation sets. This suggests that, in the data fusion models of Poria samples, the FTIR dataset provided more useful information for identifying the eight geographical origins than the LC242 dataset. By analogy, a comparison of the LC242-210 model with the FTIR-LC242 model showed that FTIR data contributed more to origin discrimination than LC210 data.
Figure 5. The workflow of geographical authentication using data fusion.

PLS-DA of Poriae Cutis
The accuracy of the FTIR-LC242 and FTIR-LC242-210 models was 100%, which was greater than that of the models using a single technique. This indicated the effectiveness of low-level data fusion. The Q²(cum) values of the FTIR-LC242 and FTIR-LC242-210 models were similar. Accordingly, FTIR-LC242 was considered the preferred combination, and the fusion of three datasets was superfluous. Furthermore, the Q²(cum) values of the low-level fusion models of Poriae Cutis samples (≤ 0.7032) were lower than those of the corresponding models of Poria samples (> 0.75), indicating that Poria samples were more suitable for origin identification than Poriae Cutis samples. When developing the LC242-210 and FTIR-LC210 low-level models, latent variables could not be calculated, so the models were not successfully built. This was consistent with the observation that a PLS-DA model could not be built from the epidermis LC210 dataset, probably because that dataset contained a large amount of information irrelevant to classification.

PLS-DA of Combination Data of Two Medicinal Parts
Both the FTIR and LC242 datasets of the two parts performed better than the LC210 dataset, in accordance with the single-technique results mentioned above. Compared with a single spectrum or chromatogram, data fusion of the two medicinal parts proved more advantageous, with greater sensitivity, specificity and efficiency. Among these models, the FTIR fusion model of the two parts presented the best prediction performance in terms of Q²(cum). Moreover, the Q²(cum) of the LC242 fusion model of the two parts was smaller than that of the FTIR-LC242 model of Poria samples. This suggests that, in the data fusion model, the Poria FTIR dataset provided more useful information for predicting the geographical origins than the Poriae Cutis LC242 dataset. By analogy, the contribution of the FTIR dataset was always greater than that of the LC242 and LC210 datasets in low-level data fusion. The low-level data fusion strategy achieved good classification results, but mid-level data fusion requires less computation time. Therefore, mid-level fusion was also performed.

The Extraction of Feature Variables
Mid-level fusion first extracts relevant features from each dataset independently and then concatenates them into a new matrix used for origin identification. Principal component analysis (PCA), a dimension reduction technique that creates a small number of new variables called principal components (PCs) from a large number of original variables, was applied to extract the features. These PCs retain most of the information in the original variables [38]. The optimal number of PCs was determined by a 7-fold cross-validation procedure. The results of feature extraction are shown in Table S7. Taking the LC210 dataset of Poria samples as an example, the first thirteen PCs were extracted, which accounted for 90.92% of the information in the original variables. The scores of these thirteen PCs were then used for data fusion.
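The feature extraction and concatenation can be sketched with NumPy as follows. This is a minimal illustration, assuming mean-centered PCA computed by singular value decomposition and a number of PCs that is supplied by the caller; the cross-validated choice of PCs and the unit variance scaling used in this study are not reproduced, and the function names and random data are hypothetical.

```python
import numpy as np


def pca_scores(data, n_components):
    """Return PCA scores and the fraction of variance they explain.

    `data` is a samples x variables matrix; columns are mean-centered first.
    """
    centered = data - data.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = float((s[:n_components] ** 2).sum() / (s ** 2).sum())
    return scores, explained


def mid_level_fusion(blocks, n_components):
    """Concatenate the PCA scores of each block column-wise.

    Rows of every block must refer to the same samples in the same order.
    """
    return np.hstack([pca_scores(block, k)[0]
                      for block, k in zip(blocks, n_components)])


# Hypothetical blocks: 10 samples with 50 spectral and 80 chromatographic variables.
rng = np.random.default_rng(0)
ftir = rng.normal(size=(10, 50))
lc242 = rng.normal(size=(10, 80))
fused = mid_level_fusion([ftir, lc242], [3, 4])  # 10 samples x 7 features
```

The fused matrix has far fewer columns than the original blocks, which is the reason mid-level fusion is cheaper to model than low-level fusion.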

PLS-DA of Poria
In agreement with the results of low-level data fusion, the accuracy rates of the FTIR-LC242 and FTIR-LC242-210 models of Poria samples were 100% in both the calibration and validation sets. These models also had stronger recognition performance, with higher sensitivity, specificity and efficiency, than the corresponding single datasets. Nonetheless, all Q²(cum) values of the mid-level data fusion models of Poria samples were lower than those of the low-level data fusion models, indicating that, according to cross-validation, low-level fusion had stronger predictive ability than mid-level fusion.
As before, the LC242-210 fusion model was not built successfully. The fusion of LC242 and LC210 could not achieve satisfactory discrimination, or even construct a model, which was probably due to the similar chemical information provided by the two chromatograms. Although they presented different peak shapes, they shared many chromatographic peaks that did not provide complementary and useful information.

PLS-DA of Poriae Cutis
The LC242-210 model, which could not be built in low-level fusion, was successfully constructed in mid-level fusion. This indicated the value of mid-level data fusion and might be due to the feature extraction. The accuracy rates of the FTIR-LC210 and FTIR-LC242-210 models were equal, but the details of the misclassification differed in terms of sensitivity and specificity. Further analysis showed that one sample from Tuodian was misclassified as coming from Baliu by the FTIR-LC210 model and from Mengmeng by the FTIR-LC242-210 model. The FTIR-LC242 and FTIR-LC242-210 mid-level fusion models of Poriae Cutis samples gave poorer results than those of Poria samples, and also poorer results than the low-level data fusion models and the FTIR model of epidermis samples.
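How a single misclassified sample affects the per-class figures can be seen in a small sketch. Sensitivity and specificity follow their usual definitions; efficiency is taken here as the geometric mean of the two, which is an assumption, since the text does not define it. The labels below are invented and do not reproduce the models in Table 1.

```python
from math import sqrt


def class_metrics(true_labels, predicted_labels, target):
    """Return (sensitivity, specificity, efficiency) for one class."""
    tp = fn = fp = tn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == target and predicted == target:
            tp += 1          # class member recognized
        elif truth == target:
            fn += 1          # class member assigned elsewhere
        elif predicted == target:
            fp += 1          # foreign sample assigned to this class
        else:
            tn += 1          # foreign sample correctly rejected
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Assumed definition: geometric mean of sensitivity and specificity.
    efficiency = sqrt(sensitivity * specificity)
    return sensitivity, specificity, efficiency


# Invented example: one Tuodian sample is wrongly assigned to Baliu.
truth = ["Tuodian"] * 3 + ["Baliu"] * 3
predicted = ["Tuodian", "Tuodian", "Baliu", "Baliu", "Baliu", "Baliu"]
tuodian = class_metrics(truth, predicted, "Tuodian")
baliu = class_metrics(truth, predicted, "Baliu")
```

In this toy case the error lowers the sensitivity of Tuodian and the specificity of Baliu, so two models with the same overall accuracy can still differ class by class, as noted above.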

PLS-DA of Combination Data of Two Medicinal Parts
Both the FTIR data fusion and the LC242 data fusion of the two medicinal parts had stronger recognition ability than the LC210 combination. Fusing the LC242 data, or the LC210 data, of the two medicinal parts improved on the performance of the corresponding single LC242 and LC210 models. However, the opposite was observed for FTIR. Compared with low-level data fusion, mid-level data fusion did not show any obvious advantage in identification ability. This might be due to the limitations of our feature extraction method. For the FTIR datasets, only slightly more than 73.29% of the original information (Table S7) was extracted.
To validate the performance of the PLS-DA models, a 30-iteration permutation test was performed (Figure S2). The results showed that none of the PLS-DA models was overfitted.

Samples
Seventy-eight intact cultivated M. cocos sclerotia (Figure 6) from eight geographical origins in Yunnan Province, China, were collected and identified by Prof. Yuanzhong Wang (Institute of Medicinal Plant, Yunnan Academy of Agricultural Sciences, Kunming, China). Voucher specimens (FL20160217) were deposited in the herbarium of the Institute of Medicinal Plant, Yunnan Academy of Agricultural Sciences. After the sclerotia were dug up, the soil was brushed away. Fresh M. cocos sclerotia were air-dried in the shade and then peeled. Both the epidermis and the inner part of the dried sclerotium, i.e., Poriae Cutis and Poria, respectively, were powdered to a homogeneous size using a pulverizer and sieved through a No. 60 mesh sieve. The powder was stored under airtight, dry and dark conditions prior to analysis. Detailed information on the samples is summarized in Table 2.

FTIR Spectra Acquisition
A Fourier transform infrared spectrometer from Perkin Elmer equipped with an attenuated total reflectance (ATR) sampling accessory with a diamond focusing element was employed for FTIR spectroscopy measurement. The sample powder was pressed under a consistent pressure with a pressure tower when collecting spectra. The FTIR spectrum of each sample was scanned 16 times successively with a resolution of 4 cm⁻¹ in the range of 4000-650 cm⁻¹. After the measurement of one sample was finished, the surface of the ATR crystal and the apex of the pressure tower were cleaned before the next sample was measured. All spectra were background corrected using an air spectrum. The laboratory environment was maintained at a constant temperature (25 °C) and humidity (30%).

Chromatographic Analysis
Sample powder was accurately weighed to 0.5 g and extracted with 2.0 mL of methanol by an ultrasound-assisted method for 40 min at ambient temperature. The extract solution was filtered through a 0.22 µm membrane filter. The filtrate was loaded into an auto-sampler vial and stored at 4 °C before injection into the chromatographic system for analysis.
Analyses of all 156 samples (including Poria and Poriae Cutis) were carried out using a Shimadzu ultra-fast liquid chromatography system equipped with a UV detector, binary gradient pumps, a degasser, an auto sampler and a column oven. The chromatographic separation was achieved using an Inertsil ODS-HL HP column (3.0 × 150 mm, 3 µm particle size) operated at 40 °C. The mobile phase consisted of acetonitrile (A) and 0.05% formic acid (B). Before use, the mobile phase constituents were degassed and filtered through a 0.2 µm filter. The gradient elution sequence was as follows: 0-25 min, 40% A; 25-52 min, 40-69% A; 52-56 min, 69-72% A; 56-58 min, 72-78% A; 58-58.01 min, 78-90% A; and 58.01-60 min, held at 90% A (eluting to 65 min for Poriae Cutis samples). Each run was followed by an equilibration period of 3 min at the initial conditions (40% A and 60% B). The flow rate was kept at 0.4 mL·min⁻¹ and the injection volume was 7 µL. Detection wavelengths were set at 242 nm and 210 nm.
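The gradient programme above can be written down as a small lookup that returns the acetonitrile fraction at any time in the run. This is a sketch for illustration only, assuming linear ramps between the stated breakpoints; the function names are hypothetical, and the 3 min re-equilibration is not modelled.

```python
def gradient_table(hold_until=60.0):
    """Breakpoints (minute, % acetonitrile) of the elution programme.

    `hold_until` is 60 min for Poria and 65 min for Poriae Cutis samples.
    """
    return [(0.0, 40.0), (25.0, 40.0), (52.0, 69.0), (56.0, 72.0),
            (58.0, 78.0), (58.01, 90.0), (hold_until, 90.0)]


def percent_acetonitrile(minute, hold_until=60.0):
    """Linearly interpolate % A at `minute` (assumes linear ramps)."""
    table = gradient_table(hold_until)
    if not table[0][0] <= minute <= table[-1][0]:
        raise ValueError("time is outside the gradient programme")
    for (t0, a0), (t1, a1) in zip(table, table[1:]):
        if t0 <= minute <= t1:
            return a0 + (a1 - a0) * (minute - t0) / (t1 - t0)


midpoint = percent_acetonitrile(38.5)             # halfway through the 40-69% ramp
cutis_hold = percent_acetonitrile(64.0, hold_until=65.0)
```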

Method Validation
The developed UFLC method was validated in terms of precision, stability, repeatability, accuracy and linearity under the above chromatographic conditions.
A mixed standard solution was analyzed six times successively within a day and on three consecutive days to evaluate intra- and inter-day precision. For the stability test, the extract of a sample was analyzed at 0, 6, 12, 17, 21 and 24 h. Six sample solutions prepared individually from the same sample were analyzed in parallel to evaluate repeatability. The recovery test was performed to evaluate accuracy by accurately adding three different amounts (low, middle and high) of reference compounds to a sample of known concentration. The following equation was used to calculate the recovery rate: Recovery rate (%) = [(measured amount − original amount)/spiked amount] × 100%.
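The recovery equation, together with the relative standard deviation used for precision, repeatability and stability, can be expressed as a short sketch. The RSD is assumed here to use the sample standard deviation; the function names and numbers are hypothetical and are not values from Tables S1 and S2.

```python
from statistics import mean, stdev


def recovery_rate(measured_amount, original_amount, spiked_amount):
    """Recovery rate (%) = (measured - original) / spiked * 100."""
    if spiked_amount <= 0:
        raise ValueError("spiked amount must be positive")
    return (measured_amount - original_amount) / spiked_amount * 100.0


def relative_standard_deviation(values):
    """RSD (%) = sample standard deviation / mean * 100 (assumed definition)."""
    return stdev(values) / mean(values) * 100.0


# Hypothetical spike: 1.00 mg already present, 1.00 mg added, 1.98 mg found.
recovery = recovery_rate(1.98, 1.00, 1.00)
# Hypothetical peak areas from six replicate injections.
rsd = relative_standard_deviation([100.2, 99.8, 100.5, 99.6, 100.1, 99.9])
```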

Preprocessing of Chromatograms and Spectra
The correlation optimized warping algorithm was applied to correct the retention time shifts of chromatogram using MATLAB software (MathWorks, R2017a, Natick, MA, USA). Then the corrected chromatographic data was reduced by taking one in every three points without affecting the chromatographic features to save computation time, which was inspired by the 'data binning' of Lucio-Gutiérrez et al. [22,23]. The first 11 minutes of data that mainly comprised unseparated peaks and baseline shift were discarded.
Raw FTIR spectra were subjected to advanced ATR correction in OMNIC 9.7.7 software (Thermo Fisher Scientific) to reduce the skewing of band intensities. Because the spectra contained hidden and overlapping absorption peaks, the second derivative was calculated in SIMCA-P+ 13.0 software (Umetrics, Umeå, Sweden) to highlight slight differences. Derivative spectra were calculated with a Savitzky-Golay filter using a second-order polynomial and a 15-point window. The 2600-1750 cm⁻¹ band, which is associated with the diamond crystal of the ATR accessory, was excluded prior to chemometric analysis. The pre-processed data were then used for data fusion and PLS-DA.
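The derivative and band-exclusion steps can be illustrated with a small NumPy sketch. It derives the Savitzky-Golay weights from a local least-squares polynomial fit rather than calling the SIMCA-P+ routine, so edge handling and scaling may differ from the software used in the study; the function names and the inclusive band limits are assumptions.

```python
import numpy as np


def savgol_second_derivative(y, window=15, polyorder=2, spacing=1.0):
    """Second derivative from a local least-squares polynomial fit.

    Only interior points are returned, so the output is window - 1 points shorter.
    """
    if window % 2 == 0 or window <= polyorder or polyorder < 2:
        raise ValueError("need an odd window larger than polyorder >= 2")
    half = window // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    # Columns 1, i, i^2, ...: design matrix of the polynomial fit in the window.
    design = np.vander(offsets, polyorder + 1, increasing=True)
    # Row 2 of the pseudo-inverse yields the quadratic coefficient a2; y'' = 2 * a2.
    weights = 2.0 * np.linalg.pinv(design)[2] / spacing ** 2
    # np.convolve flips its kernel, so pass reversed weights to get a correlation.
    return np.convolve(np.asarray(y, dtype=float), weights[::-1], mode="valid")


def drop_region(wavenumbers, spectrum, low=1750.0, high=2600.0):
    """Remove variables inside [low, high] cm^-1 (the diamond-crystal band)."""
    wavenumbers = np.asarray(wavenumbers, dtype=float)
    keep = (wavenumbers < low) | (wavenumbers > high)
    return wavenumbers[keep], np.asarray(spectrum)[keep]


# Check on a quadratic, whose second derivative is constant (here 6).
x = np.arange(0.0, 100.0, 2.0)
d2 = savgol_second_derivative(3 * x ** 2 + x + 1, spacing=2.0)
```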

Multiple Chromatograms and Spectra Data Fusion
Two kinds of data fusion were distinguished according to the source of the data: the fusion of multiple complementary pieces of information about a single part, and the fusion of information about the two parts of one sclerotium. For instance, the data matrices of LC-Poria and FTIR-Poria could be fused into a new dataset, and the data matrices of FTIR-Poria and FTIR-epidermis could be fused into another. Importantly, the fused information must correspond sample by sample: the LC and FTIR data must come from the same Poria sample, and the FTIR data of the inner part and the epidermis must come from the same sclerotium.
Data fusion can be classified into three levels according to how the data are combined: low level, mid-level and high level. Low and mid-level fusion have been widely used, and were applied here to identify the geographical origin of M. cocos. The scheme of the low and mid-level data fusion approaches is shown in Figure 7. In low-level fusion, the pre-processed datasets were concatenated directly into one matrix, so that the number of variables was equal to the sum of the numbers of original variables. In mid-level fusion, the scores obtained independently from each dataset by PCA were concatenated into a dataset used for provenance traceability, so that the number of variables was far smaller than the number of original variables. Compared with low-level fusion, mid-level fusion therefore requires less computation time. The specific types of data fusion used in this study are shown in Table 1.
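The two fusion levels can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the software workflow of the study: the block sizes, the number of principal components retained per block and the function names are hypothetical, and the rows of every block are assumed to be the same samples in the same order.

```python
import numpy as np


def low_level_fusion(blocks):
    """Concatenate pre-processed blocks column-wise; rows must be the same samples."""
    if len({b.shape[0] for b in blocks}) != 1:
        raise ValueError("all blocks must contain the same samples in the same order")
    return np.hstack(blocks)


def pca_scores(block, n_components):
    """Scores of a mean-centred block on its first principal components (via SVD)."""
    centred = block - block.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def mid_level_fusion(blocks, n_components):
    """Extract PCA scores per block, then concatenate the score matrices."""
    return np.hstack([pca_scores(b, k) for b, k in zip(blocks, n_components)])


# Synthetic stand-ins for an LC block and an FTIR block of the same 78 samples.
rng = np.random.default_rng(0)
lc = rng.normal(size=(78, 300))
ftir = rng.normal(size=(78, 500))
low = low_level_fusion([lc, ftir])                       # 78 x 800 variables
mid = mid_level_fusion([lc, ftir], n_components=[5, 5])  # 78 x 10 variables
```

The shapes make the trade-off visible: the low-level matrix keeps all 800 variables, whereas the mid-level matrix holds only the 10 retained scores.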

Evaluation of Model Performance
Calibration and validation sets were used to assess model quality. The calibration set was used to build the model, which was internally validated by 7-fold cross-validation, and the validation set was used to externally assess the practicability of the model. To avoid the influence of random sampling and to obtain a representative calibration set from the pool of samples, the Kennard-Stone algorithm [39] was used to systematically divide the dataset of 78 samples into calibration (52) and validation (26) sets in MATLAB R2017a (MathWorks).
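The Kennard-Stone selection rule can be sketched as follows. This is a minimal NumPy illustration assuming Euclidean distances and a single pool of samples; whether the split in the study was made per class or over the whole dataset is not stated, and the MATLAB implementation actually used is not reproduced here.

```python
import numpy as np


def kennard_stone(x, n_select):
    """Return indices of n_select rows of x chosen by the Kennard-Stone rule."""
    x = np.asarray(x, dtype=float)
    if not 2 <= n_select <= len(x):
        raise ValueError("n_select must be between 2 and the number of samples")
    # Pairwise Euclidean distances between all samples.
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    # Start with the two samples that are farthest apart.
    first, second = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(first), int(second)]
    remaining = [i for i in range(len(x)) if i not in selected]
    while len(selected) < n_select:
        # For each candidate, distance to its nearest already-selected sample.
        nearest = dist[np.ix_(remaining, selected)].min(axis=1)
        # Add the candidate that is farthest from the selected set.
        selected.append(remaining.pop(int(np.argmax(nearest))))
    return selected


# Synthetic data: 78 samples, split into 52 calibration and 26 validation samples.
rng = np.random.default_rng(1)
data = rng.normal(size=(78, 10))
cal_idx = kennard_stone(data, 52)
val_idx = [i for i in range(78) if i not in cal_idx]
```

Because the rule always adds the sample farthest from those already chosen, the calibration set spans the data space, and the remaining samples form the validation set.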
The performance of a discrimination model can be evaluated by sensitivity, specificity and efficiency [40]. The three parameters depend on the following values: true positive (TP), false positive (FP), true negative (TN) and false negative (FN). TP and TN represent the samples correctly identified as belonging to the target (positive) class and to the negative classes, respectively. By analogy, FP and FN represent the samples incorrectly assigned to the positive and negative classes, respectively.
Sensitivity = TP/(TP + FN)
Specificity = TN/(TN + FP)
Efficiency = √(sensitivity × specificity)
Here, sensitivity shows the ability to correctly recognize samples belonging to the target class, while specificity reflects the ability of the model to reject samples belonging to all other classes. The measure combining sensitivity and specificity is called efficiency.
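A worked example may make the three parameters concrete. In the sketch below, each origin is treated in turn as the target class (one-vs-rest). Efficiency is computed as the geometric mean of sensitivity and specificity, which is an assumption of this sketch based on common usage in class-modelling work. The labels are hypothetical and are not results from this study.

```python
from math import sqrt


def class_metrics(y_true, y_pred, target):
    """Sensitivity, specificity and efficiency for one target class (one-vs-rest)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == target and p == target for t, p in pairs)
    fn = sum(t == target and p != target for t, p in pairs)
    tn = sum(t != target and p != target for t, p in pairs)
    fp = sum(t != target and p == target for t, p in pairs)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Assumption: efficiency as the geometric mean of sensitivity and specificity.
    efficiency = sqrt(sensitivity * specificity)
    return sensitivity, specificity, efficiency


# Hypothetical origin labels for eight validation samples.
truth = ["A", "A", "A", "B", "B", "B", "C", "C"]
predicted = ["A", "A", "B", "B", "B", "B", "C", "A"]
sens_a, spec_a, eff_a = class_metrics(truth, predicted, target="A")
```

For class A, two of three true members are recognized (sensitivity 2/3) and four of five non-members are rejected (specificity 0.8).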
In addition, the accuracy rates of the calibration and validation sets, R²(cum) and Q²(cum) were employed to assess classification performance. Accuracy was calculated as the proportion of correctly classified samples among all samples of the calibration (or validation) set. R² is calculated by the following equation: R² = 1 − RSS/SSX, where RSS is the residual sum of squares between calculated and measured values, and SSX is the total sum of squares after mean centering [41]. R²(cum) represents the percentage of variance explained by a defined number of latent variables, indicating how well the model fits the data. Q²(cum) is the cross-validated cumulative R², indicating how well the model predicts new data. The higher these parameters (the closer to 1 or 100%), the better the model performance.
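The accuracy and R² definitions can be illustrated with a short sketch. It applies the same 1 − RSS/SS formula to fitted values (for R²) and to cross-validated predictions (for Q²); how SIMCA-P+ accumulates these statistics over latent variables is not reproduced, and the numbers are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Proportion of correctly classified samples."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(measured, calculated):
    """1 - RSS/SS, with SS the total sum of squares after mean centering."""
    mean = sum(measured) / len(measured)
    rss = sum((m - c) ** 2 for m, c in zip(measured, calculated))
    ss = sum((m - mean) ** 2 for m in measured)
    return 1.0 - rss / ss


# Hypothetical dummy response (1 = target class) with fitted and cross-validated values.
response = [1.0, 1.0, 0.0, 0.0]
fitted = [0.9, 0.8, 0.1, 0.2]
cross_validated = [0.8, 0.7, 0.3, 0.2]
r2 = r_squared(response, fitted)            # goodness of fit
q2 = r_squared(response, cross_validated)   # goodness of prediction
acc = accuracy(["A", "B", "B", "C"], ["A", "B", "C", "C"])
```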

Conclusions
To establish an effective method for the geographical authentication of M. cocos, two data fusion strategies (low and mid-level fusion) and two data combinations (the fusion of complementary information about a single part, and the fusion of information about two parts of one sclerotium) were compared. FTIR, LC242 and LC210 were used, individually and jointly, to characterize the epidermis and inner part of M. cocos sclerotia from different places. The results showed that the chromatographic fingerprint was more suitable for origin identification than the content data of five triterpene acids. In the fusion of complementary information about a single part, good classification performance was achieved by merging LC242 chromatograms and FTIR spectra by low-level fusion. In the fusion of information about two parts of one sclerotium, the predictive ability of the FTIR low-level fusion model of the two parts was the most satisfactory, and all analyzed samples were classified correctly.
In most cases, FTIR proved to be more efficient than LC242 and LC210, both as a single data source and in data fusion. Mid-level data fusion performed slightly worse than low-level data fusion, and low-level data fusion models outperformed single-technique models. Moreover, Poria samples were more suitable for origin identification than Poriae Cutis samples. On the basis of effective and comprehensive fingerprint information, the low-level data fusion strategy can be used, with the aid of appropriate mathematical algorithms, to discriminate M. cocos samples from different origins.