Untargeted Metabolomics Studies on Drug-Incubated Phragmites australis Profiles

Plants produce a vast number of functionally and chemically diverse natural products that play an important role in linking the plant with its surrounding environment. Plants can also absorb and transform external organic compounds (xenobiotics). Currently, only a few studies have used a mass spectrometric untargeted screening strategy to examine the effects of xenobiotics and their transformation products on plant metabolites. This study was designed to investigate the changes in the Phragmites australis metabolome following diclofenac or carbamazepine incubation, using a serial coupling of reversed-phase liquid chromatography (RPLC) and hydrophilic interaction liquid chromatography (HILIC) combined with accurate high-resolution time-of-flight mass spectrometry (TOF-MS). An untargeted screening strategy based on metabolic fingerprints was developed to compare samples from differently treated P. australis plants, revealing that P. australis responded differently to each drug. When solvents with significantly different polarities were used, the metabolic profiles of P. australis changed significantly. For instance, the production of polyphenols (such as quercetin) in the plant increased after diclofenac incubation. Moreover, the pathway of unsaturated organic acids became more prominent, possibly as a response that protects the cells against reactive oxygen species (ROS). Hence, P. australis exhibited an adaptive mechanism to cope with each drug. Consequently, the untargeted screening approach is essential for understanding the complex response of plants to xenobiotics.

The experimental setup was entered into a Microsoft Access database, in which each run has a unique numerical ID from 1 to 432. The masses, retention times (RTs), and abundances of the compounds from each run were then inserted into the database and linked to that run's ID. The built-in programming features of Access were used to visualize the data and to check that all of it had been inserted correctly, by reviewing the graphs together with the corresponding pre-documented additional data (solvent, part of the plant, …). A pivot matrix data table was then calculated from this dataset.
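The database step can be sketched with Python's built-in sqlite3 module, used here only as a stand-in for Access (an assumption; the study used Access itself). The table and column names (runs, features, solvent, plant_part) and all values are hypothetical. The check mirrors the consistency review: every feature row should point to a documented run ID.

```python
import sqlite3

def build_database(runs, features):
    """Create an in-memory SQLite database (stand-in for the Access database)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE runs (id INTEGER PRIMARY KEY, solvent TEXT, plant_part TEXT)")
    con.execute("CREATE TABLE features (run_id INTEGER, mass REAL, rt REAL, abundance REAL)")
    con.executemany("INSERT INTO runs VALUES (?, ?, ?)", runs)
    con.executemany("INSERT INTO features VALUES (?, ?, ?, ?)", features)
    return con

def orphan_feature_count(con):
    """Number of feature rows whose run_id matches no documented run (should be 0)."""
    query = (
        "SELECT COUNT(*) FROM features f "
        "LEFT JOIN runs r ON f.run_id = r.id WHERE r.id IS NULL"
    )
    return con.execute(query).fetchone()[0]

def features_per_run(con):
    """Number of detected features per run ID; runs without features report 0."""
    query = (
        "SELECT r.id, COUNT(f.run_id) FROM runs r "
        "LEFT JOIN features f ON f.run_id = r.id GROUP BY r.id"
    )
    return dict(con.execute(query).fetchall())

# Hypothetical example: two documented runs, one feature row pointing to run 3.
runs = [(1, "solvent_A", "leaf"), (2, "solvent_B", "root")]
features = [(1, 300.1, 5.0, 1000.0), (1, 250.2, 3.1, 20.0), (3, 100.0, 1.0, 5.0)]
con = build_database(runs, features)
orphans = orphan_feature_count(con)
counts = features_per_run(con)
```

A non-zero orphan count, or a run with zero features, would flag an import problem worth inspecting before pivoting.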
The pivot data table was arranged so that the mass@RT combinations form the rows, while the columns hold the accompanying abundance in each run (ID). The algorithm that attaches an abundance to each mass@RT uses the first abundance value. Different mapping approaches were tested (mean, average, first value, last value, …), but they led to similar outcomes in the subsequent analysis. As a result, each mass@RT combination occurs with exactly one abundance value per run.
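A minimal sketch of the first-value pivot, assuming each detected compound arrives as a hypothetical (run ID, mass, RT, abundance) tuple and that mass@RT keys are formed by rounding mass to four and RT to two decimals (the rounding is an assumption, not taken from the study).

```python
def feature_key(mass: float, rt: float) -> str:
    """Label a feature as mass@RT (rounding to 4 and 2 decimals is an assumption)."""
    return f"{mass:.4f}@{rt:.2f}"

def pivot_first_value(records):
    """Pivot (run_id, mass, rt, abundance) records into {mass@RT: {run_id: abundance}}.

    If the same mass@RT occurs more than once in a run, only the FIRST abundance
    is kept, so every mass@RT/run pair carries exactly one value.
    """
    table = {}
    for run_id, mass, rt, abundance in records:
        row = table.setdefault(feature_key(mass, rt), {})
        row.setdefault(run_id, abundance)  # later duplicates are ignored
    return table

# Hypothetical records: the second one duplicates the first feature in run 1.
records = [
    (1, 301.1410, 5.12, 1000.0),
    (1, 301.1410, 5.12, 1500.0),
    (2, 301.1410, 5.12, 800.0),
    (2, 237.1022, 3.40, 50.0),
]
table = pivot_first_value(records)
```

Replacing `row.setdefault(run_id, abundance)` with `row[run_id] = abundance` would give the "last value" variant instead.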
Because Access limits the number of columns (here, run IDs) per table, the data table had to be split into separate Excel files via a script before being reunited in SIMCA.
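The split can be sketched as follows. Access tables allow at most 255 fields, so each part keeps the mass@RT key column plus up to 254 run columns; for 432 runs this gives parts of 254 and 178 runs. The nested-dictionary layout {mass@RT: {run ID: abundance}}, the file names, and the use of CSV instead of Excel files (only to stay within Python's standard library) are assumptions.

```python
import csv
import tempfile
from pathlib import Path

ACCESS_MAX_FIELDS = 255  # maximum number of fields in one Access table

def split_columns(run_ids, max_fields=ACCESS_MAX_FIELDS):
    """Chunk the run IDs so each part (key column + runs) fits in max_fields columns."""
    per_part = max_fields - 1  # one column is reserved for the mass@RT key
    return [run_ids[i:i + per_part] for i in range(0, len(run_ids), per_part)]

def write_chunks(table, run_ids, out_dir, max_fields=ACCESS_MAX_FIELDS):
    """Write {mass@RT: {run_id: abundance}} as one CSV file per column chunk."""
    out = Path(out_dir)
    paths = []
    for n, part in enumerate(split_columns(run_ids, max_fields), start=1):
        path = out / f"pivot_part_{n}.csv"  # hypothetical file naming
        with path.open("w", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow(["mass@RT", *part])
            for key, row in table.items():
                # Runs without this feature stay empty, as in the pivot table.
                writer.writerow([key, *(row.get(run, "") for run in part)])
        paths.append(path)
    return paths

# Usage with 432 run IDs (as in the study) and one hypothetical feature.
run_ids = list(range(1, 433))
with tempfile.TemporaryDirectory() as tmp:
    parts = write_chunks({"301.1410@5.12": {1: 1000.0}}, run_ids, tmp)
    with parts[0].open(newline="") as fh:
        header = next(csv.reader(fh))
part_sizes = [len(p) for p in split_columns(run_ids)]
```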
After the Excel files containing all the runs had been merged in SIMCA, the data table had to be transposed so that the different mass@RT combinations are treated as variables (columns) and the different runs as observations (rows).
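The transposition can be sketched in plain Python, assuming the pivot is held as a nested dictionary {mass@RT: {run ID: abundance}} (a hypothetical layout). Each run becomes one observation row, and each mass@RT becomes one variable column.

```python
def transpose(table, run_ids):
    """Turn {mass@RT: {run_id: abundance}} into one row per run (observation)."""
    variables = sorted(table)  # mass@RT labels become the columns
    rows = {run: [table[v].get(run) for v in variables] for run in run_ids}
    return variables, rows  # None marks a mass@RT not detected in that run

# Hypothetical pivot with two features and two runs.
table = {"b@1.00": {1: 5.0}, "a@2.00": {1: 3.0, 2: 4.0}}
variables, rows = transpose(table, [1, 2])
```

The `None` entries correspond to the empty cells that the statistical software would otherwise read as missing values.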
After merging and transposing the data, the additional documentation (solvent, part of the plant, …) was also pasted into SIMCA.
In the statistical software (SIMCA), this additional information (solvent, part of the plant, …) was defined as secondary observations. This means that it is not used as variables in the developed models.
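The separation between model variables and secondary information can be illustrated with a small sketch. The field names "solvent" and "plant_part" and the record layout are hypothetical; only the values outside the secondary fields would enter the model matrix X.

```python
SECONDARY_FIELDS = ("solvent", "plant_part")  # hypothetical metadata names

def split_observation(observation):
    """Split one merged record into model variables (X) and secondary labels."""
    x_values = {k: v for k, v in observation.items() if k not in SECONDARY_FIELDS}
    labels = {k: observation[k] for k in SECONDARY_FIELDS if k in observation}
    return x_values, labels

# Hypothetical observation: two mass@RT variables plus two secondary labels.
obs = {
    "301.1410@5.12": 1000.0,
    "237.1022@3.40": 0.0,
    "solvent": "solvent_A",
    "plant_part": "leaf",
}
x_values, labels = split_observation(obs)
```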
As expected, some mass@RT variables were not found in every observation (run). Accordingly, the pivot table used to merge the data recorded these cases as empty cells. This is unhelpful for the analysis, because the statistical software treats such cells as "missing" rather than as "no occurrence". To correct this, the empty cells were replaced with zeros, after which the data were used to build the first model.
In this initial PCA, the untreated data set was stored as a reference and as the starting point for further models. This step is particularly important for obtaining an overview of the overall pattern in the data. The most important tool here is the score scatter plot, which shows the consistency of the data by means of the Hotelling's T² ellipse. This ellipse represents a 95% confidence region in the multidimensional space. Observations outside this ellipse are of particular interest and need to be investigated; some of them may turn out to be outliers.
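The zero-filling step and the Hotelling's T² statistic behind the score-plot ellipse can be sketched in plain Python. For observation i, T²ᵢ = Σₐ tᵢₐ² / sₐ², where tᵢₐ is its score on component a and sₐ² is the variance of that score column. The 95% critical limit, which SIMCA derives from an F distribution, is passed in as a parameter here rather than computed; the data layout ({run: [abundances]}), the scores, and the limit used below are hypothetical.

```python
def zero_fill(rows):
    """Replace missing abundances (None) by 0.0, i.e. 'no occurrence'."""
    return {run: [0.0 if v is None else v for v in values] for run, values in rows.items()}

def hotelling_t2(scores):
    """Hotelling's T^2 per observation from PCA scores (rows = observations).

    T2_i = sum_a t_ia**2 / s_a**2, with s_a**2 the sample variance of score
    column a. PCA scores of mean-centred data have zero column means, so t_ia
    is used directly.
    """
    n_obs, n_comp = len(scores), len(scores[0])
    variances = []
    for a in range(n_comp):
        column = [row[a] for row in scores]
        mean = sum(column) / n_obs
        variances.append(sum((t - mean) ** 2 for t in column) / (n_obs - 1))
    return [sum(row[a] ** 2 / variances[a] for a in range(n_comp)) for row in scores]

def outside_ellipse(t2_values, t2_critical):
    """Indices of observations beyond the critical T^2 (e.g. the 95% limit)."""
    return [i for i, t2 in enumerate(t2_values) if t2 > t2_critical]

filled = zero_fill({1: [3.0, 5.0], 2: [4.0, None]})
scores = [[3.0, 1.0], [-1.0, 1.0], [-1.0, -1.0], [-1.0, -1.0]]  # hypothetical
t2 = hotelling_t2(scores)
flagged = outside_ellipse(t2, t2_critical=2.5)  # placeholder, not a real 95% limit
```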
In addition, DModX (distance to the model in X) gives insight into the portion of the variance (observed minus predicted, i.e., the residual) that the model cannot describe. This helps to show what the model can capture and which observations are unlikely under the model and need more detailed investigation.
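A simplified DModX can be sketched from the residual matrix E (observed minus predicted X). Each observation's residual standard deviation sᵢ = √(Σₖ eᵢₖ² / (K − A)), with K variables and A components, is divided by the pooled residual standard deviation s₀ of all observations. SIMCA's own definition may include additional degree-of-freedom corrections, which this sketch omits; the residuals below are hypothetical.

```python
import math

def dmodx(residuals, n_components):
    """Simplified normalised DModX from an N x K residual matrix E = X - X_hat.

    s_i = sqrt(sum_k e_ik**2 / (K - A)) is the residual SD of observation i;
    it is divided by the pooled residual SD s0 of all observations.
    Further degree-of-freedom corrections used by SIMCA are not applied here.
    """
    n_vars = len(residuals[0])
    dof = n_vars - n_components
    s = [math.sqrt(sum(e * e for e in row) / dof) for row in residuals]
    s0 = math.sqrt(sum(si * si for si in s) / len(s))
    return [si / s0 for si in s]

residuals = [[1.0, 1.0], [1.0, 1.0], [2.0, 2.0], [0.0, 0.0]]  # hypothetical
values = dmodx(residuals, n_components=1)
```

Observations with values well above 1 fit the model worse than average and are candidates for closer inspection.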
No anomalies or outliers were found in the data.
In this first analysis, the underlying correlation pattern appears as clusters in the score scatter plot, in which each data point summarizes the information of one run. With the help of the secondary observations, the data set can be colored accordingly to check whether, for instance, the solvent or the part of the plant is distinctive enough to place the observations in one of the clusters. This is an easy way to interpret the clusters with the secondary documentation without using that extra information to build the model. The OPLS-DA model was then built as illustrated in the Results section.
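Coloring the score plot by a secondary label amounts to grouping observations after the model has been fitted. A minimal sketch, assuming hypothetical two-component scores and plant-part labels, computes the mean score vector per label so that separated groups can be spotted numerically as well as visually.

```python
from collections import defaultdict

def centroids_by_label(scores, labels):
    """Mean score vector per secondary label (e.g. solvent or plant part).

    The labels only group the observations after the model is built; they
    play no part in computing the scores themselves.
    """
    groups = defaultdict(list)
    for row, label in zip(scores, labels):
        groups[label].append(row)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}

scores = [[1.0, 2.0], [3.0, 4.0], [-1.0, -1.0]]  # hypothetical t1, t2 scores
labels = ["leaf", "leaf", "root"]
centroids = centroids_by_label(scores, labels)
```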

Metabolomics data analysis
The DMF of Phragmites australis identified with the OPLS-DA models and S-plots were extracted and mapped back to the original data. A pathway cannot be identified from a mass alone. To get around this issue, a key concept is to shift the unit of analysis from individual compounds to individual pathways or groups of functionally related compounds. The mummichog algorithm is the first implementation of this concept for inferring pathway activities from a ranked list of MS peaks detected by untargeted metabolomics. The original algorithm implements an over-representation analysis (ORA) method that evaluates pathway-level enrichment based on the significant features selected by the statistical analysis; users must specify a predefined p-value cutoff. For further details about the original implementation, please refer to Li et al. 2013. The mass accuracy was set to 5 ppm in positive ion mode, and the p-value cutoff was set to 0.05 to separate significant from non-significant features.
Table S1. Standard compounds of the quality control external calibration mixture: monoisotopic mass from the literature (L), monoisotopic masses measured in different injections and their mean, the deviation between the literature monoisotopic mass (L) and the mean measured monoisotopic mass, and the standard deviation (SD) of the mean mass.
Figure S2. The Q²/R² overview plot displays the individual cumulative R² (green columns) and Q² (blue columns) values, i.e., the goodness-of-fit and cross-validation parameters, respectively, for (a) the different parts of P. australis.
Figure S3. Extracted ion chromatograms (EICs) corresponding to diclofenac measured in the extracts (right) and to the reference standard (left), identified in the extracts of Phragmites australis leaf, rhizome, and roots incubated with 10 and 100 μM diclofenac.
Also shown are EICs of suspected transformation products in the extracts of Phragmites australis leaf, rhizome, and roots incubated with 10 and 100 μM diclofenac.
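The 5 ppm mass matching and the p-value-based over-representation analysis described in the Metabolomics data analysis section can be illustrated with a short sketch. It uses a plain one-sided hypergeometric test and considers only [M+H]⁺ ions; both are simplifying assumptions, and the published mummichog algorithm includes further steps (such as multiple adduct forms and permutation-based estimation of significance) that are not reproduced here. All masses and counts are hypothetical.

```python
from math import comb

PROTON_MASS = 1.007276  # u, added for [M+H]+ ions in positive ion mode

def ppm_error(measured_mz, theoretical_mz):
    """Mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6

def candidate_matches(measured_mz, neutral_masses, tol_ppm=5.0):
    """Neutral monoisotopic masses whose [M+H]+ m/z lies within tol_ppm."""
    return [m for m in neutral_masses
            if abs(ppm_error(measured_mz, m + PROTON_MASS)) <= tol_ppm]

def ora_pvalue(hits, total, in_pathway, significant):
    """One-sided hypergeometric p-value P(X >= hits) for pathway over-representation.

    total: all matched features; in_pathway: features mapped to the pathway;
    significant: features passing the cutoff (e.g. p < 0.05);
    hits: significant features that fall in the pathway.
    """
    upper = min(in_pathway, significant)
    favourable = sum(comb(in_pathway, i) * comb(total - in_pathway, significant - i)
                     for i in range(hits, upper + 1))
    return favourable / comb(total, significant)

# Hypothetical numbers: both significant features fall in a 3-feature pathway
# out of 10 matched features.
print(candidate_matches(101.0077, [100.0, 150.0]))  # [100.0]
print(round(ora_pvalue(2, 10, 3, 2), 4))           # 0.0667
```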

Supplementary information to the manuscript S13
Figure S4. EICs corresponding to the standards of carbamazepine (CBZ) and its transformation products (left) and to the compounds measured (right) in the extracts of Phragmites australis leaf, rhizome, and roots incubated with 10 and 50 μM carbamazepine.