Statistical Integration of ‘Omics Data Increases Biological Knowledge Extracted from Metabolomics Data: Application to Intestinal Exposure to the Mycotoxin Deoxynivalenol

The effects of low doses of toxicants are often subtle and information extracted from metabolomic data alone may not always be sufficient. As end products of enzymatic reactions, metabolites represent the final phenotypic expression of an organism and can also reflect gene expression changes caused by this exposure. Therefore, the integration of metabolomic and transcriptomic data could improve the extracted biological knowledge on these toxicants induced disruptions. In the present study, we applied statistical integration tools to metabolomic and transcriptomic data obtained from jejunal explants of pigs exposed to the food contaminant, deoxynivalenol (DON). Canonical correlation analysis (CCA) and self-organizing map (SOM) were compared for the identification of correlated transcriptomic and metabolomic features, and O2-PLS was used to model the relationship between exposure and selected features. The integration of both ‘omics data increased the number of discriminant metabolites discovered (39) by about 10 times compared to the analysis of the metabolomic dataset alone (3). Besides the disturbance of energy metabolism previously reported, assessing correlations between both functional levels revealed several other types of damage linked to the intestinal exposure to DON, including the alteration of protein synthesis, oxidative stress, and inflammasome activation. This confirms the added value of integration to enrich the biological knowledge extracted from metabolomics.


Introduction
Mycotoxins are toxic fungal secondary metabolites frequently found as contaminants of food and feed. Among them, deoxynivalenol (DON) is very a prevalent mycotoxin in cereals and cereal products [1,2]. It is one of the most frequently occurring contaminants in human and animal diets. The intestine constitutes the first biological barrier to ingested toxics, and is therefore the first target of DON [3]. Several studies have demonstrated the adverse effects of DON on the intestine, such as impaired immune function, the inhibition of intestinal nutrient absorption, and altered intestinal cell and barrier functions [3][4][5]. Due to their cereal-rich diet, pigs are particularly exposed to DON. In addition, they can be considered good models for extrapolating to humans, with a digestive physiology very similar to that of humans [3]. Pigs therefore constitute a well-suited model to assess the effects of DON on intestinal health.
Since the metabolome responds to stress long before standard biomarkers do, metabolic fingerprinting was previously used to address the subacute exposure to DON at a con-centration that is likely to occur in food or feed [6,7]. The effects of low doses of DON, like those of many other toxicants, are often subtle, and confounding variability (due to experimentation, analytics, or physiology) can hamper variations caused by the toxicological exposure. It is therefore difficult to adjust models discriminating control from exposed individuals using the metabolomic data matrix alone [8]. Consequently, only a few metabolites were found to be discriminant in the above-mentioned study, and the extracted biological knowledge was quite limited.
Taking advantage of recent advances in high-throughput analytical techniques, many studies integrate the use of different 'omics platforms on the same samples, which makes it possible to have a set of data at the genome scale, while being able to measure biomolecules at the different functional levels of the studied organisms [9,10]. This 'omics data fusion allows us to handle a biological system as a whole to gain insight into its complex functioning through the identification of more informative models [11]. The main statistical methods used to integrate several 'omics datasets are the penalized canonical correlation analysis (CCA) [12,13] and self-organizing map (SOM) [14,15], used to compute correlations between blocks, and the multi-block generalizations of partial least squares (PLS) to fit a model between a biological factor and 'omics profiles [16][17][18][19][20][21][22][23].
This study aimed to assess whether the NMR-based metabolomic modeling of DON exposure can be enriched by integrating complementary information contained in transcriptomic data, based on a combination of sparse CCA and SOM analysis, to select withinand between-blocks-correlated features, and the O2-PLS to model the selected features. Furthermore, we aimed at assessing the correlations between these two functional levels to identify genes and metabolites as markers of exposure to the mycotoxin. To this end, we generated transcriptomic and metabolomic datasets from pig intestinal explants exposed to a similar and low dose of DON.

Results
In the present study, metabolomic and transcriptomic data were generated from 16 jejunal explants treated (n = 8) or not (n = 8) with 10 µM DON for 4 h to assess the effect of this toxin at the intestinal level.

. Metabolomic Effect of DON on Intestinal Explants
We first analyzed effect of DON on the metabolome. Noisy NMR features were removed prior to analysis because the noise-related variability hampered us from seeing the biological variability and prevented us from fitting the models (NMR data used in this study are presented in file S1).
A valid and robust PLS-DA model (A = 2 latent components, percentage of explained variance: R2 = 95%; predictive capacity: Q2 = 0.91; permutation test: p = 0.005) was fitted on the Pareto-scaled metabolomic data. Figure 1 presents the projection of individuals onto the first latent plane. This score plot showed a clear separation between control and exposed individuals. Sixteen NMR features had a VIP (variable importance in the projection) value > 1.0 and were statistically different at the 5% threshold (false discovery rate-corrected p-value from the Wilcoxon test). These corresponded to three metabolites: alanine, lactic acid, and phosphocholine (see Figure 2).

Transcriptomic Effect of DON on Intestinal Explant
We next analyzed the transcriptomic response of the intestinal explants treated or not with 10 µM of DON for 4 h. Due to computational limitations, the number of transcriptomic features had to be limited and was set to 15,000, based on the highest standard deviation.
The analysis of Pareto-scaled transcriptomic data also generated a valid and robust PLS-DA model (A = 2 latent components, R2 = 97.3% and Q2 = 0.81, permutation test: p = 0.005). The score plot of the PLS-DA model showed a clear separation between the control and exposed individuals ( Figure 3). A total of 1468 probes had a VIP value > 1.0 and were significantly different at the 5% threshold (false discovery rate-corrected p-value from the Wilcoxon test), corresponding to 1094 upregulated and 374 downregulated genes. The

Transcriptomic Effect of DON on Intestinal Explant
We next analyzed the transcriptomic response of the intestinal explants treated or not with 10 µM of DON for 4 h. Due to computational limitations, the number of transcriptomic features had to be limited and was set to 15,000, based on the highest standard deviation.
The analysis of Pareto-scaled transcriptomic data also generated a valid and robust PLS-DA model (A = 2 latent components, R2 = 97.3% and Q2 = 0.81, permutation test: p = 0.005). The score plot of the PLS-DA model showed a clear separation between the control and exposed individuals ( Figure 3). A total of 1468 probes had a VIP value > 1.0 and were significantly different at the 5% threshold (false discovery rate-corrected p-value from the Wilcoxon test), corresponding to 1094 upregulated and 374 downregulated genes. The

Transcriptomic Effect of DON on Intestinal Explant
We next analyzed the transcriptomic response of the intestinal explants treated or not with 10 µM of DON for 4 h. Due to computational limitations, the number of transcriptomic features had to be limited and was set to 15,000, based on the highest standard deviation.
The analysis of Pareto-scaled transcriptomic data also generated a valid and robust PLS-DA model (A = 2 latent components, R2 = 97.3% and Q2 = 0.81, permutation test: p = 0.005). The score plot of the PLS-DA model showed a clear separation between the control and exposed individuals ( Figure 3). A total of 1468 probes had a VIP value > 1.0 and were significantly different at the 5% threshold (false discovery rate-corrected p-value from the Wilcoxon test), corresponding to 1094 upregulated and 374 downregulated genes. The cytokine genes CSF2, TNF, IL-17A, and IL-22, the genes encoding the interleukins 1 alpha and beta (IL-1α and IL-1β), chemokine ligand 20 (CCL20), and the Nuclear Factor Kappa B Subunit 1 (NF-κB1) were the most upregulated genes, as well as the prostaglandinendoperoxide synthase (PTGS). cytokine genes CSF2, TNF, IL-17A, and IL-22, the genes encoding the interleukins 1 alpha and beta (IL-1α and IL-1β), chemokine ligand 20 (CCL20), and the Nuclear Factor Kappa B Subunit 1 (NF-κB1) were the most upregulated genes, as well as the prostaglandin-endoperoxide synthase (PTGS).

Fusion of Transcriptomic and Metabolomic Data
In order to extract more information from metabolomics, a statistical fusion of both 'omics datasets was performed. Robust sparse CCA (rsCCA) and dissimilarity-based SOM were first applied to select correlated features and O2-PLS was then performed using each set of selected features to model the relationship between DON exposure and 'omics profiles.

Fusion of Transcriptomic and Metabolomic Data
In order to extract more information from metabolomics, a statistical fusion of both 'omics datasets was performed. Robust sparse CCA (rsCCA) and dissimilarity-based SOM were first applied to select correlated features and O2-PLS was then performed using each set of selected features to model the relationship between DON exposure and 'omics profiles.   All the correlated transcriptomic (12) and metabolomic (12) features were found to be discriminant in the O2-PLS model. Metabolomic features corresponded to six identified metabolites (see Figure 2); the aspartate, methanol, and phosphocholine concentrations were lower in the control group, whereas glucose, glutamine, and tyrosine concentrations were higher in the control group.

Combination of Self Organizing Map and O2-PLS
A total of 11,069 transcriptomics and 154 metabolomics features were selected by the dissimilarity kernel-based SOM. The adjusted O2-PLS model based on these selected features explained half of the total variability and showed a very low error of prediction (0.01). Figure 5 presents the projection of individuals along the first latent variable of the fitted O2-PLS model for the transcriptomic (a) and metabolomic (b) datasets, respectively. In the case of the model based on features selected by dissimilarity kernel-based SOM, the observations were well discriminated according to the exposure. All the correlated transcriptomic (12) and metabolomic (12) features were found to be discriminant in the O2-PLS model. Metabolomic features corresponded to six identified metabolites (see Figure 2); the aspartate, methanol, and phosphocholine concentrations were lower in the control group, whereas glucose, glutamine, and tyrosine concentrations were higher in the control group.

Combination of Self Organizing Map and O2-PLS
A total of 11,069 transcriptomics and 154 metabolomics features were selected by the dissimilarity kernel-based SOM. The adjusted O2-PLS model based on these selected features explained half of the total variability and showed a very low error of prediction (0.01). Figure 5 presents the projection of individuals along the first latent variable of the fitted O2-PLS model for the transcriptomic (a) and metabolomic (b) datasets, respectively. In the case of the model based on features selected by dissimilarity kernel-based SOM, the observations were well discriminated according to the exposure.
The analysis revealed that 1443 transcriptomic features and 39 metabolites (among which 21 have been identified) were discriminant (see Figure 2, last line for metabolites). The concentrations of AMP, creatine, fumarate, glucose, glutamate, glutamine, glutathione, histidine, inosine, isoleucine, lactate, lysine, nicotinuric acid, phenylalanine, tryptophan, tyrosine, uracil, and uridine were higher in the control group, whereas the concentrations of asparagine, serine, and valine were lower in the control group. The use of the MetExplore metabolic network analysis tool (right-tailed Fisher test with Bonferroni's multiple test correction) showed that the metabolic pathways significantly enriched in this set of discriminant metabolites are primarily linked to the aminoacyl-tRNA biosynthesis and the metabolism of amino acids and 2-oxocarboxylic acids (Table 1). The analysis revealed that 1443 transcriptomic features and 39 metabolites (among which 21 have been identified) were discriminant (see Figure 2, last line for metabolites). The concentrations of AMP, creatine, fumarate, glucose, glutamate, glutamine, glutathione, histidine, inosine, isoleucine, lactate, lysine, nicotinuric acid, phenylalanine, tryptophan, tyrosine, uracil, and uridine were higher in the control group, whereas the concentrations of asparagine, serine, and valine were lower in the control group. The use of the MetExplore metabolic network analysis tool (right-tailed Fisher test with Bonferroni's multiple test correction) showed that the metabolic pathways significantly enriched in this set of discriminant metabolites are primarily linked to the aminoacyl-tRNA biosynthesis and the metabolism of amino acids and 2-oxocarboxylic acids (Table 1).

Discussion
Several studies have demonstrated the adverse effects of DON, which can be observed at different 'omics levels [24][25][26]. To the best of our knowledge, no study has combined two or more 'omics datasets to give a more global view of disruptions due to DON exposure.

Individual PLS-DA Modeling of 'Omics Data
Firstly, each 'omics dataset was individually analyzed using PLS-DA. PLS-DA is well adapted for 'omics data (high dimension and collinearities) because linear combinations of spectral features are constructed and few of these are used in the model. Three metabolites (alanine, lactic acid, and phosphocholine) participated in the discrimination of DON-exposed explants from control explants. This low number of metabolites indicated that, despite a clear separation of observations, little information can be extracted from the metabolomic data. This could result from variables containing confounding variability (experimental or instrumental variability for example), which masks biological variability [8].

Discussion
Several studies have demonstrated the adverse effects of DON, which can be observed at different 'omics levels [24][25][26]. To the best of our knowledge, no study has combined two or more 'omics datasets to give a more global view of disruptions due to DON exposure.

Individual PLS-DA Modeling of 'Omics Data
Firstly, each 'omics dataset was individually analyzed using PLS-DA. PLS-DA is well adapted for 'omics data (high dimension and collinearities) because linear combinations of spectral features are constructed and few of these are used in the model.
Three metabolites (alanine, lactic acid, and phosphocholine) participated in the discrimination of DON-exposed explants from control explants. This low number of metabolites indicated that, despite a clear separation of observations, little information can be extracted from the metabolomic data. This could result from variables containing confounding variability (experimental or instrumental variability for example), which masks biological variability [8].
To date, very few discriminant metabolites including those detected here have been reported in NMR-based metabolomics investigations of the alterations induced by DON To date, very few discriminant metabolites including those detected here have been reported in NMR-based metabolomics investigations of the alterations induced by DON in pigs [6,27,28]. Phosphocholine is a downstream metabolite in the catabolism of phosphatidylcholine. Cellular lactate predominantly stems from alanine and glucose through their conversion into pyruvate, and lactate homeostasis is primarily related to glucose metabolism [29]. Likewise, a crosstalk between glycolysis and the catabolism of phosphocholine has been shown for energy supply to the cell under specific situations [30]. These findings suggest that the DON exposure shifts the main energy source in the intestinal tissue from amino acids in physiological conditions [31] to glucose, and confirm the previously reported disturbance of energy metabolism as a prominent metabolic effect of the mycotoxin in pigs [28].
When looking at discriminant features of transcriptomic data, the top upregulated genes in the intestinal explants were mainly related to inflammation and immunity (cytokine genes CSF2, TNF, IL-17A, and IL-22; genes encoding the interleukins 1 alpha and beta chemokine ligand 20; and the Nuclear Factor Kappa B Subunit 1 were the most upregulated genes). Other inflammatory genes in the top upregulated category included the prostaglandin-endoperoxide synthase, also known as cyclooxygenase; the NR3C1 gene encoding a nuclear glucocorticoid receptor involved in inflammatory responses and cellular proliferation; and CXCL2 and CXCR4, which encode the C-X-C motif chemokine Ligand 2 and Receptor 4 expressed at sites of inflammation and involved in immunoregulatory and inflammatory processes. These observations are in line with previous publications pointing at inflammation as the main endpoint for the molecular toxicity of type B trichothecenes in the intestine [24,26,32,33].

Fusion of Transcriptomic and Metabolomic Data
Methodological issues encountered with the analysis of one single 'omics dataset are worsened by 'omics fusion. Some of these problems are related to the high dimensionality of matrices and biological noise. This latter can hinder the effects of the biological factor and may prevent evidencing the effects of the factor of interest. It is therefore essential to consider these problems when choosing the statistical method to apply for 'omics integration. CCA is designed to select correlated features. However, CCA fails due to ill-, or even un-, conditioned matrices in high dimensions. A regularized method may be applied to solve this issue. Another issue is with selecting relevant and meaningful correlated features from the many thousands of features present in the original datasets, among which many may be noisy. In this study, we firstly selected correlated features using robust sparse CCA or SOM methods prior to multi-block modeling. This selection step enabled us to decrease the number of features included in the O2-PLS-DA model. By clustering objects into a bidimensional grid, the SOM algorithm takes into account possible nonlinear relations between objects, which is a great advantage over other clustering algorithms. The SOM algorithm also enables us to preserve the topology of the original data, such as the correlation structure of features in this study [34]. Moreover, the orthogonal step of O2-PLS-DA allowed us to remove variation in noisy features, if selected by sparse CCA or SOM methods.

Combination of Robust Sparse CCA and O2-PLS
Twelve transcriptomic and 12 metabolomic features were selected by rsCCA. This low number of selected features agreed with the simulation results (simulation study is detailed in file S2). rsCCA demonstrated very high values of specificity ( Figure S1), meaning that only correlated features were selected by this method. This low number of selected features could explain the fact that DON-exposed explants were not well discriminated from control explants in the O2-PLS-DA modeling in the sense that not only correlated features were markers of exposure in this study. Although the number of identified metabolites was greater than in the individual analysis of metabolomic block, this number was still very low and not sufficient to gain insight into dysregulations due to mycotoxin exposure at the organism level.

Combination of Kernel Dissimilarity-Based SOM and O2-PLS
The heterogeneity of measurements is another issue encountered when integrating 'omics data. To deal with this issue, we applied the SOM method to two kernel matricesnamely, dissimilarity and Gaussian kernels. A kernel matrix provides pairwise information between objects. Kernels are a widely used and flexible method to deal with com-plex data of various types [35] We only applied a dissimilarity kernel-based SOM to the biological data because this method showed higher values for sensitivity and specificity than Gaussian kernel-based or raw data-based SOM (Figures S1 and S2).
A total of 11,069 transcriptomic and 154 metabolomic features were selected by the dissimilarity kernel-based SOM. The kernel based-SOM method selected many more features than rsCCA. This result was expected from the results of the simulation study (see Figures S2 and S3): SOM methods showed a high number of selected features, very high values of sensitivity, and very low values of specificity, meaning that these methods were able to identify correlated features but did not select only the correlated features. By not selecting only the correlated features, the SOM method enabled us to catch some features that seemed to be markers of biological effect. This could explain the best discrimination obtained using the SOM-based model. On the other hand, the "orthogonal step" of the O2-PLS modeling enables us to remove noisy variability, reducing the influence of correlated features with a low biological relevance in the modeling step.
Metabolites found to be discriminant in the O2-PLS-DA modeling corroborate previous in vivo findings in the plasma and liver of piglets exposed to DON in our lab [6]. The synthesis of aminoacyl-tRNA and the metabolism of amino acids and 2-oxocarboxylic acids contribute to the global process of protein synthesis, which inhibition is a well-characterized molecular effect of trichothecene mycotoxins including DON on eukaryotic cells [36].
The fusion of transcriptomic and metabolomic data showed that the discriminant metabolite glutathione correlates positively with the gene expression of SOD2 and negatively with the gene expression of TXNIP. Gluthatione is an endogenous tripeptide that shields cellular macromolecules from oxidative stress by directly scavenging endogenous and exogenous oxidants or by recycling the antioxidant vitamins C and E [37]. The protein encoded by SOD2 is one of two isozymes responsible for destroying free superoxide radicals in the body. TXNIP encodes the thioredoxin-binding protein that inhibits the antioxidative function of thioredoxin. Oxidative stress holds an important place in the mechanisms of the toxicity described for DON in eukaryotic cells, with two reported effects for this mycotoxin being the generation of reactive oxygen species and the alteration of antioxidant status [33]. TXNIP also participates in the activation of the NLRP3 inflammasome, bridging oxidative stress to inflammation [38]. The connection between oxidative stress and inflammation is delineated in Figure 6 by the strong correlation between the expression of genes encoding pro-inflammatory and Th17 cytokines (IL-1α, IL-1β, IL17A, IL22, IL23A) and a chemokine (IL8), and the levels of amino acid metabolites asparagine and valine on the one hand and these metabolites and TXNIP expression on the other hand. The sensing of some amino acids including asparagine and valine has been shown to control intestinal inflammation via the integrated stress response mechanism [39,40]. We and others previously reported that DON triggers the transcription of IL1-b in the intestinal tissue [24,41]. This cytokine is produced in a proprotein form that needs to be processed to its active form by caspase 1. The activation of the NRLFP3 inflammasome by DON and the subsequent processing and secretion of cytokines belonging to the IL-1 family was later demonstrated [32]. The strong positive correlation between the expression of TXNIP and the level of asparagine-and valine, to a lesser extent-is consistent with the involvement of oxidative stress in the DON induced-activation of the NRLFP3 inflammasome.

Animals
All the experimental procedures were conducted under the approval of the French Ministry of Higher Education and Research (decision n • #6303_2016080314392462) and the Ethics committee of Pharmacology-Toxicology, Toulouse-Midi-Pyrénées in animal experimentation Toxcométhique (decision n • TOXCOM/0163/PP, 2 February 2017). Three authors (I.A.K., I.P.O., and P.P.) have an official agreement with the French Veterinary Services authorizing animal experimentation.

Treatment of Jejunum Explants
Eight 5-week-old piglets were used to generate jejunal explants. The piglets were weaned at 4 weeks and fed ad libitum prior to the experiment. Explants were prepared as already described [33]. Briefly, 5 cm middle jejunum segments were collected and flushed with William's Medium containing 4.5 g/L of glucose, 1% of penicillin/streptomycin, 0.5% of gentamycin, 10% FBS, and 30 mM of amino acids (Ala/Glu). Each segment was opened longitudinally, and multiple pieces of 6 mm diameter were obtained using biopsy punches (Centravet, Lapalisse, France).
DON was dissolved in dimethylsulfoxide (DMSO) at 60 mM, before dilution in complete culture media. Control samples were treated with equivalent concentrations of DMSO previously tested to be nontoxic for intestinal explants (data not shown). Each of two explant samples from each of eight piglets were deposited villi upward on biopsy sponges per well in 6-well plates (Cellstar, Greiner Bio-One, Frickenhausen, Germany) containing control or DON-contaminated (10 µM) medium. The explants were cultured at 39 • C under a CO 2 -controlled atmosphere with orbital shaking for 4 h. After treatment, the explants were frozen in liquid nitrogen and stored at −80 • C before transcriptional and metabolomics analysis. N = 8 control samples and n = 8 DON-contaminated were used for both transcriptomic and metabolomic analysis.

Transcriptomics
For gene expression analysis, total RNA was extracted using lysing matrix D tubes (MP Biomedicals, Illkirch, France) containing 1 mL of Extract-All reagent (Eurobio), following the manufacturer's instructions. The RNA concentration and purity were determined using a NanoDrop spectrophotometer (Labtech International, Paris, France). RNA quality was assessed using an Agilent Bioanalyzer (Agilent, Les Ulis, France), and the mean (±SD) RNA integrity number (RIN) was 7.23 ± 0.55.
Gene expression profiles were analyzed at the GeT-TRiX facility (GénoToul, Génopole Toulouse Midi-Pyrénées) using Agilent Sureprint G3 Porcinet 60K_DEC2011 microarrays (8 × 60 K, design 037880) following the manufacturer's instructions. For each sample, Cyanine-3 (Cy3)-labeled cRNA was prepared as described in Alassane-Kpembi et al. [33]. Microarray data and experimental details are available in NCBI's Gene Expression Omnibus [42] and are accessible through GEO Series accession number GSE165968. All the data analyses were performed using R (R Core Team, 2018 [43]) and the Bioconductor package [44], as described in GEO accession GSE165968. Raw data (median signal intensity) were filtered, log2-transformed, and normalized using the quantile method [45]. The transcriptomic matrix includes n = 16 observations and p = 41,436 features.
All the 1H NMR spectra were generated using a Bruker Avance III HD spectrometer (Bruker Biospin, Rheinstetten, Germany) operating at 600.13 MHz for proton resonance frequency using an inverse detection 5 mm 1H-13C-15N-31P cryoprobe. 1 H NMR spectra of aqueous explant extracts were acquired and processed as previously described [47] using 512 transients, a spectral width of 20 ppm, and a relaxation delay of 2 s. NMR spectra were phase-and baseline-corrected and then calibrated (TSP, 0.0 ppm) using the Topspin software (version 3.5, Bruker). Then, the NMR data were reduced using the AMIX software (version 3.9, Bruker) to integrate 0.01 ppm-wide regions corresponding to the δ 10.0-0.5 ppm. The 5.1-4.5 ppm region, which includes the water resonance, was excluded in the NMR spectra of aqueous extracts. A total of q = 753 NMR buckets were included in the data matrix. Normalization to the total spectral area was applied to each integration region to account for differences in the sample amount. The metabolomic matrix includes n = 16 observations and p = 753 features. To confirm the chemical struc-tures of the metabolites of interest, 2D 1 H-1 H COSY (correlation spectroscopy) and 2D 1H-13C HSQC (heteronuclear single quantum coherence spectroscopy) NMR spectra were also generated for selected samples. Spectra assignment was based on matching the 1D and 2D data to reference spectra in a homemade reference database, as well as with another database (https://www.hmdb.ca/, release 3.0, accessed on 11 May 2021) and reports in the literature.
Genome-scale metabolic network (GSMN) analysis was performed on the dataset of discriminant metabolites, as previously described [48]. To this end, the Sus scrofa GSMN was imported in MetExplore web server from KEGG database (network 4105, downloaded from KEGG database 24 August 2020) [49,50]. The network contains 1807 meta-bolic reactions and 1520 metabolites.

Statistical Analysis
All the statistical analyses were conducted under R [43] with in-house scripts developed for this study.

Sparse CCA Analyses
Canonical correlation analysis [51] is a method that studies the correlation relationships between two sets (or blocks) of features measured on the same individuals. Briefly, CCA maximizes the correlation between two linear combinations from the two blocks of features. The coefficients estimated for the definition of the linear combinations are used to explore the relationships between features. In this study, we used Robust Sparse CCA [13] for its ability to select relevant correlated features also in presence of outlier observations. Features with non-null loading values in the first pair of linear combinations (transcriptomic and metabolomic datasets) were selected. We adapted scripts from Wilms and Croux [13] to perform Robust Sparse CCA.

SOM Analyses
A self-organizing map [52] is an artificial neural network learned using an unsupervised algorithm aiming at projecting and classifying objects (individuals or features) into a lower dimensional space (e.g., two-dimensional). Prototype vectors are associated with each unit of the map: these are weight vectors, used to classify objects in map units. In this study, prototypes were randomly initialized. Two steps were iteratively performed in the stochastic version of SOM algorithm. In the first step, a randomly drawn single feature was assigned to the unit whose prototype was closest to it (according to the chosen distance). The second step was an update step: prototypes of the closest unit and its neighbors were moved towards the drawn object.
In this study, SOM was used to classify features. Features clustered in the same unit were considered to be correlated. A raw matrix or kernel matrix, based on a dissimilarity matrix or Gaussian kernel, were used to learn SOM. Examples of commonly used distance metrics include Euclidean distance and correlation for relative abundance data [35].

PLS-DA Analyses
Partial least squares [54] (PLS) regression seeks a mathematical relationship between a quantitative biological factor of interest-drug doses, for example-and several features, such as spectral data. PLS-DA is an extension of PLS regression for categorical factors, DON exposure in this study.
PLS-DA analyses were conducted using the ropls package [55] (v1.16.0). O2-PLS [54] is a generalization of the OPLS [56] regression method and can be used for combining two 'omics datasets. It models both predictive and systematic variation.
The variation present in each of the two matrices is split up into three parts. The first part corresponds to the joint-i.e., predictive (useful for the prediction of the studied biological factor)-variation. For example, in the transcriptomic data matrix, the joint variation describes expression profiles that are useful for predicting metabolite profiles. The second part is the orthogonal part (confounding variability-e.g., physiological, experimental, or technical variations). It corresponds to the unique systematic variation in the transcriptomic data matrix, for example. This part is not useful in predicting the metabolomic data. The final part corresponds to unexplained variance (noise).

Conclusions
This study had two objectives. The statistical methods sparse CCA and SOM were compared regarding their ability to select correlated features and, thus, to be used as in noise pre-filtering before O2-PLS modeling. Biologically, the information extracted from the metabolomic data could be prevented from technical variability and noise. We aimed at combining metabolomic data with transcriptomic data to enhance the information extracted from metabolomics and gain insight into effects of DON exposure on the intestine.
To apply sparse CCA, sparsity parameters have to be tuned. Moreover, due to computational limitations, not all the transcriptomic features could be included, and thus features have to be selected before applying CCA. We chose to include features displaying the highest standard deviation, even if this choice could discard discriminant features. Even with a limited number of features, this method is very time-consuming. Although some parameters (map dimensions, distance measures) also have to be set before applying SOM, this method is less time-consuming.
In this study, we showed that the integration of metabolomic data with transcriptomic data enables us to extract more information from metabolomics by combining feature selection based on SOM and an orthogonal step in PLS-DA modeling.
Indeed, we increased the number of discriminant metabolites from seven, as detected by the individual metabolomic block analysis, to 39, as selected by the dissimilarity kernelbased SOM. Both steps enabled us to remove noisy features. At the end of the process, it was possible to gain a better insight into metabolic pathways disrupted by exposure to DON. On top of the clues of the disturbance of energy metabolism previously reported in metabolomics studies, the integration of the metabolomics data and transcriptomics features simultaneously generated on jejunal explants remarkably brought out several other types of damage caused by the intestinal exposure to DON, including the alteration of protein synthesis, oxidative stress, and inflammasome trigger. In future work, we think that applying this integration to a panel of biological samples (e.g., plasma, lung, brain) could help us extend the extracted information to the whole-organism scale.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/metabo11060407/s1, Supplementary file S1: Excel file including preprocessed NMR data. Supplementary file S2: Details of simulation study run to compare rsCCA and SOM, combined with O2-PLS, on artificial data. Figure S1: Average specificity of SOM (dissimilarity kernel-based and raw) and Robust Sparse CCA methods calculated for transcriptomic (middle blue bars) and metabolomic (light blue bars) blocks. In total, 100 Monte Carlo simulations. Figure S2: Average sensitivity of SOM (dissimilarity kernel-based and raw) and Robust Sparse CCA methods calculated for transcriptomic (middle blue bars) and metabolomic (light blue bars) blocks. In total, 100 Monte Carlo simulations. Figure S3: Average percentage of selected features by SOM (dissimilarity kernel-based and raw) and Robust Sparse CCA methods for transcriptomic (middle blue bars) and metabolomic (light blue bars) blocks. In total, 100 Monte Carlo simulations.