Metabolic Profile of the Cellulolytic Industrial Actinomycete Thermobifida fusca

Actinomycetes have long been a source of numerous valuable natural products and medicinal compounds. To expedite product discovery and the optimization of biochemical production, high-throughput technologies can now be used to screen the library of compounds present (or produced) in an organism at a given time. This not only facilitates chemical product screening but also provides a comprehensive methodology for studying cellular metabolic networks to inform cellular engineering. Here, we present some of the first metabolomic data for the industrial cellulolytic actinomycete Thermobifida fusca, generated using LC-MS/MS. The underlying objective of global metabolite profiling was to gain better insight into the innate capabilities of T. fusca, with the long-term goal of facilitating T. fusca-based bioprocesses. The T. fusca metabolome was characterized for growth on two cellulose-relevant carbon sources, cellobiose and Avicel. Furthermore, the measured metabolites were computationally integrated into a metabolic model of T. fusca to study shifts in network flux associated with carbohydrate and amino acid metabolism.


Introduction
High-throughput experimental data (genomics, transcriptomics, proteomics, metabolomics) provide distinct glimpses into cellular function. For cellular metabolism, metabolomics is unique in being a direct outcome measure of integrated biochemical function, and it can provide starting-point information for efficient pathway engineering. However, metabolomic data are shaped not only by genetic and environmental factors but also by the growth phase of the bacteria. As a broad classification, primary and secondary metabolites differ in the growth phase in which they are produced. The early log phase shows a higher concentration of primary metabolites, which are used for growth and reproduction of the cells. This is followed by the production of secondary metabolites as byproducts of metabolism. Many of these secondary metabolites are known for their medical and industrial relevance.
Actinomycetes have a long-cited history of producing medically relevant chemicals, beginning with antibiotics (roughly two-thirds of clinically used antibiotics are produced by actinomycetes of the genus Streptomyces). As actinomycetes continue to be studied, additional important compounds, such as siderophores [1], polyketides [2,3], and terpenoids [4,5], continue to be identified. The breadth of compounds natively produced by actinomycetes suggests that this group of Gram-positive bacteria may be naturally well suited to producing medically relevant compounds. This is supported by the fact that approximately 45 percent of all bioactive compounds obtained from microorganisms come from actinomycetes [4].

Thermobifida fusca is a soil bacterium that belongs to the actinomycetes. In this study, our goal is to gain a better understanding of the metabolic capabilities of T. fusca by measuring its metabolome and integrating these data with a computational metabolic model. This study extends previous work in which metabolism in T. fusca was characterized using systems-level modeling based on genomic and proteomic data [6]. Specifically, the macromolecular network of T. fusca is studied using metabolite profiling for growth on two different media formulations (cellobiose and Avicel). Cellobiose has previously been studied as an inducer of cellulases [7–13] and is therefore used as the reference condition. Avicel, by contrast, is a crystalline cellulose substrate more closely related to plant waste and is therefore used for comparative analysis.

Results
Metabolomic data was obtained for growth on cellobiose or Avicel with separate samples being analyzed for detection of polar and non-polar metabolites for each growth condition.

Coverage of Polar and Non-Polar Metabolites
All samples were run as technical replicates, and only metabolites detected in both replicates were used for further processing and analysis. For growth of the wild-type T. fusca strain on cellobiose, 618 polar and 64 non-polar compounds were detected. For growth on Avicel, 402 polar and 82 non-polar compounds were detected. Of the detected compounds, 15 (cellobiose) and 23 (Avicel) were detected in both the polar and non-polar runs. A summary of the number of detected compounds is shown in Table 1.
Table 1. Summary of compounds identified in the wild-type strain of Thermobifida fusca grown on cellobiose and Avicel.
Growth condition   Total compounds   Polar   Non-polar   Detected in both runs
Cellobiose         667               618     64          15
Avicel             461               402     82          23
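The totals in Table 1 follow from inclusion-exclusion: a compound detected in both the polar and the non-polar run is counted once. The short sketch below reproduces the totals from the per-run counts; the function name is ours and purely illustrative.

```python
def unique_compounds(polar: int, non_polar: int, in_both: int) -> int:
    """Count distinct compounds across the polar and non-polar runs.

    Compounds seen in both runs would otherwise be counted twice,
    so they are subtracted once (inclusion-exclusion).
    """
    return polar + non_polar - in_both


# Counts from Table 1: (polar, non-polar, detected in both runs)
table1 = {"Cellobiose": (618, 64, 15), "Avicel": (402, 82, 23)}

for condition, counts in table1.items():
    print(f"{condition}: {unique_compounds(*counts)} unique compounds")
```

The same identity gives the 827 unique metabolites reported for the two growth conditions combined: 667 + 461 − 301 = 827.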

Functional Association of Metabolites
The MBRole [14] software was used to identify functional associations of the detected metabolites. MBRole provides access to organism-specific precompiled lists of metabolites, using KEGG as a reference [15,16]. The mapping categories include metabolic pathways, enzyme interactions, biological roles, chemical groups, and other interactions (e.g., transport-associated molecules). A summary of the functional classifications for detected metabolites is shown in Table 2.
Table 2. Category distribution for metabolites detected in the cellobiose (C) and Avicel (A) growth conditions. In total, 667 metabolites were detected for growth on cellobiose and 461 for growth on Avicel.
Category              Terms C   Terms A   Mapped C   Mapped A   Not mapped C   Not mapped A
Pathways              72        71        191        129        476            332
Enzyme Interactions   299       213       71         45         595            416
Biological Roles      71        63        67         45         600            416
Chemical Groups       62        56        186        123        481            338
Other Interactions    95        74        31         25         636            436

Metabolites for Cellobiose Growth
Of the 667 metabolites detected in the cellobiose growth condition, 474 could not be assigned an annotation in T. fusca. The annotated compounds were classified into the following categories: sugars (10 metabolites), lipids or fatty acid-associated molecules (36), nucleotides (48), peptides or amino acids (40), hormone- or neurotransmitter-like molecules (2), and vitamins (9). Eighty-one metabolites from secondary metabolism (phytochemical compounds, as named by KEGG) were also identified. Other interactions identified included molecules associated with transporters (47), receptors (61), and ion channels. The transporter-associated molecules included ABC-type and amino acid transporters (arginine, glycine, proline, methionine, arginine/ornithine, and glutamate). Among the receptor-associated molecules, 24 were linked to glutamate receptors, including the metabotropic and ionotropic subclasses. Ionotropic receptors are ligand-gated non-selective ion channels, whereas metabotropic receptors activate biochemical cascades [17]. Prokaryotic glutamate receptors are less well characterized than their eukaryotic homologs, so the detection of 24 compounds from this class opens an avenue for further exploration of this family in T. fusca. It might also help explain the strong secretome reported previously [18].
Compounds associated with cytochrome P450 formed another important family observed in the analysis: 21 metabolites were associated with the CYP1 and CYP2 families. The primary function of this protein superfamily is to catalyze the oxidation of lipids, steroids, and xenobiotic compounds, such as toxic chemicals and drugs [19].
In addition to their biological roles, the detected compounds were mapped to the metabolic capabilities precompiled by KEGG, using the open-access online database interface MBRole [14]. Among the 206 detected compounds with annotation, 105 were associated with primary metabolism pathways and 54 with the secondary metabolism network. These networks are shown in Supplementary Figure S1a,b, with detected compounds highlighted in grey. The metabolic subsystems with the highest representation of detected metabolites were oxidative phosphorylation (tfu00190), with 38% of associated metabolites detected, and nucleotide metabolism, with 23% detected. Within nucleotide metabolism, 14 metabolites were detected in pyrimidine metabolism and 21 in purine metabolism. Detected metabolites associated with amino acid, carbohydrate, and secondary metabolism pathways are listed in Supplementary Table S1.

Metabolites for Avicel Growth
For growth of wild-type T. fusca on Avicel, 461 compounds were detected, but 330 could not be associated with a functional category in T. fusca as defined by KEGG. The macromolecular distribution of annotated metabolites for growth on Avicel was 7 sugars, 21 lipids, 17 nucleotides, 43 peptides (and amino acids), and 6 vitamins. Most of these counts were lower than in the cellobiose condition, except for the number of identified peptides. The numbers of transporter- and receptor-associated compounds were 43 and 41, respectively. Seventeen compounds associated with cytochrome P450 subfamilies and 19 associated with glutamate receptors (ionotropic and metabotropic) were identified. One entity detected only in the Avicel condition was related to 2-aminoethylphosphonate transporters, ABC-type transporters that move 2-aminoethylphosphonate across the membrane for use by the bacterium. Bacteria can use such organophosphorus compounds as sources of carbon, energy, and phosphorus for growth, and these compounds are often found in insecticides and herbicides [20].
In addition to the macromolecular classification, the 131 annotated compounds were examined for connections to metabolic pathways. Seventy-one metabolites were associated with primary metabolic processes and 31 with secondary metabolism pathways. These networks are shown in Supplementary Figure S2a,b, with detected metabolites highlighted in grey. The metabolic subsystem with the highest percentage of experimentally detected metabolites was lysine biosynthesis (tfu00300), in which 9 of 32 compounds (28%) were identified experimentally. Other detected compounds from carbohydrate, amino acid, and secondary metabolism pathways are listed in Supplementary Table S2.
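The subsystem percentages reported here are coverage ratios: the number of detected compounds divided by the number of compounds KEGG lists for that pathway. A minimal sketch using the lysine biosynthesis counts; the helper name is hypothetical, and rounding to whole percent is assumed to match the text.

```python
def pathway_coverage(detected: int, total_in_pathway: int) -> float:
    """Percent of a pathway's KEGG-listed compounds that were detected."""
    if total_in_pathway <= 0:
        raise ValueError("pathway must list at least one compound")
    return 100.0 * detected / total_in_pathway


# Lysine biosynthesis (tfu00300), growth on Avicel: 9 of 32 compounds detected
print(f"{pathway_coverage(9, 32):.0f}%")   # prints 28%
```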

Comparing Cellobiose vs Avicel
Overall, 827 unique metabolites were detected across growth on cellobiose and on Avicel. Of these, 301 were common to both growth conditions, 366 were unique to cellobiose, and 160 were unique to Avicel. The pathway associations of the compounds in the two conditions are summarized in Figure 1. When MBRole maps a metabolite to a metabolic pathway, it assigns a significance score by comparing the mapped list against the background compounds known for each pathway. Table 3 summarizes the significantly different pathways along with their uniquely detected compounds.
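MBRole's exact scoring is not restated in this paper. A common way to score whether a pathway is over-represented in a mapped compound list relative to its background compounds is a one-sided hypergeometric test, and the sketch below assumes that statistic. The function name and all counts in the example are hypothetical, not values from this study.

```python
from scipy.stats import hypergeom


def enrichment_p_value(background: int, in_pathway: int,
                       mapped: int, mapped_in_pathway: int) -> float:
    """One-sided hypergeometric p-value for pathway over-representation.

    background: compounds in the organism-specific background set
    in_pathway: background compounds annotated to the pathway
    mapped: compounds in the submitted (detected) list
    mapped_in_pathway: submitted compounds that fall in the pathway
    Returns P(X >= mapped_in_pathway) under random draws without replacement.
    """
    # sf(k - 1) gives P(X >= k) for a discrete distribution
    return float(hypergeom.sf(mapped_in_pathway - 1, background, in_pathway, mapped))


# Hypothetical example: 1500 background compounds, 40 in the pathway,
# 200 compounds submitted, 15 of which map to the pathway.
p = enrichment_p_value(1500, 40, 200, 15)
print(f"p = {p:.2e}")
```

When many pathways are scored at once, the resulting p-values are normally corrected for multiple testing (for example, with a false discovery rate procedure) before a pathway is called significant.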
Other notable compounds unique to the cellobiose condition, beyond those listed in Table 3, were dihydrostreptomycin 6-phosphate (C01221) from the streptomycin biosynthesis pathway, hexadecanoic acid (C00249) from fatty acid metabolism, lipoyl-AMP (C16238) from lipoic acid metabolism, and D-alanine (C00041).
Among the 301 compounds whose peaks were detected in both the cellobiose and Avicel conditions, 88 were observed in differential quantities. Nineteen (of 88) were detected at higher concentration in cellobiose, and 36 (of 88) at higher concentration in Avicel. When mapped to biological functions, the compounds higher in cellobiose belonged to fatty acid synthesis (C05763), lysine biosynthesis (C03871, C12986), arginine and proline biosynthesis (C00624, C03415), and glycerophospholipid metabolism (C00513). For growth on Avicel, the compounds detected at higher concentrations belonged to ubiquinone and other terpenoid-quinone biosynthesis (C00811), lysine degradation (C03955), phenylalanine metabolism (C00811), gamma-hexachlorocyclohexane degradation (C12836), and selenoamino acid metabolism (C05699). A summarized list of compounds is shown in Table 4.
Table 4. Compounds with significantly different concentrations in the two growth conditions (green, cellobiose; red, Avicel).

Metabolites in Secondary Metabolism
Detected general metabolites were mapped to the metabolic pathways of T. fusca using MBRole. KEGG's compilation for tfu01110 (Biosynthesis of secondary metabolites) lists a total of 1038 compounds, including both intermediate metabolites and final products. Comparing the metabolomic data for growth on cellobiose and on Avicel against these 1038 compounds identified 63 metabolites associated with secondary metabolism. To find nodes missing from the precompiled tfu01110 list, the metabolomic data were also mapped to the list of all natural products (2480 in total). This yielded 88 secondary metabolites, classified as alkaloids (41), terpenoids (25), phenylpropanoids (8), amino acid-related compounds (8), polyketides (4), flavonoids (4), and fatty acid-related compounds (2). Details of these subcategories are given in Table 5. The compound lists are provided in Supplementary Table S1 (growth on cellobiose) and Supplementary Table S2 (growth on Avicel), and the raw data in Supplementary Table S3.
One interesting finding from the focus on secondary metabolism was the identification of the hemiterpenoid isoprene. Past studies have shown that both Gram-positive and Gram-negative bacteria can produce isoprene [21]. A comparative study of Bacillus sp., Micrococcus, Rhodococcus, E. coli, Pseudomonas, and Agrobacterium concluded that Bacillus ranked highest in isoprene production, illustrating the capability of Gram-positive bacteria to produce this industrially significant product. Based on our metabolomic analysis, we propose that T. fusca, also a Gram-positive bacterium, shares this ability to produce isoprene. Sharkey et al. (2007) [22] described the biological role of isoprene as protection against heat stress (~40 °C). This may help explain the mechanism of thermostability in T. fusca, a naturally thermophilic organism.

Integrating Metabolomic Data with a Metabolic Model
The predictive and analytical capabilities of constraint-based metabolic models can be significantly improved by integrating experimental data as model constraints. To gain a better in-context understanding of the metabolomics data, and to further improve our previously published metabolic model of T. fusca [6], we implemented a method for directly integrating metabolomics data into a constraint-based model.
For this study, we focused only on data associated with growth on cellobiose, since the published T. fusca model was also based on growth on cellobiose. Of the 946 reactions in the proteomics-based metabolic model of T. fusca, flux balance analysis (FBA) predicted 56 reactions with low flux and 173 with high flux, for a total of 229 reactions with active flux. After the metabolomic data from this study were integrated into the model, using only metabolites with high-confidence annotations, the number of reactions with active flux fell to 224 (57 low-flux and 167 high-flux reactions).
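Flux balance analysis solves a linear program: maximize an objective flux subject to steady-state mass balance (S·v = 0) and flux bounds. The integration algorithm itself is not spelled out here, so the sketch below shows only one hypothetical way metabolomic evidence can act as a constraint: a detected metabolite is required to be produced at a small minimum flux. The three-metabolite network, its bounds, and the minimum flux of 1 are invented for illustration, and SciPy's `linprog` (HiGHS solver) stands in for a dedicated FBA tool.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network. Columns are reactions:
#   0 EX_A: -> A (uptake)   1: A -> B   2: A -> C
#   3: B -> biomass (objective)         4: C -> export
# Rows are metabolites A, B, C; S @ v = 0 enforces steady state.
S = np.array([
    [1, -1, -1,  0,  0],   # A
    [0,  1,  0, -1,  0],   # B
    [0,  0,  1,  0, -1],   # C
])
BIOMASS = 3


def run_fba(bounds):
    """Maximize biomass flux; linprog minimizes, so the objective is negated."""
    c = np.zeros(S.shape[1])
    c[BIOMASS] = -1.0
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]),
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x


base_bounds = [(0, 10)] + [(0, 1000)] * 4   # uptake capped at 10
v_fba = run_fba(base_bounds)

# Hypothetical metabolomics constraint: C was detected, so its producing
# reaction (A -> C) must carry at least 1 unit of flux.
omics_bounds = list(base_bounds)
omics_bounds[2] = (1, 1000)
v_omics = run_fba(omics_bounds)

print("FBA only:       ", np.round(v_fba, 3))
print("with metabolite:", np.round(v_omics, 3))
```

In this toy case the constraint lowers the optimal biomass flux from 10 to 9 and activates a branch that plain FBA left inactive, the same kinds of shifts that are tallied for the full model.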
The reaction distributions from FBA without metabolomic data and from FBA with metabolomic data from the cellobiose condition are compared in three categories: (1) reactions whose flux was unchanged between the two scenarios, (2) reactions that were active in both scenarios but changed in flux value, and (3) reactions that were inactive in one scenario but active in the other.
Unchanged reactions: Of the 946 reactions, 773 were unchanged. Of these, 60 carried the same active flux in both scenarios, including 20 amino acid metabolism reactions, 12 carbohydrate metabolism reactions, and 9 nucleotide metabolism reactions. These reactions may constitute core metabolic pathways that remain constant irrespective of external conditions.
Changed active reactions: Relative to the FBA simulation without metabolomic data, reaction fluxes can be divided into four ranges: high flux (1000 to 250), moderate flux (249 to 3.8), low flux (3.8 to 0), and negative flux (0 to −1000).
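The four flux ranges can be expressed as a simple binning rule. The stated ranges leave the boundaries open (whether exactly 3.8 is low or moderate, and where values between 249 and 250 fall), so the cutoffs below (low below 3.8, high from 250) are an assumption, as is treating a flux within a tiny numerical tolerance of zero as inactive.

```python
def flux_category(flux: float, tol: float = 1e-9) -> str:
    """Bin a reaction flux into the ranges used for the FBA comparison.

    Assumed cutoffs: negative < 0 < low < 3.8 <= moderate < 250 <= high;
    |flux| <= tol is treated as inactive (LP solvers return tiny residuals).
    """
    if abs(flux) <= tol:
        return "inactive"
    if flux < 0:
        return "negative"
    if flux < 3.8:
        return "low"
    if flux < 250:
        return "moderate"
    return "high"


# Hypothetical fluxes for one reaction before and after data integration
before, after = 2.1, 120.0
print(flux_category(before), "->", flux_category(after))   # prints: low -> moderate
```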
An overview of the variation in reaction fluxes is shown in Figure 2.

Figure 2. Comparison of metabolic pathways with high, medium, low, and negative fluxes between FBA simulations (red) and metabolomics-integrated simulations (green).
Moderate- and low-flux reactions showed an overall shift toward lower values, although the set of reactions in use remained largely unchanged. By contrast, the 39 high-flux reactions varied widely in how their fluxes changed when the metabolomic data were integrated.
This variability is illustrated for amino acid metabolism (Figure 3a) and carbohydrate metabolism (Figure 3b). On integrating the metabolomic data, reactions between arginine and fumarate changed from low to high flux, and the interconversion of arginine and citrulline changed from low to moderate flux. Other major changes included the inactivation of previously active reactions, such as reactions in tryptophan metabolism from chorismate and the formation of lysine from homoserine. Significant drops in flux were also found for the formation of aspartate from oxaloacetate and the formation of proline from arginine.
In carbohydrate metabolism, significant changes were observed in glycolysis, the TCA cycle, and the pentose phosphate pathway. In glycolysis, the pyruvate kinase-mediated conversion of glucose to glucose 6-phosphate was reduced from a high-flux reaction to inactive after data integration. In the TCA cycle, there was a positive shift (low to moderate flux) in all reactions from succinyl-CoA to oxaloacetate, whereas flux through the intermediate reactions leading from alpha-ketoglutarate to succinyl-CoA was slightly reduced. In the pentose phosphate pathway, flux increased significantly for the conversion of glucose 6-phosphate to 6-phosphogluconolactone. This variability is illustrated in the graphs in Figure 3b, and the detailed list of reactions and their flux values is presented in Table 6b.
Inactive to active change: Finally, there was a set of nine reactions for which flux balance analysis predicted active fluxes but integration of metabolomic data indicated that the reaction should be inactive (Table 7). In general, the reactions that became inactive after integration of metabolomic data were in pathways whose flux could easily be routed to alternative pathways without significant effects on cellular phenotype.
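The three comparison categories can be computed directly from the two flux vectors (without and with metabolomic constraints). The sketch below assumes reactions are matched by index and that fluxes within a small tolerance are equal; the function name, the tolerance, and the example fluxes are hypothetical.

```python
from collections import Counter


def compare_fluxes(v_fba, v_omics, tol=1e-6):
    """Label each reaction by how its flux differs between two simulations.

    unchanged: same flux in both (including inactive in both)
    changed:   active in both, but with different flux values
    switched:  active in one simulation and inactive in the other
    """
    if len(v_fba) != len(v_omics):
        raise ValueError("flux vectors must cover the same reactions")
    labels = []
    for a, b in zip(v_fba, v_omics):
        if abs(a - b) <= tol:
            labels.append("unchanged")
        elif abs(a) > tol and abs(b) > tol:
            labels.append("changed")
        else:
            labels.append("switched")
    return labels


# Hypothetical five-reaction example
v_fba = [10.0, 10.0, 0.0, 10.0, 0.0]
v_omics = [10.0, 9.0, 1.0, 9.0, 1.0]
print(Counter(compare_fluxes(v_fba, v_omics)))
```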

Discussion
Metabolomic data provide direct evidence for the presence of the substrates and products of biochemical reactions in the cell at a specific time point. This data type has unique strengths and weaknesses for studying and analyzing an organism's metabolic capabilities and functionality. The global metabolomic profiles of T. fusca presented here are the first metabolomic datasets for this organism and can be used to study specific target molecules, depending on the application or pathway of interest.
One of the main strengths of metabolomic data is the direct detection of specific chemical species, which is particularly useful when considering secondary metabolism. Because secondary metabolic capabilities can be nuanced and unique to an organism (varying much more than central metabolism), the biochemical production capabilities of a poorly characterized organism are often speculative when based on genome-centric knowledge alone. In considering T. fusca as a potential cellulolytic platform for producing biochemical products, a number of compounds appear bioinformatically to be natively producible but have yet to be confirmed experimentally. In this study, we identified several secondary metabolites produced by T. fusca, including isoprene, a high-value precursor metabolite that had not previously been identified as a product of T. fusca.
Metabolomic data can also be integrated into other computational approaches, such as metabolic models, to better understand the coordinated functionality of an organism's metabolic network. In this study, we used an algorithmic approach to integrate metabolomic data with a constraint-based metabolic model, using only metabolites that were uniquely identified with high confidence. Metabolomic data provide information that is fully complementary to that obtained from genomic, transcriptomic, or proteomic data. All of these data types can identify potential biochemical functionality, but genomic, transcriptomic, and proteomic data share a potential pitfall: a single gene or protein may be multifunctional, carrying annotations for different functions. While these data types potentially offer wider coverage across a metabolic network, only metabolomic data can directly distinguish and confirm biochemical functionality. Here, integrating metabolomic data helped confirm the activity of some central metabolic pathways and identify shifts in pathway usage.
In this study, the coverage of the T. fusca metabolic network by the metabolomic data was limited, but the data obtained helped identify some differences in functional state between growth on cellobiose and on Avicel. Most of the observed changes were consistent with intuitive expectations. Growth is faster on cellobiose than on Avicel, consistent with the increased fluxes and detected metabolites in amino acid synthesis needed to support growth. Growth on Avicel is slower and requires secretion of proteins (e.g., cellulases) and uptake of a broader range of substrates, consistent with the observed increase in support subsystems and transporters. Furthermore, integrating metabolomic data with metabolic modeling identified subtle shifts in pathway usage that could not be detected using proteomic data alone.
The utility of the metabolomic data in this study was severely limited by the difficulty of uniquely and definitively associating each detected metabolite with a single chemical species. Compound identification from raw metabolomic data is currently limited by the reference data available in resources such as KEGG and HMDB. The problem is evident for compounds with closely related masses and retention times. Examples observed in this dataset include UDP (C00015), with a mass of 404.002200, and aluminoparaaminosalicylate calcium (C13104), with a mass of 404.011300; these masses differ at the second decimal place, and the two compounds had similar retention times. Another example was 1-methyl-4-phenylpyridinium (C11310, mass = 170.097000) and furfural diethyl acetal (C14280, mass = 170.094300), whose masses differ at the third decimal place. Other compound pairs differed only at the fourth or fifth decimal place while also having similar retention times. In our view, this is not a limitation of the detection technique but rather an inadequacy of the reference databases or of the mapping and identification algorithms. It can lead to problems at both ends of the analysis: overestimation, but also underestimation or, as in this case, missing annotations. Approximately 70% of the compounds could not be well annotated in the context of the T. fusca compounds precompiled by KEGG, which is partially explained by the limited functional annotation of T. fusca in KEGG. There was a significant increase in mapped compounds to this database. Because of these shortcomings, it is almost impossible to obtain coverage as exhaustive as that of genomic or transcriptomic datasets.
Where exhaustive coverage is attempted, the hierarchy of techniques shown in Figure 4 would have to be followed to obtain the most comprehensive, confirmatory, and accurate list of metabolites.
Untargeted profiling of this kind is therefore useful for identifying candidate targets, but it has to be paired with a targeted metabolite search for the specific pathway of interest.
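The near-coincident masses discussed above can be expressed on the scale that mass spectrometry matching tools usually use, parts per million (ppm). The sketch computes the ppm difference for the two compound pairs quoted in the text; the 30 ppm matching window is an assumed, illustrative tolerance, not a setting reported for this pipeline.

```python
def ppm_difference(mass_a: float, mass_b: float) -> float:
    """Mass difference in parts per million, relative to the smaller mass."""
    ref = min(mass_a, mass_b)
    return abs(mass_a - mass_b) / ref * 1e6


# Compound pairs and masses as quoted in the text
pairs = {
    ("UDP (C00015)", "C13104"): (404.002200, 404.011300),
    ("C11310", "C14280"): (170.097000, 170.094300),
}

TOLERANCE_PPM = 30.0   # assumed matching window, for illustration only

for names, (m1, m2) in pairs.items():
    d = ppm_difference(m1, m2)
    ambiguous = d <= TOLERANCE_PPM
    print(f"{names}: {d:.1f} ppm apart; indistinguishable at {TOLERANCE_PPM:g} ppm: {ambiguous}")
```

Both pairs are separated by only about 16 to 23 ppm, so whether they can be told apart by mass alone depends entirely on the matching window chosen, which is why retention time and database coverage matter so much for identification.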

Materials and Methods
Global metabolite profiling experiments that attempt to characterize all metabolites of a cellular system require a large combinatorial hierarchy of techniques [23]. Figure 4 shows the compound and technique classification that may be used for targeted or untargeted metabolite profiling. If a well-structured study achieved an exhaustive metabolite profile of a microbe, a top-down approach could be used to forward-engineer a synthetic microbe with de novo synthesized genetic information for designed, targeted production. This could further be developed into a design principle for new synthetic microbes with highly efficient genetic and proteomic manufacturing systems.
In the current study, only untargeted metabolite profiling using HPLC/MS-TOF in positive polarity was conducted. Using a methanol/water/chloroform extraction method, the aqueous and non-aqueous layers were separated for two different runs: polar and non-polar. The peak and mass data acquired from LC-MS with Agilent Mass Hunter Software were sent to the Omics Discovery Pipeline at the Bindley Bioscience Center for deconvolution, alignment, normalization, and identification of compounds [24]. The parameters for each step were set after inspecting the results of the preceding step. This analysis detects a subset of polar and non-polar compounds, representing general metabolites and lipids, respectively. Technical replicates of wild-type T. fusca grown on cellobiose and on Avicel were analyzed. This is the first attempt to investigate the metabolome of T. fusca, and it opens the way for targeted metabolite profiling of specific pathways or products of interest. Raw metabolomic data are included in Supplementary Table S3.

Cell Growth
T. fusca strain YX (ATCC BAA-629) cells were grown in Hagerdahl medium (ATCC medium 2382) with cellobiose or Avicel as the carbon source (5.0 g/L) at 55 °C. The two growth conditions yielded dry cell weights of 8.4 mg/mL on cellobiose and 4.5 mg/mL on Avicel. Cells were prepared for metabolomics analysis during exponential growth.

Extraction Process
Frozen cell pellets were resuspended in 1 mL of methanol (MeOH) and transferred to glass tubes. Chloroform (CHCl3, 3 mL) was added to each tube, and the tubes were sealed with foil to avoid contamination. The tubes were incubated for 5 min in a sonicating water bath to disrupt the cells. MeOH/H2O (3:2) was then added to the cell lysate, and the mixture was left to stand to allow phase separation. The final MeOH/H2O/CHCl3 ratio was 4:2:3. The polar (6 mL) and non-polar (3 mL) phases were separated and dried overnight in a vacuum concentrator. The vials were sealed with parafilm and stored at −80 °C until analysis.
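The stated volumes are internally consistent, and the amount of 3:2 MeOH/H2O added (not given explicitly) can be back-calculated. The sketch below assumes additive volumes and an idealized split in which MeOH and water form the polar phase and CHCl3 forms the non-polar phase; the function name is hypothetical.

```python
from fractions import Fraction

def extraction_volumes(meoh_initial_ml, chcl3_ml, final_ratio=(4, 2, 3), added_ratio=(3, 2)):
    """Back-calculate the MeOH/H2O addition needed to reach the final MeOH:H2O:CHCl3 ratio.

    Assumes additive volumes and an ideal split: MeOH + H2O form the polar phase,
    CHCl3 forms the non-polar phase.
    """
    r_meoh, r_h2o, r_chcl3 = final_ratio
    unit = Fraction(chcl3_ml) / r_chcl3          # mL per ratio unit, fixed by the CHCl3 volume
    meoh_added = r_meoh * unit - Fraction(meoh_initial_ml)
    h2o_added = r_h2o * unit
    # The added mixture must itself have the stated MeOH:H2O composition.
    if meoh_added * added_ratio[1] != h2o_added * added_ratio[0]:
        raise ValueError("final ratio not reachable with this MeOH/H2O mixture")
    return {
        "meoh_h2o_added_ml": meoh_added + h2o_added,
        "polar_ml": r_meoh * unit + h2o_added,
        "nonpolar_ml": Fraction(chcl3_ml),
    }

vols = extraction_volumes(meoh_initial_ml=1, chcl3_ml=3)
print({k: float(v) for k, v in vols.items()})
# → {'meoh_h2o_added_ml': 5.0, 'polar_ml': 6.0, 'nonpolar_ml': 3.0}
```

Under these assumptions, 5 mL of 3:2 MeOH/H2O brings 1 mL MeOH and 3 mL CHCl3 to the 4:2:3 ratio, which reproduces the 6 mL polar and 3 mL non-polar phase volumes reported above.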

Polar and Non-Polar General Metabolite Isolation
The workflow for small-molecule detection, together with the instrument conditions, is described in Figure 5.

Statistical Analysis Using the Omics Discovery Pipeline
The raw data from the Agilent Mass Hunter software were fed into the Omics Discovery Pipeline developed by the Purdue Bindley Bioscience Center metabolite core facility. The pipeline first deconvoluted peaks using XMASS; the resulting m/z, intensity, and retention-time data were then passed to XALIGN, which aligned features across technical replicates to reduce the influence of instrument error. Finally, t-tests were used to retain only peaks that differed significantly from the blank runs, and peaks due to solvents and spiked standards (masses 121.050873 and 922.009798, respectively) were eliminated [24].
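The blank comparison and mass exclusion can be sketched as a simple feature filter. This is an illustrative stand-in, not the pipeline's code: only the two excluded masses come from the text, while the 10 ppm matching tolerance, the use of Welch's t statistic, the fixed cutoff of 3.0 (in place of a p-value threshold), and the toy intensities are assumptions.

```python
import math
from statistics import mean, variance

# The two masses the text lists for removal (solvent and spiked standard).
EXCLUDED_MASSES = (121.050873, 922.009798)

def ppm_error(observed, reference):
    """Absolute mass error in parts per million."""
    return abs(observed - reference) / reference * 1e6

def is_excluded(mass, tol_ppm=10.0):
    """True if the feature mass matches an excluded mass within tol_ppm (assumed tolerance)."""
    return any(ppm_error(mass, ref) <= tol_ppm for ref in EXCLUDED_MASSES)

def welch_t(sample, blank):
    """Welch's t statistic for sample vs. blank intensities (needs >= 2 values each)."""
    diff = mean(sample) - mean(blank)
    se = math.sqrt(variance(sample) / len(sample) + variance(blank) / len(blank))
    if se == 0:
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / se

def filter_features(features, t_cutoff=3.0, tol_ppm=10.0):
    """Keep masses of features that are not excluded and are well above the blank runs."""
    kept = []
    for f in features:
        if is_excluded(f["mass"], tol_ppm):
            continue
        if welch_t(f["sample"], f["blank"]) >= t_cutoff:
            kept.append(f["mass"])
    return kept

# Invented intensities for three technical replicates of sample and blank.
features = [
    {"mass": 121.0509, "sample": [100, 110, 105], "blank": [1, 2, 1]},
    {"mass": 180.0634, "sample": [500, 520, 510], "blank": [10, 12, 11]},
    {"mass": 250.1000, "sample": [10, 12, 11], "blank": [10, 11, 12]},
]
print(filter_features(features))  # → [180.0634]
```

The first feature is dropped because it matches the 121.050873 standard within tolerance, and the third because it is indistinguishable from the blank.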

Metabolite Identification and Pathway Association
The polar and non-polar extracts were run as separate experiments, and the peak data were exported as mzData files from Agilent Mass Hunter software. These mzData files were processed with the Omics Discovery Pipeline (www.omicsDP.org) provided by Purdue [24]. An example retention-time plot, the spectral data, and the output list of identified compounds are shown in Figure 6.

Figure 6. Illustration of retention time, spectral peak data, and sample output file obtained for the T. fusca wild-type strain grown on cellobiose medium.
The identified metabolites were then mapped to T. fusca pathways with MBRole [14], an online compilation of metabolites and pathways, using the precompiled KEGG compound list [15,16] as the reference.
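Mapping a compound list onto pathways is often followed by an over-representation test that asks whether a pathway contains more identified compounds than expected by chance. The sketch below shows a hypergeometric version of such a test on invented compound IDs and pathway names; it is not MBRole's implementation, and no multiple-testing correction is applied.

```python
from math import comb

def hypergeom_upper_tail(N, K, n, x):
    """P(X >= x) for X ~ Hypergeometric(N compounds, K in pathway, n identified)."""
    hits = sum(comb(K, k) * comb(N - K, n - k) for k in range(x, min(K, n) + 1))
    return hits / comb(N, n)

def pathway_enrichment(identified, pathways, background):
    """Return {pathway: (overlap, p-value)}; no multiple-testing correction applied."""
    background = set(background)
    identified = set(identified) & background
    N, n = len(background), len(identified)
    results = {}
    for name, members in pathways.items():
        members = set(members) & background
        overlap = len(identified & members)
        results[name] = (overlap, hypergeom_upper_tail(N, len(members), n, overlap))
    return results

# Invented compound IDs and pathway names, for illustration only.
background = [f"cpd{i:02d}" for i in range(1, 11)]
pathways = {"pathway_A": ["cpd01", "cpd02", "cpd03"],
            "pathway_B": [f"cpd{i:02d}" for i in range(4, 11)]}
res = pathway_enrichment(["cpd01", "cpd02", "cpd03"], pathways, background)
print(res)
```

Here all three identified compounds fall in pathway_A, giving p = 1/120, while pathway_B has no overlap and p = 1.0.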

Metabolic Model and Metabolomics Integration
For a metabolic reconstruction, a flux distribution is a set of relative reaction rates that satisfies mass-conservation constraints. To derive a flux distribution that reflects the presence or absence of metabolites observed in the metabolomics data, we adopt a simple modification of the standard flux balance analysis model: we find a flux distribution that maximizes the number of metabolites present in both the metabolomics data and the flux distribution. A metabolite is present in the metabolomics data if it is observed above a chosen threshold, and present in a flux distribution if the sum of fluxes through the reactions producing it exceeds a threshold.
Let $R$ be the set of reactions and $M$ the set of metabolites in a draft metabolic reconstruction, and let $\hat{M} \subseteq M$ be the subset of metabolites present in the metabolomics data. Let $S_{ij}$ be the stoichiometric coefficient of metabolite $i$ in reaction $j$ for $i \in M$, $j \in R$, with the convention that $S_{ij}$ is positive for reactions producing metabolite $i$, negative for reactions consuming it, and zero otherwise. Let $v_j$ be the flux through reaction $j$ for $j \in R$. Each reversible reaction is represented as two irreversible reactions; because the two resulting columns of the stoichiometric matrix are linearly dependent (one is the negative of the other), at most one direction is selected in any basic optimal solution of our optimization model. In the following mixed integer program, $x_i$ equals 1 if metabolite $i$ is present in both the metabolomics data and the flux distribution, and 0 otherwise, for $i \in \hat{M}$:

$$
\begin{aligned}
\max \quad & \sum_{i \in \hat{M}} x_i \\
\text{s.t.} \quad & \sum_{j \in R} S_{ij} v_j = 0, && \forall i \in M, \\
& \sum_{j \in R:\, S_{ij} > 0} v_j \ge \varepsilon\, x_i, && \forall i \in \hat{M}, \\
& v_j \ge 0, && \forall j \in R, \\
& x_i \in \{0, 1\}, && \forall i \in \hat{M}.
\end{aligned}
$$

The objective function maximizes the number of metabolites present in both the metabolomics data and the flux distribution. The first set of constraints enforces conservation of mass. The second set requires that, if a metabolite is counted as present in the flux distribution, the sum of fluxes through the reactions producing it is at least $\varepsilon$. For the experiments we conducted, we set $\varepsilon = 1$.
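To make the formulation concrete, the sketch below assembles the mixed integer program in matrix form for a hypothetical four-reaction toy network and checks candidate solutions against every constraint. The network, the function names, and leaving flux upper bounds at +∞ are assumptions for illustration. The stacked variable vector z = (v, x), objective vector, integrality flags, and row and variable bounds mirror the inputs accepted by a MILP solver such as `scipy.optimize.milp` (via `LinearConstraint` and `Bounds`), but no solver is called here.

```python
import math

def build_milp(metabolites, reactions, observed, eps=1.0):
    """Assemble min c.z s.t. row_lb <= A z <= row_ub, var_lb <= z <= var_ub.

    reactions maps each irreversible reaction to {metabolite: S_ij}.
    Variables are z = [v_j for j in reactions] + [x_i for i in observed].
    Flux upper bounds are left at +inf (an assumption for this sketch).
    """
    rxns = list(reactions)
    n_v, n_x = len(rxns), len(observed)
    A, row_lb, row_ub = [], [], []
    # Mass balance: sum_j S_ij v_j = 0 for every metabolite i in M.
    for m in metabolites:
        A.append([reactions[r].get(m, 0) for r in rxns] + [0] * n_x)
        row_lb.append(0.0)
        row_ub.append(0.0)
    # Presence: sum of fluxes of reactions producing i, minus eps * x_i, >= 0.
    for k, m in enumerate(observed):
        producers = [1 if reactions[r].get(m, 0) > 0 else 0 for r in rxns]
        x_coeffs = [0] * n_x
        x_coeffs[k] = -eps
        A.append(producers + x_coeffs)
        row_lb.append(0.0)
        row_ub.append(math.inf)
    return {
        "c": [0] * n_v + [-1] * n_x,            # maximize sum x_i as minimize -sum x_i
        "A": A, "row_lb": row_lb, "row_ub": row_ub,
        "integrality": [0] * n_v + [1] * n_x,   # v continuous, x integer (binary via bounds)
        "var_lb": [0.0] * (n_v + n_x),
        "var_ub": [math.inf] * n_v + [1.0] * n_x,
    }

def evaluate(model, z, tol=1e-9):
    """Return the number of matched metabolites if z is feasible, else None."""
    for val, lo, hi, is_int in zip(z, model["var_lb"], model["var_ub"], model["integrality"]):
        if val < lo - tol or val > hi + tol:
            return None
        if is_int and abs(val - round(val)) > tol:
            return None
    for row, lo, hi in zip(model["A"], model["row_lb"], model["row_ub"]):
        lhs = sum(a * b for a, b in zip(row, z))
        if lhs < lo - tol or lhs > hi + tol:
            return None
    return -sum(ci * zi for ci, zi in zip(model["c"], z))

# Hypothetical toy network: uptake of A, A -> B, B -> C, secretion of C.
metabolites = ["A", "B", "C"]
reactions = {
    "EX_A": {"A": 1},
    "R1": {"A": -1, "B": 1},
    "R2": {"B": -1, "C": 1},
    "EX_C": {"C": -1},
}
model = build_milp(metabolites, reactions, observed=["B", "C"], eps=1.0)
print(evaluate(model, [1, 1, 1, 1, 1, 1]))  # → 2 (B and C both matched)
print(evaluate(model, [0, 0, 0, 0, 1, 1]))  # → None (x = 1 without producing flux)
```

The second candidate fails because claiming B as present without any flux through R1 violates the ε presence constraint; a solver would instead return the first, feasible assignment with objective 2.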