This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (

The goals of metabolic engineering are well-served by the biological information provided by metabolomics: information on how the cell is currently using its biochemical resources is perhaps one of the best ways to inform strategies to engineer a cell to produce a target compound. Using the analysis of extracellular or intracellular levels of the target compound (or a few closely related molecules) to drive metabolic engineering is quite common. However, there is surprisingly little systematic use of metabolomics datasets, which simultaneously measure hundreds of metabolites rather than just a few, for that same purpose. Here, we review the most common systematic approaches to integrating metabolite data with metabolic engineering, with emphasis on existing efforts to use whole-metabolome datasets. We then review some of the most common approaches for computational modeling of cell-wide metabolism, including constraint-based models, and discuss current computational approaches that explicitly use metabolomics data. We conclude with a discussion of the broader potential of computational approaches that systematically use metabolomics data to drive metabolic engineering.

Abbreviation | Meaning | Abbreviation | Meaning |
---|---|---|---|
CE-MS | Capillary Electrophoresis-Mass Spectrometry | MOMA | Minimization Of Metabolic Adjustment |
CHO | Chinese Hamster Ovary | NET | Network-Embedded Thermodynamic |
COBRA | Constraints Based Reconstruction and Analysis | NMR | Nuclear Magnetic Resonance |
dFBA | Dynamic Flux Balance Analysis | OMNI | Optimal Metabolic Network Identification |
EMUs | Elementary Metabolite Units | ODE | Ordinary Differential Equation |
FBA | Flux Balance Analysis | PLS | Partial Least Squares |
GC-MS | Gas Chromatography-Mass Spectrometry | PLS-DA | Partial Least Squares Discriminant Analysis |
HCA | Hierarchical Clustering Analysis | PCA | Principal Components Analysis |
HPLC | High-Performance Liquid Chromatography | QP | Quadratic Programming |
idFBA | Integrated-Dynamic Flux Balance Analysis | rFBA | Regulatory Flux Balance Analysis |
iFBA | Integrated Flux Balance Analysis | SBRT | Systems Biology Research Tool |
IOMA | Integrative “Omics”-Metabolic Analysis | TMFA | Thermodynamic Metabolic Flux Analysis |
LP | Linear Programming | TAL | Transaldolase |
LC-MS | Liquid Chromatography-Mass Spectrometry | TKL | Transketolase |
MASS | Mass Action Stoichiometric Simulation | TCA | Tricarboxylic Acid |
MCA | Metabolic Control Analysis | VIP | Variable Importance in the Projection |
MFA | Metabolic Flux Analysis | VHG | Very High Gravity |

Organisms such as

Frequently, metabolic engineering studies use targeted analysis of a few carefully selected intracellular or secreted extracellular compounds to drive or assess the progress of their efforts [

Another technique used to characterize metabolic pathways during metabolic engineering is Metabolic Flux Analysis (MFA). MFA provides more information than measurement of just a few metabolites, and is a staple technique of many who work in metabolic engineering [^{13}C labels) are leveraged to calculate fluxes – the rate at which material is processed through a metabolic pathway – from knowledge of carbon-carbon transitions in each reaction and the measured isotopomer distribution in each metabolite [^{13}C protocols and analytical platforms [

Metabolic engineering seeks to maximize the production of selected metabolites in a cell, whether they are produced by the organism’s natural metabolic activities or by entire exogenous pathways introduced through genetic engineering. Strategic, small-scale measurements and flux calculations have to date been indispensable tools for metabolic engineering. However, the development of systems-level analyses – precipitated by whole-genome sequencing and the rapid accumulation of data on RNA, protein and metabolite levels – has provided new opportunities to more completely understand the effects of strain manipulations. Genetic modifications often have additional effects outside the immediately targeted pathway, and a better understanding of the nature and extent of these perturbations would lead to more effective strategies for redesigning strains, as well as an improved ability to understand why a proposed design may fail to achieve its predicted performance.

Aided by recent advancements in analytical platforms that allow for the simultaneous measurement of a wide spectrum of metabolites, metabolomics (the analysis of the total metabolic content of living systems) is approaching the level of maturity of preceding “global analysis” fields like proteomics and transcriptomics [

Here, we review examples of recent strategies to integrate metabolomics datasets into metabolic engineering. First, we briefly cover the fundamentals of metabolomics. We then discuss strategies for assessing metabolic engineering strain designs, and how metabolomics methods can extend these strategies. We follow with discussion of computational tools for metabolic engineering, with an emphasis on how these methods are used to design strains and predict their performance as well as how metabolomics datasets are currently applied to computational modeling. We conclude with a brief summary of the state of the field and the potential that integrating metabolomics presents.

The development of metabolomics, the newest of the global analysis methods, has much in common with its predecessor fields of genomics, transcriptomics, and proteomics [

One of the primary difficulties facing the development of metabolomics has been the staggering diversity of metabolites. Metabolites are substantially more chemically diverse than the subunit-based chemistries of DNA, RNA, and proteins, impeding the progress of metabolomics as a truly “omics” field that measures all metabolites. The entire genome and transcriptome can be (at least theoretically) surveyed using single platforms, from simple PCR to more exhaustive sequencing and microarrays, whereas metabolomics requires multiple analytical platforms to achieve complete coverage of all metabolites.

Common approaches involve coupling of a chromatographic separation to mass spectrometry, including gas chromatography-mass spectrometry (GC-MS) [

As the youngest of the global analysis methods, metabolomics has drawn heavily from the data analysis techniques developed for transcriptomics and proteomics. Like these two fields, the datasets generated by metabolomics suffer from a “curse of dimensionality,” where there are far more variables than there are samples. Methods taken from transcriptomics and proteomics, as well as some derived from the field of chemometrics, have been used extensively to analyze metabolomics data as a result [

One of the most prominent methods for analysis of metabolomics data is Principal Components Analysis (PCA) (

A few examples of using PCA to reveal underlying patterns in metabolomics datasets include the characterization of extracellular culture conditions in Chinese Hamster Ovary (CHO) cell batch cultures [

Much of the value of PCA comes from its dimensional reduction capabilities: typically the first few components contain biologically relevant information, and higher components contain variance due to noise or biological variability. The number of components that are “significant” is an open question, and depends predominantly on the dataset or even the specific downstream processing and applications [
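The dimensional reduction described above can be illustrated with a minimal sketch. It is not taken from any of the studies cited here: the function name, the toy data matrix, and the use of power iteration (rather than a full eigendecomposition from a statistics package) are all assumptions made for illustration. Rows are samples and columns are metabolite levels.

```python
import math
import random


def first_principal_component(samples, iterations=200, seed=0):
    """Return (direction, explained_fraction) for the first principal component.

    samples: list of equal-length lists (rows = samples, columns = metabolites).
    Uses power iteration on the sample covariance matrix (illustrative only).
    """
    n, p = len(samples), len(samples[0])
    # Mean-center each metabolite (column).
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in samples]
    # Sample covariance matrix (p x p).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(p)]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance in the data
        v = [x / norm for x in w]
    # Rayleigh quotient gives the variance captured along v.
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p))
                     for a in range(p))
    total_variance = sum(cov[a][a] for a in range(p))
    explained = eigenvalue / total_variance if total_variance > 0 else 0.0
    return v, explained


# Hypothetical data: metabolites 1 and 2 co-vary, metabolite 3 is mostly noise.
toy = [[1.0, 2.0, 0.5],
       [2.0, 4.1, 0.4],
       [3.0, 5.9, 0.6],
       [4.0, 8.0, 0.5]]
direction, explained = first_principal_component(toy)
```

In this toy case nearly all of the variance falls on the first component, whose loadings are dominated by the two co-varying metabolites. Real metabolomics data would normally be scaled before PCA, a step omitted here.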

Partial Least Squares (PLS) regression and discriminant analysis (PLS-DA,

Examples of data analysis techniques for metabolomics. The effects of glucose deprivation on a cancer cell line were measured with GC-MS and analyzed in MetaboAnalyst [^{2} and Q^{2} denote, respectively, the goodness of fit and goodness of prediction statistics. (

Complete and effective use of a metabolomics dataset necessitates not only careful experimental design and data processing, but also thorough validation of the conclusions drawn from data analysis (e.g. apparent clusters in principal component space). For example, discussion of p-value distributions by Hojer-Pedersen

Metabolomics continues to be exploited for numerous biomedical applications, ranging from the study of differences between clinically isolated and industrial yeast strains [

The simplest and most direct use of metabolomics datasets is as an extension of existing small-scale metabolite analyses; metabolomics inherently enables a more comprehensive assessment of a strain than a handful of narrowly selected measurements. Studies employing this approach typically either compare strains and culture conditions, or seek to monitor the time-course evolution of many metabolite concentrations in parallel. These studies use a combination of measured growth and production parameters in conjunction with direct examination of the metabolomics data (e.g. significant increases or decreases in metabolite levels) in the context of known biochemical pathways to determine the effects of mutations and culture conditions. For example, if one overexpresses the enzyme that is the first step in a linear biosynthetic pathway and finds that the first few metabolites accumulate significantly but subsequent metabolites do not, this may suggest a rate-limiting step further down the pathway that needs to be upregulated. Broader knowledge of metabolite levels beyond the target pathway can serve to determine the wider-ranging effects of a given metabolic engineering perturbation and can suggest candidate supplementary perturbations (to address, for example, cofactor imbalances).
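The linear-pathway reasoning above can be sketched in a few lines. This is a hypothetical heuristic for illustration only: the function names, the log2 fold-change threshold, and the metabolite names and concentrations are assumptions, not values from any study discussed here.

```python
import math


def log2_fold_changes(reference, engineered):
    """Map metabolite -> log2(engineered / reference) for positive shared entries."""
    return {m: math.log2(engineered[m] / reference[m])
            for m in reference
            if m in engineered and reference[m] > 0 and engineered[m] > 0}


def suspected_bottleneck(pathway, fold_changes, threshold=1.0):
    """Return (upstream, downstream) for the first step where accumulation stops.

    pathway: metabolite names in pathway order.
    A step is flagged when the upstream metabolite's log2 fold change is at
    least `threshold` while the downstream one is below it. Metabolites that
    were not measured are treated as unchanged (fold change 0.0).
    """
    for upstream, downstream in zip(pathway, pathway[1:]):
        up = fold_changes.get(upstream, 0.0)
        down = fold_changes.get(downstream, 0.0)
        if up >= threshold > down:
            return upstream, downstream
    return None


# Hypothetical relative levels in a reference strain and an engineered strain.
pathway = ["A", "B", "C", "D"]
reference = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0}
engineered = {"A": 4.0, "B": 3.0, "C": 1.1, "D": 0.9}
changes = log2_fold_changes(reference, engineered)
step = suspected_bottleneck(pathway, changes)
```

Here the step converting B to C is flagged, which would only nominate that enzyme as a candidate for upregulation; accumulation can also arise from feedback inhibition or cofactor limitation, so such a flag is a starting hypothesis rather than a conclusion.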

One example of a strain- or condition-comparison approach is a study of an arcA mutant in ^{13}C MFA-derived fluxes, they found significant differences in tricarboxylic acid (TCA) cycle metabolism and ATP production among conditions. Similarly, Christen ^{13}C MFA suggested differences between TCA cycle fluxes and consistent flux through glycolysis, there was a much wider variation of metabolite levels across species – especially in amino acid pool compositions. They also found that across species, these values correlated poorly with fluxes.

In an example of time-course analysis, Hasunuma

Other work with

The simple approaches used to exploit the results of targeted measurements can be scaled up to metabolomics datasets, but they often do not take full advantage of structures or patterns in the data at the systems level. Many in the field of metabolic engineering have used multivariate techniques to interrogate metabolomics datasets on more complicated questions about strain performance and metabolite allocation. Due to the complexity of biological systems, the answers to these questions are often non-intuitive and increasingly difficult to identify without taking such a systems-scale approach.

While rational design approaches were the original driving force in metabolic engineering, directed evolution and high throughput screens of mutant libraries have since become increasingly commonplace [

Common techniques employed in such approaches include HCA, PCA, and PLS-DA. These methods generate clusters or loadings that identify key metabolite differences, which in turn suggest what genetic changes may have been selected for. For example, Hong

A study by Yoshida _{2} and H_{2}S production [_{2}S and SO_{2}. They verified their prediction by exposing

Similarly, Wisselink

Other examples of using metabolomics for after-the-fact assessment of engineered strains include studying the effects of repeated exposure to vacuum fermentation conditions on

While the goal of metabolic engineering is to introduce a change on the metabolic level, many of these changes are necessarily implemented by introducing genetic modifications to affect transcriptional levels. As such, analysis of biological layers beyond the metabolome, such as the transcriptome and proteome, can provide further, and sometimes crucial, insight into the wide-reaching effects of an alteration.

A number of techniques widely used in metabolomics (such as PCA and HCA) are also well-established for many of these other “omics” datasets, though there are a number of other techniques that until recently were more specific to transcriptional or proteomic analyses. One of the most prominent examples of this is enrichment analysis, originally developed for transcriptomics datasets. Enrichment analysis uses information about the frequency of occurrence or the ranking of sets of gene names or functions in a given list of genes to examine the biological relevance of observed changes [
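The frequency-based form of enrichment analysis is commonly framed as a hypergeometric (over-representation) test. The sketch below is a minimal illustration under that assumption; the function and argument names are hypothetical, and rank-based variants are not covered.

```python
from math import comb


def hypergeom_enrichment_p(population, set_size, selected, overlap):
    """Probability of seeing at least `overlap` members of an annotated set.

    population: total number of genes (or metabolites) considered.
    set_size:   how many of them belong to the annotated set (e.g. a pathway).
    selected:   size of the list of significantly changed entities.
    overlap:    how many selected entities belong to the annotated set.
    """
    upper = min(set_size, selected)
    total = comb(population, selected)
    tail = sum(comb(set_size, k) * comb(population - set_size, selected - k)
               for k in range(overlap, upper + 1))
    return tail / total


# Hypothetical example: 10 entities, 5 in the pathway, 5 selected, all 5 overlap.
p_value = hypergeom_enrichment_p(population=10, set_size=5, selected=5, overlap=5)
```

In practice the resulting p-values would be corrected for multiple testing across all annotated sets, which this sketch leaves out.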

In metabolic engineering, use of metabolomics is comparatively much less common than using other global analysis approaches, perhaps attributable to the maturity of fields like transcriptomics and proteomics compared to metabolomics. Proteomics, transcriptomics and genomics have frequently been combined with small-scale metabolite measurements for metabolic engineering purposes. Examples of this include functional genomics with targeted metabolite measurements for isoprenoid production in

Flux measurements have also been commonly combined with other “omics” datasets, such as with the transcriptome and proteome in an analysis of lysine-producing ^{13}C MFA to understand metabolic behavior in ^{13}C MFA-derived fluxes [

The above examples at most used small-scale metabolite measurements, but a handful of studies have combined analysis of full metabolomics datasets with other “omics” datasets. Previously described analysis of adaptations from directed evolution generally fit this category: transcriptional measurements using microarrays [

In other applications, Piddock

The emergence of genome-scale investigations has led to a deluge of information about all molecular layers in the cell. This in turn has provided a broader context in which metabolic engineering strategies can be evaluated. However, we note that many of the techniques discussed so far have focused on systematic

One of the difficulties in applying metabolomics datasets to strain design is the volume of data produced in a metabolomics experiment. Computational approaches are well suited to systematically integrate large volumes of biochemical knowledge and data. As shown in

Applications of various techniques to understanding and manipulating cellular metabolism. Solid lines represent widely used strategies; dashed lines represent underused strategies. Both metabolomics and transcriptional profiling provide a direct readout that helps enable a deeper understanding of cellular metabolism, but only transcriptional profiling has seen widespread application to enhance standard computational modeling and metabolic engineering strategies. Integrating metabolomics data into metabolic engineering and computational modeling strategies would help bridge gaps in biochemical knowledge and improve our ability to control cellular metabolism.

One of the most powerful ways to extend this concept would be to include metabolomic data in the design and fitting of computational models. While there are many well-developed metabolic modeling strategies, most of these approaches have not yet been adapted to effectively leverage the additional information that metabolomics can offer. Nonetheless, these strategies have made substantial contributions to metabolic engineering. We discuss these computational approaches to establish how they have been used to date in metabolic engineering, to suggest how metabolomics can contribute to their effectiveness, and to highlight current efforts to integrate the two.

The most basic models for metabolic engineering use simplified equations for bioreactor kinetics to empirically fit relationships between characteristics such as metabolite uptake or secretion and specific growth rate. While these models are useful as tools for investigating specific behaviors of existing strains, their small-scale and coarse-grained nature precludes broader application to directing engineering strategies, as well as the possibility of substantively integrating metabolomics data even when available [
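As an illustration of this class of model, one widely used empirical form is Monod growth kinetics coupled to simple batch mass balances. The text above does not name a specific model, so this choice and the symbols are assumptions: $\mu$ is the specific growth rate, $S$ the limiting substrate concentration, $X$ the biomass concentration, $\mu_{\max}$ and $K_S$ fitted parameters, and $Y_{X/S}$ the biomass yield on substrate.

$$
\mu(S) = \mu_{\max}\,\frac{S}{K_S + S}, \qquad \frac{dX}{dt} = \mu(S)\,X, \qquad \frac{dS}{dt} = -\frac{\mu(S)}{Y_{X/S}}\,X
$$

The model contains only extracellular and bulk quantities, which is why intracellular metabolite measurements have no natural place in it.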

Early biochemical modeling strategies initially sought to move beyond such simplistic approaches by compiling knowledge of metabolic pathways and enzyme kinetics into detailed mechanistic models to predict the dynamic behavior of metabolite concentrations [

To attempt to overcome these issues, “constraint-based” approaches that calculate metabolic fluxes primarily from stoichiometry were developed. This change of focus from dynamic metabolite levels to fluxes made sense, as the idea of optimizing and controlling metabolic fluxes has long been a fundamental part of metabolic engineering. These approaches allow flux calculations without the difficulties arising from parametric uncertainty by predicting flux distributions from the structure of the biochemical network and constraints on the feasible range of fluxes [

Flux Balance Analysis (FBA) is, in short, a modeling technique that uses metabolic network stoichiometry, a set of feasible flux ranges, and a cellular objective function to calculate an optimal flux distribution for a metabolic network [
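In its standard textbook form, the FBA calculation described above is a linear program. The notation below is an assumption introduced for illustration: $S$ is the stoichiometric matrix (metabolites by reactions), $v$ the vector of reaction fluxes, $c$ the weights defining the cellular objective (for example, a single nonzero entry for a biomass reaction), and $v^{\min}$, $v^{\max}$ the feasible flux ranges.

$$
\max_{v}\; c^{T} v \quad \text{subject to} \quad S\,v = 0, \qquad v^{\min} \le v \le v^{\max}
$$

The equality constraint expresses a steady-state mass balance on every internal metabolite, so metabolite concentrations do not appear anywhere in the problem.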

A few examples of basic FBA for metabolic engineering include estimation of flux distributions from MFA measurements [^{13}C MFA [

A prerequisite step in FBA is the reconstruction of genome-scale metabolic networks for the organism of interest. Reviews by Feist

A few recent examples particularly relevant to metabolic engineering and metabolomics include models of

The original FBA framework has been supplemented with dozens of refinements broadly referred to as constraint-based models. While these models retain the optimization problem framework based on stoichiometric constraints, the flux constraints or objective function are altered. We direct the reader to reviews on the topic of FBA by Lee, Gianchandani, and others for more complete discussion of these methods [

The most basic refinements are straightforward extensions of FBA, from adding a simplified representation of transcriptional regulatory constraints, to integrating uptake/effluxes and comparing against extracellular concentration profiles. Examples from this family include regulatory FBA (rFBA) [

Another class of refinements to FBA comprises methods intended to better reduce the discrepancy between model predictions and experimental observations. Optimal metabolic network identification (OMNI) is used to identify discrepancies between measured and predicted fluxes, and then determine changes that need to be made to the model to better match the measurements [

More directly relevant to metabolic engineering applications is a class of refinements focused on predicting the result of metabolic network alterations. An early and well-known example of this is Minimization of Metabolic Adjustment (MOMA), which formulates a quadratic programming (QP) problem to find the feasible flux distribution nearest to the original FBA solution in response to a gene knockout [
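The MOMA problem described above can be written compactly as follows. The notation is an assumption introduced for illustration: $v^{\mathrm{wt}}$ is the wild-type FBA flux distribution, $S$ the stoichiometric matrix, $v^{\min}$ and $v^{\max}$ the flux bounds, and $K$ the set of reactions disabled by the knockout.

$$
\min_{v}\; \lVert v - v^{\mathrm{wt}} \rVert_{2}^{2} \quad \text{subject to} \quad S\,v = 0, \qquad v^{\min} \le v \le v^{\max}, \qquad v_{j} = 0 \;\; \forall j \in K
$$

The quadratic objective is what makes this a QP rather than an LP: instead of assuming the mutant re-optimizes growth, it assumes the mutant stays as close as possible to the wild-type flux state.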

As reviewed above, the majority of constraint-based modeling strategies make negligible use of systems-scale metabolite data in their calculations. The requirement that organisms adhere not only to stoichiometric mass conservation, but also to thermodynamic restrictions on energy and entropy, provides one means of introducing metabolite concentrations into the constraints. Several constraint-based model techniques make use of metabolite or metabolomics data in this fashion.
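A minimal sketch of how concentrations enter such a thermodynamic constraint is given below. It evaluates a single reaction in isolation, using the standard relation between the Gibbs energy of reaction, standard Gibbs energies of formation, and metabolite concentrations. It is not an implementation of any method discussed here, which treat the whole network jointly; the function names and numerical values are hypothetical, and corrections for pH, ionic strength, and activity coefficients are ignored.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol*K)


def reaction_gibbs_range(stoich, dfG0, conc_bounds, temperature=298.15):
    """Return (min, max) Gibbs energy of reaction in kJ/mol.

    stoich:      metabolite -> stoichiometric coefficient (negative = substrate).
    dfG0:        metabolite -> standard Gibbs energy of formation (kJ/mol).
    conc_bounds: metabolite -> (low, high) concentration in mol/L.
    """
    # Standard Gibbs energy of reaction from formation energies.
    drG0 = sum(s * dfG0[m] for m, s in stoich.items())
    rt = R * temperature
    low = high = drG0
    # Each concentration term is monotonic, so the extremes sit at the bounds.
    for m, s in stoich.items():
        c_lo, c_hi = conc_bounds[m]
        terms = (s * rt * math.log(c_lo), s * rt * math.log(c_hi))
        low += min(terms)
        high += max(terms)
    return low, high


def feasible_directions(gibbs_range):
    """Return (forward_allowed, reverse_allowed) for a (min, max) Gibbs range."""
    low, high = gibbs_range
    return low < 0, high > 0


# Hypothetical reaction A -> B with made-up formation energies.
stoich = {"A": -1, "B": 1}
dfG0 = {"A": 0.0, "B": -10.0}
equal = reaction_gibbs_range(stoich, dfG0, {"A": (1e-3, 1e-3), "B": (1e-3, 1e-3)})
skewed = reaction_gibbs_range(stoich, dfG0, {"A": (1e-6, 1e-6), "B": (1e-1, 1e-1)})
```

With equal concentrations the reaction can only run forward, whereas a sufficiently large product-to-substrate ratio reverses the feasible direction. This is the sense in which measured concentrations can restrict the flux directions available to a constraint-based model.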

Network-Embedded Thermodynamic (NET) analysis combines pre-determined flux directions with quantitative metabolomics datasets and metabolite Gibbs energies of formation to determine the feasible ranges of Gibbs free energy of reaction throughout the system [

Summary of Software Tools Presented in This Manuscript.

Software | Reference | Description |
---|---|---|
ChromA | [ | GC-MS Peak Alignment |
Metab | [ | GC-MS Data Statistical Analysis Package |
MetaboAnalyst 2.0 | [ | Web-based Metabolomics Data Processing Pipeline |
MetAlign | [ | GC-MS and LC-MS Data Processing Pipeline |
MZmine 2 | [ | MS Data Processing Pipeline |
SpectConnect | [ | GC-MS Peak Alignment |
Xalign | [ | LC-MS Data Pre-processing |
XCMS Online | [ | Web-based Untargeted Metabolomics Pipeline |
anNET | [ | MATLAB-based NET analysis |
CellNetAnalyzer | [ | MATLAB-based Metabolic and Signal Network Analysis |
COBRA Toolbox | [ | MATLAB-based FBA Toolbox Suite |
OptFlux | [ | Open Source, Modular Constraint-based Model Strain Design Software Toolbox |
Systems Biology Research Tool | [ | Open Source, Modular Systems Biology Computational Tool |
GapFind, GapFill | [ | Automated Network Gap Identification and Hypothesis Generation |
GeneForce | [ | Regulatory Rule Correction for Integrated Metabolic and Regulatory Models |
MetRxn | [ | Web-based Knowledgebase Comparison Tool |
Model SEED | [ | Web-based Generation, Optimization and Analysis of Genome-scale Metabolic Models |
BioCyc | [ | Genome and pathway database for >2000 organisms |
BRENDA | [ | Comprehensive enzyme database, ~5000 enzymes |
ChEBI | [ | Biologically relevant small molecules and their properties |
KEGG | [ | Genomes, enzymatic pathways, and biological chemicals |
MetaCyc | [ | >1,900 metabolic pathways from >2,200 different organisms |
PubChem | [ | Biological activity and structures of small molecules |

NET analysis of a metabolomics dataset for

Henry

Garg ^{13}C MFA calculations to produce thermodynamically consistent fluxes [

While TMFA and NET analysis have been the prevailing approaches used to incorporate metabolite concentrations and thermodynamic constraints into constraint-based modeling approaches, several other methods have been developed. For example, Bordel

Constraint-based models have successfully directed numerous metabolic engineering projects. However, by construction they often ignore or have trouble dealing with dynamic metabolite behaviors that may have significant impact on final product titers, and in general they only indirectly make use of metabolite concentration measurements. Improved knowledge of network structures and strategies for dealing with parametric uncertainty have made ordinary differential equation (ODE) based models of metabolic kinetics increasingly viable tools for strain design. These methods explicitly model intracellular concentrations, making them attractive and convenient frameworks for integrating metabolomics datasets.

Kinetic models are built around explicit mathematical descriptions of enzyme-metabolite interactions. Natural choices for kinetic rate laws are mass-action kinetics and Michaelis-Menten kinetics, but a review by Heijnen highlights several approximate rate laws that require fewer parameters and are relevant to metabolic engineering applications [
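To make the structure of such a model concrete, the sketch below integrates a hypothetical two-step pathway (S to I to P) with irreversible Michaelis-Menten rate laws. The pathway, parameter values, and the choice of forward Euler integration are all assumptions made for illustration; a practical model would use a proper ODE solver and experimentally grounded parameters.

```python
def michaelis_menten(vmax, km, substrate):
    """Irreversible Michaelis-Menten rate for a single substrate."""
    return vmax * substrate / (km + substrate)


def simulate_pathway(s0, steps, dt, params):
    """Integrate S -> I -> P with forward Euler.

    params holds "vmax1", "km1" (first enzyme) and "vmax2", "km2" (second).
    Returns a list of (time, S, I, P) tuples, one per time point.
    """
    s, i, p = s0, 0.0, 0.0
    trajectory = [(0.0, s, i, p)]
    for n in range(1, steps + 1):
        v1 = michaelis_menten(params["vmax1"], params["km1"], s)
        v2 = michaelis_menten(params["vmax2"], params["km2"], i)
        # Mass balances: S is consumed, I is produced and consumed, P accumulates.
        s, i, p = s - v1 * dt, i + (v1 - v2) * dt, p + v2 * dt
        trajectory.append((n * dt, s, i, p))
    return trajectory


# Hypothetical parameters: the second enzyme is slower, so I accumulates.
params = {"vmax1": 1.0, "km1": 0.5, "vmax2": 0.2, "km2": 0.5}
trajectory = simulate_pathway(s0=1.0, steps=1000, dt=0.01, params=params)
t_end, s_end, i_end, p_end = trajectory[-1]
```

Because the state variables are the metabolite concentrations themselves, a measured time course of S, I, and P could be compared directly against the simulated trajectory, which is what makes this model class a convenient target for metabolomics data.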

In light of recent genome-scale reconstruction efforts, several investigators have sought to assess the properties of several of these approximate forms in the context of metabolic networks. Examples of this approach include study of the glycolytic pathway in

Independent of metabolomics, kinetic models have already been applied in several recent metabolic engineering contexts. Rasler

Ensemble approaches are also promising, and are in part a response to issues of parametric “sloppiness” which can preclude precise determination of kinetic parameters [

None of the aforementioned ODE models explicitly seeks to integrate metabolomics data into its analysis, due in part to the fact that more quantitative metabolomics techniques have been developed only recently. While scarce, there have been some efforts to use metabolomics data to improve or exploit ODE-based models of metabolism. For example, Klimacek

Two recent examples have attempted to apply metabolomics measurements to models of an entire metabolic network. Yizhak

A mass action stoichiometric simulation (MASS) modeling strategy described by Jamshidi

Notably, these last two methods both take advantage of constraint-based modeling strategies, but result in ODE-based kinetic models that can subsequently be used for strain design. This reflects the complementary nature of metabolite fluxes and concentrations, especially when faced with system-wide parametric uncertainty. However, to fully capture the wide-ranging dynamics that directly and indirectly contribute to the often subtle and nonintuitive behaviors exhibited in engineered strains, additional model detail is necessary. More advanced modeling strategies will need to find ways to integrate additional information, including proteomics and transcriptomics, to meet this need.

Metabolomics is the global analysis of the metabolic content of a living system. While it has found increasing application in fundamental biological research and in fields of clinical interest (e.g. disease biomarker discovery), there is surprisingly little use of metabolomics approaches to drive metabolic engineering efforts. Existing experimental approaches to supplement rational metabolic engineering efforts typically focus instead on the determination of flux with MFA techniques, or the use of enzyme assays and analytical platforms such as HPLC for highly targeted metabolite measurements.

While global analysis methods have been used to better predict and assess the effects of metabolic engineering modifications, the techniques most typically used have been transcriptomic or proteomic analyses – not metabolomics. This may previously have been due to the relative immaturity of metabolomics techniques, but current technology in the field should allow metabolomics to be integrated readily into metabolic engineering workflows.

Direct applications of metabolomics datasets to metabolic engineering include expanding the existing narrowly targeted analysis methods to a broader scope, identifying non-intuitive mutations in strains produced by directed evolution, and adding direct metabolic context to other global analysis datasets. Computational approaches have also begun to integrate metabolomics datasets through thermodynamic constraints in constraint-based models, or even more directly in the case of some kinetic models.

However, long-term strategies will need to find novel ways of incorporating the system-wide perspective provided by metabolomics and other global analysis methods. Such approaches will facilitate strain design based on increasingly detailed mechanistic descriptions and enable us to engineer strains towards any arbitrary product, not just those well-suited to high-throughput screens and directed evolution. Computational methods have a great deal of potential here. In the case of kinetic models, combining the metabolome and proteome can help address issues of

The authors thank M. Smith and K. Vermeersch for helpful feedback on the manuscript. This work was supported by a DARPA Young Faculty Award and NSF award #1125684. RAD was also supported by the NSF Stem Cell Biomanufacturing IGERT program.

The authors declare no conflict of interest.