# A Computational Framework for High-Throughput Isotopic Natural Abundance Correction of Omics-Level Ultra-High Resolution FT-MS Datasets


## Abstract


¹³C and ¹⁵N isotopes on a set of raw isotopologue intensities of UDP-N-acetyl-D-glucosamine derived from a ¹³C/¹⁵N-tracing experiment. Finally, we demonstrate the algorithm on a full omics-level dataset.

## 1. Introduction

## 2. Methodology for Peak Correction

¹³C, while Equation (3) gives the calculation of the S-correction terms.
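The equations referenced in this section did not survive in this copy, so the following is not a reconstruction of them. It is a minimal sketch of a generic single-isotope binomial natural abundance model, assuming that an isotopologue with k labeled atoms out of $n_{max}$ acquires additional heavy isotopes by natural abundance only at its $n_{max} - k$ unlabeled positions. The helper names `na_prob`, `add_na`, and `subtract_na` are hypothetical (chosen to echo the addNA and subtractNA functions named in Section 4.1), and the default ¹³C abundance of 0.0107 is an assumed value.

```python
from math import comb

def na_prob(n_max, k, i, p):
    """Probability that an isotopologue with k labeled atoms (out of n_max) is
    observed with i heavy atoms, i.e. gains i - k heavy isotopes by natural
    abundance among its n_max - k unlabeled positions (binomial model)."""
    if i < k:
        return 0.0
    m = i - k
    return comb(n_max - k, m) * p**m * (1.0 - p) ** (n_max - i)

def add_na(true_intensities, p=0.0107):
    """Forward model: spread label-only intensities over heavier isotopologues."""
    n_max = len(true_intensities) - 1
    return [sum(true_intensities[k] * na_prob(n_max, k, i, p) for k in range(i + 1))
            for i in range(n_max + 1)]

def subtract_na(observed, p=0.0107):
    """Inverse model: the forward model is lower triangular, so solve it by
    forward substitution from the lightest isotopologue upward."""
    n_max = len(observed) - 1
    corrected = []
    for i in range(n_max + 1):
        spill = sum(corrected[k] * na_prob(n_max, k, i, p) for k in range(i))
        corrected.append((observed[i] - spill) / na_prob(n_max, i, i, p))
    return corrected

# Usage with the simulated 13C vector (9 carbon atoms) from Table 2.
simulated = [0.5, 0, 0, 0.15, 0.1, 0, 0, 0, 0, 0.25]
tainted = add_na(simulated)
recovered = subtract_na(tainted)
```

Because the forward model is lower triangular, the direct correction reduces to forward substitution, and adding natural abundance to a simulated vector and then removing it should recover that vector to floating-point precision.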

## 3. Software Design and Methods

#### 3.1. Language and Library Choices

#### 3.2. Data Flow

#### 3.3. P and S Caching

#### 3.4. Correction Algorithm, Constructors, and Modularity

**Figure 1.** Procedural diagram of the isotopic natural abundance correction algorithm. Starting with the shape and order of the set of observed isotopologues, the algorithm is initialized, and the P and S tables are then either calculated or recovered from a cache. Next, the corrected isotopologue intensities ($I_{corrected}$) are calculated from the observed isotopologue intensities ($I_{data}$). The isotopic natural abundance-contaminated intensities ($I_{datacalc}$) are then calculated from the corrected intensities, and the calculated and observed intensities are compared. If an improvement is made, the calculation cycle is repeated.
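One way to realize the cycle in Figure 1 is sketched below under stated assumptions: a hypothetical binomial natural abundance matrix, a direct correction followed by clipping of negative intensities to zero, and a refinement step based on projected gradient (Landweber-type) updates. The refinement rule is a swapped-in technique, not necessarily the update used in the original software; only the overall structure (correct, re-add natural abundance, compare, repeat while improving) follows the figure. All function names and the abundance value are illustrative.

```python
from math import comb

def na_matrix(n_max, p):
    """A[i][k]: probability that an isotopologue with k labeled atoms is observed
    with i heavy atoms after natural abundance acts on its n_max - k other atoms."""
    return [[comb(n_max - k, i - k) * p ** (i - k) * (1 - p) ** (n_max - i) if i >= k else 0.0
             for k in range(n_max + 1)] for i in range(n_max + 1)]

def matvec(a, x):
    """Add natural abundance: I_datacalc = A @ I_corrected."""
    return [sum(a_ik * x_k for a_ik, x_k in zip(row, x)) for row in a]

def sse(u, v):
    """Sum of squared differences between calculated and observed intensities."""
    return sum((x - y) ** 2 for x, y in zip(u, v))

def iterative_correction(observed, p=0.0107, max_iter=1000, tol=1e-15):
    n = len(observed)
    a = na_matrix(n - 1, p)
    # Pass 1: direct correction by forward substitution (A is lower triangular),
    # then clip negative intensities, which noisy data can produce, to zero.
    direct = []
    for i in range(n):
        spill = sum(a[i][k] * direct[k] for k in range(i))
        direct.append((observed[i] - spill) / a[i][i])
    corrected = [max(x, 0.0) for x in direct]
    best = sse(matvec(a, corrected), observed)
    # Step size 1 / (||A||_1 * ||A||_inf) <= 1 / ||A||_2**2, so a projected
    # gradient step cannot increase the squared error.
    col_max = max(sum(a[i][k] for i in range(n)) for k in range(n))
    row_max = max(sum(row) for row in a)
    step = 1.0 / (col_max * row_max)
    iterations = 1
    while iterations < max_iter:
        # Re-add natural abundance and compare with the observed intensities.
        residual = [o - c for o, c in zip(observed, matvec(a, corrected))]
        # A^T @ residual is the steepest-descent direction for the squared error.
        descent = [sum(a[i][k] * residual[i] for i in range(n)) for k in range(n)]
        candidate = [max(c + step * d, 0.0) for c, d in zip(corrected, descent)]
        err = sse(matvec(a, candidate), observed)
        if best - err <= tol:  # no meaningful improvement: stop the cycle
            break
        corrected, best = candidate, err
        iterations += 1
    return corrected, iterations
```

With noise-free input the direct correction is already exact, so the loop stops after the first pass; with noisy input that drives an intensity negative, later passes can only lower the squared error, since a candidate is accepted only when it improves the fit.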

The shape and order of $I_{data}$ are tuples of length $n$, where $n$ is the number of labeling sources used in the experiment (i.e., the data's dimensionality). The data's shape defines the maximum isotope count for each labeling source (e.g., $C_{Max}$ for ¹³C), while its order defines which isotope label corresponds to which dimension. For example, a shape of (23, 54) and an order of (“15N”, “13C”) corresponds to a molecule with 23 nitrogen atoms and 54 carbon atoms, labeled with ¹⁵N and ¹³C, respectively. The P and S lookup tables are calculated from the dataset's order and shape; however, if these tables are supplied to the algorithm's constructor via a cache (see Section 3.3), these calculations are skipped. Another advantage of the algorithm-object design is that each instance of the algorithm can be optimized at runtime according to the dataset's dimensionality. Specifically, the overhead of converting a multidimensional index to a flat index is unnecessary when only a single isotopic label was used. The algorithm's constructor takes this into account and selects a much faster indexing function when operating on a one-dimensional dataset.
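A minimal sketch of the shape/order bookkeeping and the constructor-time choice of indexing function follows. It assumes row-major flattening and, following the text, that each shape entry is a maximum isotope count, so each array axis has one more element than its shape entry. The class `FlatIndexer` and its methods are hypothetical and are not the PyNAC API.

```python
from math import prod

class FlatIndexer:
    """Hypothetical helper (not the PyNAC API): maps a tuple of isotope counts to a
    row-major flat index. `shape` holds the maximum isotope count per label and
    `order` names the label on each axis, as described in the text."""

    def __init__(self, shape, order):
        if len(shape) != len(order):
            raise ValueError("shape and order must have the same length")
        self.shape = tuple(shape)
        self.order = tuple(order)
        # Each axis needs max_count + 1 slots (0 .. max_count heavy isotopes).
        self.extents = tuple(m + 1 for m in self.shape)
        # Row-major strides: the last axis varies fastest.
        strides, acc = [], 1
        for extent in reversed(self.extents):
            strides.append(acc)
            acc *= extent
        self.strides = tuple(reversed(strides))
        # Pick the indexing function once, in the constructor, so the 1-D case
        # never pays for the multidimensional arithmetic.
        self.flat = self._flat_1d if len(self.shape) == 1 else self._flat_nd

    @staticmethod
    def _flat_1d(index):
        return index[0]  # a single label: the isotope count is the flat index

    def _flat_nd(self, index):
        return sum(i * s for i, s in zip(index, self.strides))

    def axis_of(self, label):
        """Axis holding the counts for an isotope label such as "13C"."""
        return self.order.index(label)

    def size(self):
        return prod(self.extents)

# Usage with the example from the text: 23 nitrogen and 54 carbon atoms.
ix = FlatIndexer((23, 54), ("15N", "13C"))
```

For the example in the text, the isotopologue with two ¹⁵N and five ¹³C nuclei maps to flat index 2 × 55 + 5 = 115.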

**Figure 2.** Algorithm generalization and class relations in the modularization of the code. (**a**) The orange xnacrange object is generalized to handle the different summations in each formula. The light green plookup object generalizes the P table, representing the different sets of binomial terms specific to each formula. The purple slookup object generalizes the S table, representing the different sets of summative binomial terms specific to each formula. (**b**) The PyNAC module has several classes, separated into the green Core submodule and the red Analysis submodule. The blue multiprocessing module is provided by the Python standard library. The yellow numpy module is the only additional Python library required. The NACorrector class implements the main correction algorithm using the ndarray class from the numpy module. The PenultimateNACIter, NAProduct, and NASumProduct classes implement the orange xnacrange, light green plookup, and purple slookup objects, respectively.
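The caching idea of Section 3.3 and the plookup/slookup roles in Figure 2a can be sketched with `functools.lru_cache` plus a dictionary cache passed to a constructor. The sketch assumes that P holds binomial probability terms and that S holds running (summative) sums of those terms; these table definitions, the abundance values, and the names `p_table`, `s_table`, and `Corrector` are assumptions for illustration, not the NAProduct or NASumProduct classes.

```python
from functools import lru_cache
from itertools import accumulate
from math import comb

# Assumed natural abundances of the heavy isotopes (illustrative values).
ABUNDANCE = {"13C": 0.0107, "15N": 0.00364}

@lru_cache(maxsize=None)
def p_table(label, n_max):
    """Hypothetical P table: P[n][m] = C(n, m) p^m (1 - p)^(n - m), the probability
    that exactly m of n unlabeled positions carry the heavy isotope."""
    p = ABUNDANCE[label]
    return tuple(tuple(comb(n, m) * p**m * (1 - p) ** (n - m) for m in range(n + 1))
                 for n in range(n_max + 1))

@lru_cache(maxsize=None)
def s_table(label, n_max):
    """Hypothetical S table: running (summative) sums of each P row."""
    return tuple(tuple(accumulate(row)) for row in p_table(label, n_max))

class Corrector:
    """Hypothetical corrector whose constructor reuses tables from a shared cache."""

    def __init__(self, shape, order, cache=None):
        cache = {} if cache is None else cache
        key = (tuple(shape), tuple(order))
        if key not in cache:  # compute the tables only on a cache miss
            cache[key] = tuple((p_table(label, n), s_table(label, n))
                               for label, n in zip(order, shape))
        self.tables = cache[key]
```

Two correctors built for the same shape and order share one set of tables, so the table construction cost is paid once when many molecules share the same shape and order.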

#### 3.5. Quality Control

#### 3.6. Implementations of Binomial Terms

#### 3.7. Cell Culture and FT-ICR-MS

The ¹³C data are from glycerophospholipids separated from crude cell extracts derived from MCF7-LCC2 cells in tissue culture after 24 h of labeling with uniformly labeled ¹³C-glucose. The doubly labeled ¹³C/¹⁵N data are from polar metabolites separated from crude cell extracts derived from MCF7-LCC2 cells in tissue culture after 24 h of labeling with uniformly labeled ¹³C/¹⁵N-glutamine. Samples were directly infused in positive (glycerophospholipid) and negative (metabolite) ion modes on a hybrid linear ion trap 7T FT-ICR mass spectrometer (Finnigan LTQ FT, Thermo Electron, Bremen, Germany) equipped with a TriVersa NanoMate ion source (Advion BioSciences, Ithaca, NY, USA), and peaks were identified as previously described [6].

## 4. Results and Discussion

#### 4.1. Validation of the Algorithm

(C₁₇H₂₇N₃O₁₇P₂). In all cases, the differences between the original and new implementations were either zero or insignificant (i.e., <10⁻⁸; see Table 1). Furthermore, this approach cross-validates all parts of the iterative single-isotope Python implementation at once, including both the subtractNA and addNA functions.

**Table 1.** Comparison of the old Perl and new Python single-isotope algorithm implementations using isotopologues of UDP-GlcNAc.

| ¹³C Count ᵃ | Intensity ᵇ | Python (New) ᶜ | Perl (Old) ᵈ | Difference |
|---|---|---|---|---|
| 5 | 187.9 | 214.81 | 214.81 | 2.27 × 10⁻¹⁰ |
| 6 | 60.5 | 39.81 | 39.81 | 1.79 × 10⁻¹¹ |
| 7 | 109.8 | 116.15 | 116.15 | 1.78 × 10⁻¹⁰ |
| 8 | 418.4 | 449.36 | 449.36 | 3.58 × 10⁻¹⁰ |
| 9 | 23.1 | 0 | 0 | 0 |
| 10 | 165 | 176.39 | 176.39 | 3.68 × 10⁻¹⁰ |
| 11 | 1,438 | 1,523.77 | 1,523.77 | 2.63 × 10⁻⁹ |
| 12 | 1,215.9 | 1,183.78 | 1,183.78 | 3.59 × 10⁻⁹ |
| 13 | 4,235.8 | 4,360.57 | 4,360.57 | 3.63 × 10⁻⁹ |
| 14 | 1,562.5 | 1,420.73 | 1,420.73 | 2.17 × 10⁻⁹ |
| 15 | 1,253.9 | 1,231.68 | 1,231.68 | 4.81 × 10⁻⁹ |
| 16 | 175.8 | 149.9 | 149.9 | 4.44 × 10⁻¹⁰ |

ᵃ Zero-valued isotopologue intensities have been omitted from the table for brevity; ᵇ observed, uncorrected isotopologue intensities; ᶜ intensities corrected using the new Python implementation; ᵈ intensities corrected using the older Perl implementation.

¹³C and ¹⁵N single-isotope datasets, each with normalized isotopologue intensities that sum to 1, representing a molecule with 9 carbon atoms and a molecule with 6 nitrogen atoms, respectively.

¹³C simulated dataset and a ¹⁵N simulated dataset, with the results also shown in Table 2. Next, we calculated the vector outer products of both the simulated natural abundance-tainted datasets and the simulated datasets to produce matrices representing a multi-isotope isotopologue intensity dataset for a molecule with both 9 carbon and 6 nitrogen atoms, as shown in Figure 3a and Figure 3b, respectively. Then, we applied the multi-isotope Python implementation to the matrix in Figure 3a to produce the natural abundance-corrected matrix in Figure 3c. Next, we took the absolute difference between the matrices in Figure 3b and Figure 3c, which is shown in Figure 3d. All elements of the Figure 3d matrix were either zero or below 10⁻¹⁶. This approach also cross-validates all parts of the iterative multi-isotope Python implementation at once, including both its subtractNA and addNA functions. Furthermore, these results demonstrate the numerical stability of the multi-isotope implementation, even though the algorithm operates on N × M data points rather than just N.
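The outer-product validation can be sketched as follows. The sketch assumes that ¹³C and ¹⁵N natural abundance contributions are independent, so that the two-dimensional forward model and its inverse can be applied one axis at a time; this axis-by-axis shortcut is not necessarily how the multi-isotope implementation works. It reuses the simulated vectors of Table 2, but the abundances (0.0107 for ¹³C, 0.00364 for ¹⁵N) and all helper names are assumptions.

```python
from math import comb

def na_matrix(n_max, p):
    """Binomial natural abundance matrix for one element (lower triangular)."""
    return [[comb(n_max - k, i - k) * p ** (i - k) * (1 - p) ** (n_max - i) if i >= k else 0.0
             for k in range(n_max + 1)] for i in range(n_max + 1)]

def add_na(a, v):
    """Add natural abundance to one isotopologue vector: a @ v."""
    return [sum(a_ik * v_k for a_ik, v_k in zip(row, v)) for row in a]

def subtract_na(a, v):
    """Remove natural abundance again by forward substitution."""
    out = []
    for i, row in enumerate(a):
        out.append((v[i] - sum(row[k] * out[k] for k in range(i))) / row[i])
    return out

def outer(u, v):
    return [[x * y for y in v] for x in u]

def along_axes(op_c, op_n, m):
    """Apply op_c down every column (13C axis), then op_n along every row (15N axis)."""
    cols = [op_c([row[j] for row in m]) for j in range(len(m[0]))]
    m = [[col[i] for col in cols] for i in range(len(cols[0]))]
    return [op_n(row) for row in m]

def max_abs_diff(m1, m2):
    return max(abs(x - y) for r1, r2 in zip(m1, m2) for x, y in zip(r1, r2))

# Simulated single-isotope vectors from Table 2 (9 C and 6 N atoms).
a_c, a_n = na_matrix(9, 0.0107), na_matrix(6, 0.00364)
sim_c = [0.5, 0, 0, 0.15, 0.1, 0, 0, 0, 0, 0.25]
sim_n = [0.5, 0, 0, 0.1, 0, 0, 0.4]
tainted = outer(add_na(a_c, sim_c), add_na(a_n, sim_n))  # analogous to Figure 3a
truth = outer(sim_c, sim_n)                               # analogous to Figure 3b
corrected = along_axes(lambda v: subtract_na(a_c, v),
                       lambda v: subtract_na(a_n, v), tainted)  # analogous to Figure 3c
difference = max_abs_diff(corrected, truth)               # largest entry of Figure 3d
```

Under this independence assumption, contaminating the outer product of the true vectors axis by axis gives the same matrix as the outer product of the contaminated vectors, which is why the outer-product construction is a fair test of a two-label correction.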

**Figure 3.** Validation of the multi-isotope natural abundance correction algorithm. (**a**) The matrix outlined by the black lines represents isotopologue intensities with ¹³C and ¹⁵N isotopes from both a labeling source and natural abundance. It is calculated from the vector outer product of two single-isotope-labeled vectors of isotopologue intensities. These single-labeled vectors represent the addition of ¹³C/¹⁵N natural abundance to the corresponding single-labeled vectors in (**b**). Each vector and matrix of intensities is normalized to a sum of 1. (**b**) The matrix outlined by the black lines represents isotopologue intensities with ¹³C and ¹⁵N isotopes from only a labeling source. It is calculated from the vector outer product of two single-isotope-labeled vectors of isotopologue intensities. (**c**) This matrix is the result of just one iteration of the multi-isotope natural abundance correction algorithm implemented in the Python programming language. (**d**) This matrix is the absolute difference between the matrices in (**b**) and (**c**). No element is larger than 10⁻¹⁷.

| ¹³C Count | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| Simulated | 0.5 | 0 | 0 | 0.15 | 0.1 | 0 | 0 | 0 | 0 | 0.25 |
| addNA | 0.4523 | 0.0456 | 0.0020 | 0.1403 | 0.1040 | 0.0056 | 1.2 × 10⁻⁴ | 1.4 × 10⁻⁶ | 7.6 × 10⁻⁹ | 0.25 |

| ¹⁵N Count | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| Simulated | 0.5 | 0 | 0 | 0.1 | 0 | 0 | 0.4 |
| addNA | 0.4890 | 0.0109 | 0.0001 | 0.0989 | 0.0011 | 4 × 10⁻⁶ | 0.4 |

#### 4.2. Numerical Analysis of Interleaving Method

10⁻¹⁴, well below any level that one would consider significant in double-precision arithmetic. The “choose” and “comb” methods gave identical values throughout, likely owing to Python's use of arbitrary-precision “long” integers. The “comb2” method gave small differences, as the log-gamma method is less accurate than factorial or multiplicative formulations of the binomial coefficient. For a discussion of the stability of the interleaving method, see the supplemental materials.
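The kinds of binomial-coefficient implementations compared here can be sketched with the Python standard library: an exact integer version (`math.comb`), a floating-point multiplicative formula, and a log-gamma version (`math.lgamma`). These functions are illustrative stand-ins and are not asserted to be the exact “choose”, “comb”, “comb2”, or “logReal” methods evaluated in the table below.

```python
from math import comb, exp, lgamma

def binom_exact(n, k):
    """Exact coefficient from Python's arbitrary-precision integers."""
    return comb(n, k)

def binom_multiplicative(n, k):
    """Floating-point multiplicative formula. After step j the running value is
    C(n - k + j, j), so it stays exact while intermediate products are below 2**53."""
    k = min(k, n - k)
    result = 1.0
    for j in range(1, k + 1):
        result = result * (n - k + j) / j
    return result

def binom_loggamma(n, k):
    """Log-gamma formulation; rounding in the exponent costs a few digits."""
    return exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1))

def binom_term(coef, n, k, p):
    """Binomial probability C(n, k) p^k (1 - p)^(n - k) with a chosen coefficient."""
    return coef(n, k) * p**k * (1 - p) ** (n - k)

def max_relative_difference(coef_a, coef_b, n_max):
    """Largest relative disagreement between two coefficient methods for n <= n_max."""
    return max(abs(coef_a(n, k) - coef_b(n, k)) / coef_b(n, k)
               for n in range(n_max + 1) for k in range(n + 1))
```

For moderate n, the multiplicative formula is exact in double precision because each partial result is itself a binomial coefficient, whereas the log-gamma route typically loses a few digits to rounding in the exponent, consistent with the small “comb2” differences reported here.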

| | org | comb | comb2 | choose | logReal |
|---|---|---|---|---|---|
| org | 0 | −2.36 × 10⁻¹⁶ | −5.67 × 10⁻¹⁴ | −2.36 × 10⁻¹⁶ | −2.36 × 10⁻¹⁵ |
| comb | - | - | −5.66 × 10⁻¹⁴ | 0 | −2.25 × 10⁻¹⁵ |
| comb2 | - | - | - | 5.66 × 10⁻¹⁴ | 5.48 × 10⁻¹⁴ |
| choose | - | - | - | - | −2.25 × 10⁻¹⁵ |

#### 4.3. Application to Observed Isotopologues of UDP-GlcNAc

¹³C/¹⁵N isotopologue intensities for uridine diphosphate-N-acetyl-D-glucosamine (UDP-GlcNAc; C₁₇H₂₇N₃O₁₇P₂). Many of the isotopologues change significantly, especially $I_{M+1,0}$, $I_{M+1,1}$, $I_{M+1,2}$, $I_{M+4,2}$, $I_{M+1,3}$, and $I_{M+4,3}$, where $I_{M+i,j}$ denotes the isotopologue with $i$ incorporated ¹³C nuclei and $j$ incorporated ¹⁵N nuclei. In fact, the effects of isotopic natural abundance are more dramatic than those seen in single-labeling experiments. This is expected, since the combined natural abundance effect of two elements is greater than that of either element alone.

**Figure 4.** Corrected and observed ¹³C/¹⁵N isotopologues of UDP-GlcNAc. Each graph represents a set of ¹³C-labeled isotopologues with a specific number of incorporated ¹⁵N nuclei: $I_{M+i,0}$, $I_{M+i,1}$, $I_{M+i,2}$, and $I_{M+i,3}$ represent isotopologues with 0, 1, 2, and 3 ¹⁵N nuclei, respectively. Observed intensities are in red and the isotopic natural abundance-corrected intensities are in blue. Calculating the corrected intensities required 12 iterations of the algorithm.

#### 4.4. Running Time

Intel® Xeon X5650 processor running at 2.67 GHz. These timings used a data file of 9,066 different metabolites, with an average of five isotopologue peaks per metabolite.

## 5. Conclusions

## Software Availability

## Acknowledgments

## Conflicts of Interest

## References

- Rittenberg, D.; Schoenheimer, R. Deuterium as an indicator in the study of intermediary metabolism. J. Biol. Chem. **1937**, 121, 235.
- Schoenheimer, R.; Rittenberg, D. The study of intermediary metabolism of animals with the aid of isotopes. Physiol. Rev. **1940**, 20, 218.
- Schoenheimer, R.; Rittenberg, D. Deuterium as an indicator in the study of intermediary metabolism. J. Biol. Chem. **1935**, 111, 163.
- Boros, L.G.; Brackett, D.J.; Harrigan, G.G. Metabolic biomarker and kinase drug target discovery in cancer using stable isotope-based dynamic metabolic profiling (SIDMAP). Curr. Cancer Drug Tar. **2003**, 3, 445–453.
- Fan, T.W.; Lane, A.N.; Higashi, R.M.; Farag, M.A.; Gao, H.; Bousamra, M.; Miller, D.M. Altered regulation of metabolic pathways in human lung cancer discerned by (13)C stable isotope-resolved metabolomics (SIRM). Mol. Cancer **2009**, 8, 41.
- Lane, A.N.; Fan, T.W.M.; Xie, Z.; Moseley, H.N.B.; Higashi, R.M. Isotopomer analysis of lipid biosynthesis by high resolution mass spectrometry and NMR. Anal. Chim. Acta **2009**, 651, 201–208.
- Moseley, H.N. Correcting for the effects of natural abundance in stable isotope resolved metabolomics experiments involving ultra-high resolution mass spectrometry. BMC Bioinformatics **2010**, 11, 139.
- Pingitore, F.; Tang, Y.J.; Kruppa, G.H.; Keasling, J.D. Analysis of amino acid isotopomers using FT-ICR MS. Anal. Chem. **2007**, 79, 2483–2490.
- Moseley, H.N.; Lane, A.N.; Belshoff, A.C.; Higashi, R.M.; Fan, T.W. A novel deconvolution method for modeling UDP-N-acetyl-D-glucosamine biosynthetic pathways based on (13)C mass isotopologue profiles under non-steady-state conditions. BMC Biol. **2011**, 9, 37.
- Dauner, M.; Sauer, U. GC-MS Analysis of amino acids rapidly provides rich information for isotopomer balancing. Biotechnol. Progr. **2000**, 16, 642–649.
- Fischer, E.; Sauer, U. Metabolic flux profiling of Escherichia coli mutants in central carbon metabolism using GC-MS. Eur. J. Biochem. **2003**, 270, 880–891.
- Hellerstein, M.K.; Neese, R.A. Mass isotopomer distribution analysis at eight years: Theoretical, analytic, and experimental considerations. Am. J. Physiol.—Endoc. M. **1999**, 276, E1146–E1170.
- Lee, W.N.P.; Byerley, L.O.; Bergner, E.A.; Edmond, J. Mass isotopomer analysis: Theoretical and practical considerations. Biol. Mass Spectrom. **1991**, 20, 451–458.
- Snider, R. Efficient calculation of exact mass isotopic distributions. JASMS **2007**, 18, 1511–1515.
- Van Winden, W.; Wittmann, C.; Heinzle, E.; Heijnen, J. Correcting mass isotopomer distributions for naturally occurring isotopes. Biotechnol. Bioeng. **2002**, 80, 477–479.
- Wahl, S.A.; Dauner, M.; Wiechert, W. New tools for mass isotopomer data evaluation in 13C flux analysis: Mass isotope correction, data consistency checking, and precursor relationships. Biotechnol. Bioeng. **2004**, 85, 259–268.
- Zhang, X.; Hines, W.; Adamec, J.; Asara, J.; Naylor, S.; Regnier, F. An automated method for the analysis of stable isotope labeling data in proteomics. JASMS **2005**, 16, 1181–1191.
- Fernandez, C.A.; Des Rosiers, C.; Previs, S.F.; David, F.; Brunengraber, H. Correction of 13C mass isotopomer distributions for natural stable isotope abundance. J. Mass Spectrom. **1996**, 31, 255–262.
- Rockwood, A.L.; Haimi, P. Efficient calculation of accurate masses of isotopic peaks. JASMS **2006**, 17, 415–419.
- Rockwood, A.L.; van Orden, S.L. Ultrahigh-speed calculation of isotope distributions. Anal. Chem. **1996**, 68, 2027–2030.
- Yergey, J.A. A general approach to calculating isotopic distributions for mass spectrometry. Int. J. Mass Spectrom. Ion Phys. **1983**, 52, 337–349.
- Rossum, G.V. The Python Programming Language. Available online: http://www.python.org/ (accessed on 21 July 2013).
- Sanner, M.F. Python: A programming language for software integration and development. J. Mol. Graph. Model. **1999**, 17, 57–61.
- Oliphant, T.E. A Guide to NumPy; Trelgol Publishing: Spanish Fork, UT, USA, 2006; Volume 1, pp. 1–371.
- Gamma, E. Design Patterns: Elements of Reusable Object-Oriented Software; Addison-Wesley Professional: Boston, MA, USA, 1995; pp. 1–416.
- Moseley, H.N.B. Error analysis and propagation in metabolomics data analysis. Comp. Struct Biotech. J. **2013**, 4, e201301006.
- Moseley Bioinformatics Laboratory Software Repository for download. Available online: http://bioinformatics.cesb.uky.edu/bin/view/Main/SoftwareDevelopment/ (accessed on 21 July 2013).

## Supplementary Files

**Supplementary File 1:**

© 2013 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

## Share and Cite

**MDPI and ACS Style**

Carreer, W.J.; Flight, R.M.; Moseley, H.N.B.
A Computational Framework for High-Throughput Isotopic Natural Abundance Correction of Omics-Level Ultra-High Resolution FT-MS Datasets. *Metabolites* **2013**, *3*, 853-866.
https://doi.org/10.3390/metabo3040853

**AMA Style**

Carreer WJ, Flight RM, Moseley HNB.
A Computational Framework for High-Throughput Isotopic Natural Abundance Correction of Omics-Level Ultra-High Resolution FT-MS Datasets. *Metabolites*. 2013; 3(4):853-866.
https://doi.org/10.3390/metabo3040853

**Chicago/Turabian Style**

Carreer, William J., Robert M. Flight, and Hunter N. B. Moseley.
2013. "A Computational Framework for High-Throughput Isotopic Natural Abundance Correction of Omics-Level Ultra-High Resolution FT-MS Datasets" *Metabolites* 3, no. 4: 853-866.
https://doi.org/10.3390/metabo3040853