# Data Pre-Processing for Label-Free Multiple Reaction Monitoring (MRM) Experiments

## Abstract

## 1. Introduction

## 2. Data Sets

#### 2.1. Data Set 1: Rat Brain Post Synaptic Density

#### 2.2. Data Set 2: Cysteine String Protein

#### 2.3. Data Set 3: S. pyogenes

#### 2.4. Data Structure

**Figure 1.** (**a**) Example of three Multiple Reaction Monitoring (MRM) extracted ion chromatograms from a single precursor ion; (**b**) data structure grouped by precursor ion (peptide) [36].

## 3. Data Pre-Processing

#### 3.1. Quality Assessment

#### 3.1.1. Adjusted Retention Time Deviation and Outlier Detection

**Figure 2.** (**a**) Retention time deviations show global shifts between samples; (**b**) coefficient of variation (CV) without retention time adjustment; (**c**) CV after adjustment; (**d**) comparison between the original and adjusted retention time CVs, with the diagonal shown as a grey line. Red dots correspond to nine transitions with original CV < 4% and adjusted CV > 7%.

#### 3.1.2. Peptide-Level Intensity Estimation via Robust Linear Regression and Measure of Outlying Peak Area

The model is

$$y_{ij} = \alpha_i + \mu_j + \epsilon_{ij}, \qquad \epsilon_{ij} \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2),$$

where $y_{ij}$ denotes the log2-scaled peak area from transition $i$ and sample $j$, $\alpha_i$ is the effect of the $i$-th transition, and $\mu_j$ is the effect of the $j$-th run. For identifiability, the sum of all $\alpha_i$ is set to zero, so that $\mu_j$ is the average log2 peak area in run $j$.

Because the model is additive on the log scale, relative intensities between transitions are expressed through the transition effects ($\alpha_i$) after log-transformation. For example, $2^{\alpha_1 - \alpha_2}$ denotes the peak intensity ratio between transition 1 and transition 2. The model parameters ($\alpha_i$, $\mu_j$, $\sigma^2$) can be estimated by robust linear regression, an attractive alternative to ordinary linear regression when there are concerns about potential outliers. While the ordinary method estimates the parameters by minimizing the sum of squared residuals, the robust approach calculates a weight $w_{ij}$ for each observation according to the magnitude of its residual and then minimizes the sum of weighted squared residuals. Iteratively reweighted least squares (IRLS) is adopted to estimate the model parameters and the weights alternately until convergence. Each weight has a value between 0 and 1. A larger weight for a peak implies that the peak behaves more consistently with the overall pattern, so the weight can be used as a quality measure for an individual peak.

Figure 3 shows an example of the robust model fitted to the PSD data. Figure 3a displays log2-transformed peak areas from five transitions fragmented from the doubly charged peptide LTGADGTPPGFLLK of the M2OM protein. Peak intensities from different transitions are labeled with different colors and linked across the 18 runs. Four transitions (y(10), red; y(12), yellow; y(7), light blue; y(9), blue) exhibit parallel patterns, indicating that the area ratios among these transitions are consistent across all runs. In contrast, the y(6) transition (PGFLLK, green line) is not parallel to the other transitions. For example, y(6) peak areas are slightly higher than y(7) peak areas for the PSD1 runs, while they are much lower for the PSD2 and PSD3 runs. This inconsistency in the y(6) transition is well captured by its weights, as shown in Figure 3b. The majority of the weights estimated for y(6) peaks are around 0.5 or lower, indicating that many of the y(6) intensities are outlying. In contrast, most of the weights from the other transitions are 1.
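The iteratively reweighted least-squares procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Huber weight function with tuning constant 1.345, the MAD-based scale estimate, and the function names `huber_weight` and `fit_robust_two_way` are all assumptions, since the text does not state which weight function was used. Each weighted least-squares step is approximated by a few alternating sweeps over run and transition effects.

```python
from statistics import median


def huber_weight(residual, scale, k=1.345):
    """Huber weight (assumed choice): 1 for small residuals, k/|u| beyond k."""
    if scale == 0.0:
        return 1.0
    u = abs(residual) / scale
    return 1.0 if u <= k else k / u


def fit_robust_two_way(y, n_iter=50, n_sweeps=10, tol=1e-8):
    """Fit y[i][j] = alpha[i] + mu[j] + error by IRLS.

    y holds log2 peak areas, one row per transition and one column per run.
    Returns (alpha, mu, w) with sum(alpha) == 0 and each weight in (0, 1].
    """
    n_t, n_r = len(y), len(y[0])
    alpha, mu = [0.0] * n_t, [0.0] * n_r
    w = [[1.0] * n_r for _ in range(n_t)]
    for _ in range(n_iter):
        previous = alpha + mu
        # Weighted least squares for the current weights, by alternating updates.
        for _ in range(n_sweeps):
            for j in range(n_r):
                total = sum(w[i][j] for i in range(n_t))
                mu[j] = sum(w[i][j] * (y[i][j] - alpha[i]) for i in range(n_t)) / total
            for i in range(n_t):
                total = sum(w[i])
                alpha[i] = sum(w[i][j] * (y[i][j] - mu[j]) for j in range(n_r)) / total
            # Identifiability constraint: transition effects sum to zero.
            shift = sum(alpha) / n_t
            alpha = [a - shift for a in alpha]
            mu = [m + shift for m in mu]
        # Recompute weights from the residuals, using a robust (MAD) scale.
        resid = [[y[i][j] - alpha[i] - mu[j] for j in range(n_r)] for i in range(n_t)]
        scale = median([abs(r) for row in resid for r in row]) / 0.6745
        w = [[huber_weight(r, scale) for r in row] for row in resid]
        if max(abs(a - b) for a, b in zip(alpha + mu, previous)) < tol:
            break
    return alpha, mu, w
```

A peak whose area departs from the parallel pattern of its peptide receives a weight well below 1, while peaks that follow the pattern keep weights at or near 1.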

**Figure 3.** (**a**) Log2-transformed peak areas for five MRM transitions of M2OM peptide LTGADGTPPGFLLK (postsynaptic density (PSD) assay (n = 18)). Peak areas from the same MRM transition are connected with a unique color label. (**b**) Weight as a measure of outlying behavior of each peak. (The weights of y(10), y(12), and y(7) are mostly 1, thus the corresponding red, yellow, and light blue curves overlap with the blue curve for y(9).) (**c**) Distribution of weights for each transition.

#### 3.1.3. Use of Robust Model to Measure Inconsistent MRM Transitions

**Figure 4.** Peak shapes from two runs in PSD data from the same peptide as in Figure 2. The middle panel shows the peaks of a transition flagged as problematic by its median weight from the robust regression. Green dotted boxes show correct peak candidates.

**Figure 5.** Interquartile range of transition peak weights for transitions with median weight >0.9 and <0.9 for (**a**) PSD data and (**b**) Cysteine string protein (CSP) data.

#### 3.1.4. Use of Robust Model to Assess the Sample Quality

[…] $W_j = \sum_i w_{ij}$ is the total transition weight [38]. However, the standard error estimate depends on the standard deviation estimate and does not consider heterogeneity among peptides. Furthermore, it does not take into account the variation in the number of transitions per peptide or the presence of poor-quality fragments. The normalized unscaled standard error (NUSE) was proposed as a measure of overall sample quality using the weights [38]. Briefly, it first replaces the standard deviation estimate by 1. The authors then suggested normalizing the value by dividing it by its median across all samples.
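The NUSE calculation described above can be sketched as follows. The exact formula did not survive in this copy of the text, so the sketch assumes the form used in [38]: the standard error of a run effect is proportional to $1/\sqrt{W_j}$, the standard deviation estimate is set to 1, and the result is divided by its median across runs. The function name `nuse` and the input layout are illustrative assumptions.

```python
from math import sqrt
from statistics import median


def nuse(weights):
    """Normalized unscaled standard error for each run of one peptide.

    weights[i][j] is the robust-regression weight of transition i in run j.
    Assumes the unscaled standard error of run j is 1 / sqrt(W_j), where
    W_j is the total transition weight of that run.
    """
    n_runs = len(weights[0])
    total_weight = [sum(row[j] for row in weights) for j in range(n_runs)]
    unscaled_se = [1.0 / sqrt(w_j) for w_j in total_weight]
    center = median(unscaled_se)
    return [se / center for se in unscaled_se]
```

Under this assumed form, a run whose transitions are mostly down-weighted has a small total weight and therefore a NUSE value above 1, while runs of typical quality sit close to 1.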

#### 3.2. Data Normalization

#### 3.2.1. Source of Experimental Artifacts

**Figure 6.** Individual weights are graphically summarized and normalized unscaled standard error (NUSE) distributions are plotted for (**a**), (**c**) PSD and (**b**), (**d**) CSP data, respectively.

**Figure 7.** Comparison between two technical replicates from CSP knockout (KO) sample (**a**) and overall peak area distribution among CSP KO and wild type (WT) samples (**b**).

#### 3.2.2. Using All the Transitions in the Data

#### 3.2.3. Using a Subset of the Data

#### 3.2.4. Comparison of the Normalization Performance

$$y_{kj} = \mu + \gamma_k + \epsilon_{kj}, \qquad \epsilon_{kj} \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$$

The unbiased estimator of the experimental noise ($\sigma^2$) is the mean squared error (MSE), which divides the sum of squared residuals by its degrees of freedom. A better normalization approach should yield a lower MSE and reduce non-biological variation among replicated samples. The CV is obtained as the ratio of the root MSE to the average (log2) peak area for each transition or peptide. The median CV for each data set under each normalization method is given in Table 1. It shows that normalization reduces the median CV in most cases; the exceptions are the global median and IS.Median methods on the S. pyogenes data.

**Table 1.** Median coefficient of variation for the five normalization methods considered (with median absolute deviation).

Data Set | Original | Global Median | Quantile | Cyclic Loess | IS.Median | Invariant Set |
---|---|---|---|---|---|---|
PSD | 4.18% (1.96%) | 2.15% (1.41%) | 1.97% (1.12%) | 2.12% (1.39%) | 3.09% (1.51%) | 1.99% (1.29%) |
CSP | 4.04% (2.82%) | 3.47% (2.75%) | 2.85% (2.03%) | 3.26% (2.16%) | NA | 3.29% (2.43%) |
S. pyogenes | 3.05% (2.78%) | 3.47% (2.84%) | 2.34% (2.02%) | 2.85% (3.15%) | 3.12% (3.32%) | 2.69% (2.74%) |
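The CV reported in Table 1 is described as the root MSE divided by the average log2 peak area. A minimal sketch of that calculation for a single transition is given below, assuming the replicates are supplied as one list per group and that the MSE is the pooled within-group variance with N − K degrees of freedom (N observations, K groups). The function names `within_group_mse` and `coefficient_of_variation` are illustrative.

```python
from math import sqrt


def within_group_mse(groups):
    """Pooled within-group mean squared error for one transition.

    groups is a list of lists; each inner list holds the log2 peak areas
    of the replicates in one group.
    """
    n_total = sum(len(g) for g in groups)
    degrees_of_freedom = n_total - len(groups)
    if degrees_of_freedom <= 0:
        raise ValueError("need at least one group with replicated samples")
    sse = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        sse += sum((value - group_mean) ** 2 for value in g)
    return sse / degrees_of_freedom


def coefficient_of_variation(groups):
    """Root MSE divided by the overall mean log2 peak area."""
    values = [value for g in groups for value in g]
    return sqrt(within_group_mse(groups)) / (sum(values) / len(values))
```

Applying `within_group_mse` to the same transition before and after normalization and taking the ratio of the two values gives a per-transition measure of how much within-group variation a normalization method removes.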

The within-group variance ($\sigma^2$) is estimated by the MSE. The ratio of the MSE from the normalized data to the MSE from the original data is then evaluated. The ratio is calculated for each transition, and the ratio distributions are shown in Figure 9a–c. The within-group variance was reduced substantially, especially when the data were collected over different biological replicates (a, c). Quantile and invariant set normalization performed better than the other methods. The use of housekeeping proteins (c) is also advantageous over the use of spiked-in peptides (a).

Additionally, we performed a differential expression analysis on the S. pyogenes data set, comparing the normal and human plasma growth conditions. Welch's t-test was used to evaluate peptide alteration, and a p-value cutoff of 0.05 was used to determine significant changes. Subset-based normalization using housekeeping peptides detected the same number of differentially expressed peptides as the original data, but it was able to identify significant alterations of both D-alanine--poly(phosphoribitol) ligase subunits 1 and 2 (DLTA and DLTC), which agrees with previous studies [21,45]. Rank-invariant set normalization showed improved power to identify differentially expressed peptides compared to the original data analysis.
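For reference, the Welch statistic used in the two-group comparison can be sketched as below. This is a generic illustration of Welch's t-test with the Welch–Satterthwaite degrees of freedom, not the authors' analysis code; the function name `welch_t` is an assumption. Converting the statistic to a p-value additionally requires a t-distribution CDF from a statistics library, which is omitted here.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    group_a and group_b hold the (log2) intensities of one peptide in the
    two conditions; the group variances are not assumed to be equal.
    """
    var_a = variance(group_a) / len(group_a)  # squared standard error, group A
    var_b = variance(group_b) / len(group_b)  # squared standard error, group B
    t_stat = (mean(group_a) - mean(group_b)) / sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(group_a) - 1) + var_b ** 2 / (len(group_b) - 1)
    )
    return t_stat, df
```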

**Figure 8.** Comparisons of each normalization method vs. original data ((**a**) PSD, (**b**) CSP, and (**c**) S. pyogenes). Each plot shows the log2 ratio of CVs on the y-axis and the average peak area on the x-axis. The black line is a smoothing curve illustrating the overall pattern, and the red dashed line indicates equal CV between normalized and original data.

**Figure 9.** Differences in within-group deviation between normalized and original data for (**a**) PSD, (**b**) CSP, and (**c**) S. pyogenes data, along with the two-group comparison result at 5% false discovery rate (**d**).

## 4. Conclusions

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## References and Notes

1. Ong, S.; Blagoev, B.; Kratchmarova, I.; Kristensen, D.; Steen, H.; Pandey, A.; Mann, M. Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Mol. Cell. Proteomics **2002**, 1, 376–378.
2. Ross, P.; Huang, Y.; Marchese, J.; Williamson, B.; Parker, K.; Hattan, S.; Khainovski, N.; Pillai, S.; Dey, S.; Daniels, S.; et al. Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents. Mol. Cell. Proteomics **2004**, 3, 1154–1169.
3. Asara, J.; Christofk, H.; Freimark, L.; Cantley, L. A label-free quantification method by MS/MS TIC compared to SILAC and spectral counting in a proteomics screen. Proteomics **2008**, 8, 994–999.
4. Luo, R.; Colangelo, C.M.; Sessa, W.C.; Zhao, H. Bayesian analysis of iTRAQ data with nonrandom missingness: Identification of differentially expressed proteins. Stat. Biosci. **2009**, 1, 228–245.
5. Gerber, S.A.; Rush, J.; Stemman, O.; Kirschner, M.W.; Gygi, S.P. Absolute quantification of proteins and phosphoproteins from cell lysates by tandem MS. Proc. Natl. Acad. Sci. USA **2003**, 100, 6940–6945.
6. Dupuis, A.; Hennekinne, J.A.; Garin, J.; Brun, V. Protein standard absolute quantification (PSAQ) for improved investigation of staphylococcal food poisoning outbreaks. Proteomics **2008**, 8, 4633–4636.
7. Wolf-Yadlin, A.; Hautaniemi, S.; Lauffenburger, D.A.; White, F.M. Multiple reaction monitoring for robust quantitative proteomic analysis of cellular signaling networks. Proc. Natl. Acad. Sci. USA **2007**, 104, 5860–5865.
8. Domon, B.; Aebersold, R. Mass spectrometry and protein analysis. Science **2006**, 312, 212–217.
9. Picotti, P.; Bodenmiller, B.; Mueller, L.N.; Domon, B.; Aebersold, R. Full dynamic range proteome analysis of S. cerevisiae by targeted proteomics. Cell **2009**, 138, 795–806.
10. Addona, T.A.; Abbatiello, S.E.; Schilling, B.; Skates, S.J.; Mani, D.R.; Bunk, D.M.; Spiegelman, C.H.; Zimmerman, L.J.; Ham, A.J.; Keshishian, H.; et al. Multi-site assessment of the precision and reproducibility of multiple reaction monitoring-based measurements of proteins in plasma. Nat. Biotechnol. **2009**, 27, 633–641.
11. Zhang, H.; Liu, Q.; Zimmerman, L.J.; Ham, A.J.; Slebos, R.J.; Rahman, J.; Kikuchi, T.; Massion, P.P.; Carbone, D.P.; Billheimer, D.; et al. Methods for peptide and protein quantitation by liquid chromatography-multiple reaction monitoring mass spectrometry. Mol. Cell. Proteomics **2011**, 10, M110 006593.
12. MacLean, B.; Tomazela, D.M.; Shulman, N.; Chambers, M.; Finney, G.L.; Frewen, B.; Kern, R.; Tabb, D.L.; Liebler, D.C.; MacCoss, M.J. Skyline: An open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics **2010**, 26, 966–968.
13. Abbatiello, S.E.; Mani, D.R.; Keshishian, H.; Carr, S.A. Automated detection of inaccurate and imprecise transitions in peptide quantification by multiple reaction monitoring mass spectrometry. Clin. Chem. **2010**, 56, 291–305.
14. Reiter, L.; Rinner, O.; Picotti, P.; Huttenhain, R.; Beck, M.; Brusniak, M.Y.; Hengartner, M.O.; Aebersold, R. mProphet: Automated data processing and statistical validation for large-scale SRM experiments. Nat. Methods **2011**, 8, 430–435.
15. Bolstad, B.M.; Irizarry, R.A.; Astrand, M.; Speed, T.P. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics **2003**, 19, 185–193.
16. Tseng, G.C.; Oh, M.K.; Rohlin, L.; Liao, J.C.; Wong, W.H. Issues in cDNA microarray analysis: Quality filtering, channel normalization, models of variations and assessment of gene effects. Nucleic Acids Res. **2001**, 29, 2549–2557.
17. Ballman, K.V.; Grill, D.E.; Oberg, A.L.; Therneau, T.M. Faster cyclic loess: Normalizing RNA arrays via linear models. Bioinformatics **2004**, 20, 2778–2786.
18. Smyth, G.K.; Speed, T. Normalization of cDNA microarray data. Methods **2003**, 31, 265–273.
19. Gagnon-Bartsch, J.A.; Speed, T.P. Using control genes to correct for unwanted variation in microarray data. Biostatistics **2012**, 13, 539–552.
20. Chang, C.Y.; Picotti, P.; Huttenhain, R.; Heinzelmann-Schwarz, V.; Jovanovic, M.; Aebersold, R.; Vitek, O. Protein significance analysis in selected reaction monitoring (SRM) measurements. Mol. Cell. Proteomics **2012**, 11, M111 014662.
21. Teleman, J.; Karlsson, C.; Waldemarson, S.; Hansson, K.; James, P.; Malmstrom, J.; Levander, F. Automated selected reaction monitoring software for accurate label-free protein quantification. J. Proteome Res. **2012**, 11, 3766–3773.
22. Colangelo, C.; Abbott, T.; Shifman, M.; Ivosev, G.; Chung, L.; Sakaue, F.; Cox, D.; Tate, S.A.; Nairn, A.; Rinehart, J.; Williams, K. Development of targeted proteomics assays. In Proceedings of the 60th ASMS Conference on Mass Spectrometry and Allied Topics, Vancouver, Canada, 20–24 May 2012.
23. Colangelo, C.M.; Ivosev, G.; Chung, L.; Abbott, T.; Shifman, M.; Sakaue, F.; Cox, D.; Kitchen, R.; Burton, L.; Tate, S.A.; Gulcicek, E.; Bonner, R.; Rinehart, J.; Nairn, A.C.; Williams, K. Development of a highly automated and multiplexed targeted proteome pipeline and assay for 112 rat brain synaptic proteins. Mol. Cell. Proteomics **2014**. submitted for publication.
24. MultiQuant™ 2.1 software, 2.1 (research version); AB SCIEX: Concord, Canada, 2011.
25. Shifman, M.; Li, Y.; Colangelo, C.; Stone, K.; Wu, T.; Cheung, K.; Miller, P.; Williams, K. YPED: A web-accessible database system for protein expression analysis. J. Proteome Res. **2007**, 6, 4019–4024.
26. Shifman, M.A.; Sun, K.; Colangelo, C.M.; Cheung, K.H.; Miller, P.; Williams, K. YPED: A proteomics database for protein expression analysis. AMIA Annu. Symp. Proc. **2005**, 2005, 1111.
27. Shifman, M.; Li, Y.; Colangelo, C.; Stone, K.; Wu, T.; Cheung, K.; Miller, P.; Williams, K. Yale protein expression database. Available online: http://yped.med.yale.edu/repository/ViewSeriesMenu.do?series_id=4522&series_name=Data+Pre-processing+for+label-free+multiple+reaction+monitoring+(MRM)+experiments/ (accessed on 6 January 2014).
28. Chandra, S.; Gallardo, G.; Fernandez-Chacon, R.; Schluter, O.M.; Sudhof, T.C. Alpha-synuclein cooperates with CSPalpha in preventing neurodegeneration. Cell **2005**, 123, 383–396.
29. Fernandez-Chacon, R.; Wolfel, M.; Nishimune, H.; Tabares, L.; Schmitz, F.; Castellano-Munoz, M.; Rosenmund, C.; Montesinos, M.L.; Sanes, J.R.; Schneggenburger, R.; et al. The synaptic vesicle protein CSP alpha prevents presynaptic degeneration. Neuron **2004**, 42, 237–251.
30. Zhang, Y.Q.; Henderson, M.X.; Colangelo, C.M.; Ginsberg, S.D.; Bruce, C.; Wu, T.; Chandra, S.S. Identification of CSPalpha clients reveals a role in dynamin 1 regulation. Neuron **2012**, 74, 136–150.
31. Cox, D.M.; Tate, S.; Duchoslav, E. Automated ion selection and method building for MRM based protein validation and quantification. In Proceedings of ASMS, Indianapolis, IN, USA, 3–7 June 2007; p. MP501.
32. Unwin, R.D.; Griffiths, J.R.; Whetton, A.D. A sensitive mass spectrometric method for hypothesis-driven detection of peptide post-translational modifications: Multiple reaction monitoring-initiated detection and sequencing (MIDAS). Nat. Protoc. **2009**, 4, 870–877.
33. Burton, L.; Ivosev, G.; Lau, A.; Bonner, R. A novel algorithm for quantitative LC peak integration. In Proceedings of ASMS, Philadelphia, PA, USA, 31 May–4 June 2009; p. Poster I-224.
34. Cunningham, M.W. Pathogenesis of group A streptococcal infections. Clin. Microbiol. Rev. **2000**, 13, 470–511.
35. The Swedish storage initiative (SweStore). Available online: http://webdav.swegrid.se/snic/bils/lu_proteomics/pub/anubis_data.zip (accessed on 30 April 2013).
36. Colangelo, C.M.; Chung, L.; Bruce, C.; Cheung, K.H. Review of software tools for design and analysis of large scale MRM proteomic datasets. Methods **2013**, 61, 287–298.
37. Irizarry, R.A.; Hobbs, B.; Collin, F.; Beazer-Barclay, Y.D.; Antonellis, K.J.; Scherf, U.; Speed, T.P. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics **2003**, 4, 249–264.
38. Brettschneider, J.; Collin, F.; Bolstad, B.M.; Speed, T.P. Quality assessment for short oligonucleotide microarray data. Technometrics **2008**, 50, 241–264.
39. Yang, Y.H.; Dudoit, S.; Luu, P.; Lin, D.M.; Peng, V.; Ngai, J.; Speed, T.P. Normalization for cDNA microarray data: A robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res. **2002**, 30, e15.
40. Ejigu, B.A.; Valkenborg, D.; Baggerman, G.; Vanaerschot, M.; Witters, E.; Dujardin, J.C.; Burzykowski, T.; Berg, M. Evaluation of normalization methods to pave the way towards large-scale LC-MS-based metabolomics profiling experiments. OMICS **2013**, 17, 473–485.
41. Mar, J.C.; Kimura, Y.; Schroder, K.; Irvine, K.M.; Hayashizaki, Y.; Suzuki, H.; Hume, D.; Quackenbush, J. Data-driven normalization strategies for high-throughput quantitative RT-PCR. BMC Bioinformatics **2009**, 10, 110.
42. Qureshi, R.; Sacan, A. A novel method for the normalization of microRNA RT-PCR data. BMC Med. Genomics **2013**, 6, S14.
43. Martins-de-Souza, D.; Alsaif, M.; Ernst, A.; Harris, L.W.; Aerts, N.; Lenaerts, I.; Peeters, P.J.; Amess, B.; Rahmoune, H.; Bahn, S.; et al. The application of selective reaction monitoring confirms dysregulation of glycolysis in a preclinical model of schizophrenia. BMC Res. Notes **2012**, 5, 146.
44. Schadt, E.E.; Li, C.; Ellis, B.; Wong, W.H. Feature extraction and normalization algorithms for high-density oligonucleotide gene expression array data. J. Cell Biochem. Suppl. **2001**, 84, 120–125.
45. Ferretti, J.J.; McShan, W.M.; Ajdic, D.; Savic, D.J.; Savic, G.; Lyon, K.; Primeaux, C.; Sezate, S.; Suvorov, A.N.; Kenton, S.; et al. Complete genome sequence of an M1 strain of Streptococcus pyogenes. Proc. Natl. Acad. Sci. USA **2001**, 98, 4658–4663.
46. Hongyu Zhao's lab statistical genomics and proteomics. Available online: http://zhaocenter.org/ (accessed on 5 May 2014).

© 2014 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

## Share and Cite

**MDPI and ACS Style**

Chung, L.M.; Colangelo, C.M.; Zhao, H.
Data Pre-Processing for Label-Free Multiple Reaction Monitoring (MRM) Experiments. *Biology* **2014**, *3*, 383-402.
https://doi.org/10.3390/biology3020383
