Validation of De Novo Peptide Sequences with Bottom-Up Tag Convolution
Abstract
1. Introduction
2. Materials and Methods
2.1. Generation of k-Tags
2.2. Bottom-Up Tag Convolution
- The number of tags originating from the same peptide is typically quite large;
- The mass offsets of tags matching the same peptide are usually close; thus, their differences are accurate;
- Unlike in the top-down case, ±1 Da deconvolution errors are rarely observed in bottom-up MS/MS spectra.
2.3. Sequence Validation
- ;
- ;
- ;
- .
3. Results
3.1. Datasets
3.2. Sequence Validation
- ;
- The best alignment of against s resulted in a Hamming distance of at most 2 between the matched fragments, over all alignments satisfying the following conditions:
- (a) If , must be matched against a substring of s with length ;
- (b) Otherwise (i.e., if ), s must be matched against either the prefix or suffix of with length ;
- (c) Thereby, neither insertions nor deletions were allowed.
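The alignment conditions above can be illustrated with a minimal Python sketch. The symbols for the de novo string and for the length comparisons are missing from the extracted text, so the sketch assumes that the de novo string (called `t` here, a hypothetical name) is slid along `s` when it is no longer than `s`, and that otherwise `s` is compared only with the prefix and the suffix of `t`. The function names and the `max_mismatches` parameter are illustrative and are not taken from the TagConvolution tool.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def best_ungapped_alignment(t: str, s: str) -> tuple:
    """Return (distance, offset, mode) of the best gap-free alignment.

    Assumed reading of conditions (a)-(c):
      (a) len(t) <= len(s): t is matched against every substring of s of length len(t);
      (b) len(t) >  len(s): s is matched against the prefix or suffix of t of length len(s);
      (c) no insertions or deletions, so only mismatches are counted.
    The offset is a position in s for case (a) and a position in t for case (b).
    """
    if not t or not s:
        raise ValueError("both strings must be non-empty")
    if len(t) <= len(s):
        n = len(t)
        candidates = [
            (hamming(t, s[i:i + n]), i, "substring")
            for i in range(len(s) - n + 1)
        ]
    else:
        n = len(s)
        candidates = [
            (hamming(s, t[:n]), 0, "prefix"),
            (hamming(s, t[-n:]), len(t) - n, "suffix"),
        ]
    # Smallest distance wins; ties are broken by the smaller offset.
    return min(candidates)


def passes_alignment_check(t: str, s: str, max_mismatches: int = 2) -> bool:
    """True if the best gap-free alignment has at most max_mismatches mismatches."""
    return best_ungapped_alignment(t, s)[0] <= max_mismatches


if __name__ == "__main__":
    s = "AVTDPVLSGNATSMPGST"  # the residue string shown in the score table
    print(best_ungapped_alignment("LSGQATS", s))            # (1, 6, 'substring')
    print(best_ungapped_alignment("GGLSGNATS", "LSGNATS"))  # (0, 2, 'suffix')
    print(passes_alignment_check("GLSGNATSG", "LSGNATS"))   # False
```

The last call shows the effect of condition (b) under this reading: a longer string that contains `s` only in its interior is rejected, because only its prefix and suffix are compared.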
3.3. The TagConvolution Software Tool
4. Discussion
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
MS/MS | Tandem mass spectrometry |
CAH2 | Carbonic anhydrase 2 |
DTT | Dithiothreitol |
HCD | Higher-energy C-trap dissociation |
CAD | Collisionally activated dissociation |
Appendix A. Contaminants in the CAH2 Data Set
- sp|P62992|1-76 Ubiquitin-40S ribosomal protein S27a [Bos taurus]
- sp|P13696|PEBP1_BOVIN Phosphatidylethanolamine-binding protein 1 OS=Bos taurus GN=PEBP1 PE=1 SV=2
- gi|27806297|ref|NP_776676.1| flavin reductase (NADPH) [Bos taurus]
- gi|157830773|pdb|1CYO|A Chain A, Bovine Cytochrome B(5)
- gi|6006423|emb|CAB56828.1| hemoglobin alpha chain [Bos taurus]
- gi|77735367|ref|NP_001029380.1| ribonuclease UK114 [Bos taurus]
- gi|27807109|ref|NP_777040.1| superoxide dismutase [Cu-Zn] [Bos taurus]
- gi|296480569|tpg|DAA22684.1| TPA: thymosin, beta 4-like [Bos taurus]
- gi|149642641|ref|NP_001092620.1| D-dopachrome decarboxylase [Bos taurus]
- gi|28189771|dbj|BAC56500.1| similar to peptidylprolyl isomerase A (cyclophilin A), partial [Bos taurus]
- gi|29135329|ref|NP_803482.1| glutathione S-transferase P [Bos taurus]
- gi|114051361|ref|NP_001039513.1| selenium-binding protein 1 [Bos taurus]
- gi|59858077|gb|AAX08873.1| aspartate aminotransferase 1 [Bos taurus]
- gi|61888856|ref|NP_001013607.1| triosephosphate isomerase [Bos taurus]
- gi|27806591|ref|NP_776501.1| glutathione peroxidase 1 [Bos taurus]
- gi|75057676|sp|Q58DC0.1|CPPED_BOVIN RecName: Full=Serine/threonine-protein phosphatase CPPED1; AltName: Full=Calcineurin-like phosphoesterase domain-containing protein 1
- gi|134085635|ref|NP_001076965.1| lactoylglutathione lyase [Bos taurus]
- gi|62751849|ref|NP_001015572.1| protein DJ-1 [Bos taurus]
- gi|27819608|ref|NP_776342.1| hemoglobin subunit beta [Bos taurus]
- gi|114051487|ref|NP_001039526.1| cytochrome c [Bos taurus]
- gi|229552|prf||754920A albumin [Bos taurus]
- gi|77736203|ref|NP_001029800.1| malate dehydrogenase, cytoplasmic [Bos taurus]
- gi|58760467|gb|AAW82141.1| NDP kinase NBR-A [Bos taurus]
- gi|77735583|ref|NP_001029487.1| adenosylhomocysteinase [Bos taurus]
- gi|94966811|ref|NP_001035592.1| alpha-1-acid glycoprotein precursor [Bos taurus]
- gi|78365305|ref|NP_001030533.1| peptidyl-prolyl cis-trans isomerase FKBP1A [Bos taurus]
- gi|27807167|ref|NP_777068.1| peroxiredoxin-6 [Bos taurus]
- gi|48428343|sp|Q7M135.1|LYSC_LYSEN RecName: Full=Lysyl endopeptidase; AltName: Full=Lys-C
- gi|136429|sp|P00761.1|TRYP_PIG RecName: Full=Trypsin; Flags: Precursor
- gi|914833|gb|AAB60696.1| keratin type II, partial [Homo sapiens]
- gi|386854|gb|AAA36153.1| type II keratin subunit protein, partial [Homo sapiens]
- gi|623409|gb|AAA60544.1| keratin 10 [Homo sapiens]
Appendix B. Contaminants in the Alemtuzumab Data Set
- gi|224977|prf1205229A proteinase K
- gi|136429|sp|P00761.1|TRYP_PIG RecName: Full=Trypsin; Flags: Precursor
residue | A | V | T | D | P | V | L | S | G | N | A | T | S | M | P | G | S | T |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
tag score | - | - | - | 0 | 0 | 0 | 2 | 3 | 3 | 4 | 2 | 2 | 0 | 0 | 0 | - | - | - |
3-mer score | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 |
Subset of de novo strings | CAH2, total | CAH2, correct | Alemtuzumab, total | Alemtuzumab, correct |
---|---|---|---|---|
de novo strings of length | 806,934 | 90,891 (11.26%) | 38,936 | 1765 (4.53%) |
with the associated scores all positive (necessarily of length ) | 69,205 | 46,738 (67.54%) | 685 | 592 (86.42%) |
with the associated scores all zeros | 523,382 | 3258 (0.62%) | 36,569 | 285 (0.78%) |
upon filtration | | | | |
retained | 58,084 | 46,382 (79.85%) | 656 | 582 (88.72%) |
eliminated | 11,121 | 356 (3.20%) | 29 | 10 (34.48%) |
de novo strings of length 7 | | | | |
with the middle tag score | 46,127 | 33,069 (71.69%) | 903 | 741 (82.06%) |
with the middle tag score | 76,330 | 5673 (7.43%) | 0 | 0 |
final results | | | | |
de novo strings of length that passed the validation procedure | 104,211 | 79,451 (76.24%) | 1559 | 1323 (84.86%) |
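The percentages in the table above can be re-derived from the raw counts, and the rows can be checked against one another. The short Python sketch below does both. The row keys (`"all positive"`, `"length 7, first row"`, and so on) are hypothetical labels introduced here for illustration. The reading that the "final results" row is the sum of the "retained" row and the first "with the middle tag score" row is an observation about the numbers, not a statement found in the extracted text.

```python
# (total, correct, reported percentage) per row, copied from the table.
# The Alemtuzumab row with 0 total / 0 correct is left out to avoid dividing by zero.
TABLE = {
    "CAH2": {
        "all strings": (806_934, 90_891, "11.26"),
        "all positive": (69_205, 46_738, "67.54"),
        "all zeros": (523_382, 3_258, "0.62"),
        "retained": (58_084, 46_382, "79.85"),
        "eliminated": (11_121, 356, "3.20"),
        "length 7, first row": (46_127, 33_069, "71.69"),
        "length 7, second row": (76_330, 5_673, "7.43"),
        "final": (104_211, 79_451, "76.24"),
    },
    "Alemtuzumab": {
        "all strings": (38_936, 1_765, "4.53"),
        "all positive": (685, 592, "86.42"),
        "all zeros": (36_569, 285, "0.78"),
        "retained": (656, 582, "88.72"),
        "eliminated": (29, 10, "34.48"),
        "length 7, first row": (903, 741, "82.06"),
        "final": (1_559, 1_323, "84.86"),
    },
}


def recomputed(total: int, correct: int) -> str:
    """Share of correct strings as a percentage with two decimals."""
    return f"{100 * correct / total:.2f}"


def check(dataset: str) -> tuple:
    """Return (percentages_ok, filtration_split_ok, final_sum_ok) for one data set."""
    rows = TABLE[dataset]
    pct_ok = all(recomputed(t, c) == p for t, c, p in rows.values())
    # retained + eliminated should give back the "all positive" counts.
    split_ok = all(
        rows["retained"][i] + rows["eliminated"][i] == rows["all positive"][i]
        for i in (0, 1)
    )
    # Observed relation: final = retained + first length-7 row.
    final_ok = all(
        rows["retained"][i] + rows["length 7, first row"][i] == rows["final"][i]
        for i in (0, 1)
    )
    return pct_ok, split_ok, final_ok


if __name__ == "__main__":
    for name in TABLE:
        print(name, check(name))
```

For both data sets all three checks hold, so the reported percentages agree with the counts to two decimals.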
© 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Vyatkina, K. Validation of De Novo Peptide Sequences with Bottom-Up Tag Convolution. Proteomes 2022, 10, 1. https://doi.org/10.3390/proteomes10010001