2.2. Results
mzStudio was developed in our lab to provide a centralized framework to interactively visualize, annotate, and integrate sequence assignment and other features of mass spectrometry data across instrument manufacturers, platforms, and search engines (
Figure 1). Consistent with the design philosophy of our broader multiplierz project, mzStudio provides direct access to native mass spectrometry data files without the need for conversion to auxiliary file formats (e.g., XML); all supported vendors and instrument platforms are listed in
Supplementary Table S1. Exemplary file access times are listed in
Supplementary Table S2. mzStudio leverages our common API [
16] and manufacturer DLLs (installed with multiplierz) to directly access native data files; as such, mzStudio is currently limited to Windows. mzStudio supports access to and visualization of MS1, MSn, DIA, and specialized triple quadrupole scans (precursor/neutral loss scanning data). mzStudio can currently read SRM data from LTQ/Orbitrap instruments; we are actively working to support SRM data from other platforms. Search results from Mascot, Proteome Discoverer, Comet, and X!Tandem can be directly imported and queried with a simple yet powerful SQLite interface based on our previously described mzResults format [
14]. For example, users can filter and sort data to highlight proteins or PTMs of interest by typing simple commands at the SQLite prompt (see example queries in
Supplementary Table S3 and the tutorial file hosted on GitHub). To facilitate construction of queries, we implemented autocompletion of SQLite keywords (e.g., SELECT, FROM, WHERE) as well as shortcuts for common worksheet column names (e.g., “Variable Modifications”). An integrated peptide calculator tool (PepCalc) facilitates evaluation of theoretical fragment ions (y/b for collisionally activated dissociation/higher-energy collisional dissociation (CAD/HCD) spectra or c/z for electron transfer dissociation (ETD) spectra) of a specified charge state for spectral validation. Sequences can be adjusted on-the-fly, with predicted, color-coded fragment ions remapped to the spectrum (for example, changing the placement of a phosphate group to validate phosphorylation site localization). For multidimensional liquid chromatography-mass spectrometry (LC-MS) studies, spectral validation can be especially laborious as it requires navigating multiple data files. mzStudio simplifies this task by allowing direct import of combined search results; associated raw data files may be loaded all at once, or cached sequentially as needed during the validation process, affording fast and seamless access across large data sets. This feature also simplifies evaluation of peak areas obtained from MS-based quantitation experiments. mzStudio can also be used to verify reporter-based quantification (TMT, iTRAQ), and supports visualization of corrected reporter intensities (i.e., corrected for reagent isotopic impurities, variation in protein input, or instrument-specific parameters such as ion injection time).
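The kind of calculation that a peptide calculator performs when a phosphate group is moved along a sequence can be illustrated with a minimal sketch. This is not the PepCalc implementation; the function names, the example peptide "ASPTK", and the 1-based modification map are assumptions made for illustration. Only b- and y-type ions are computed (c/z ions for ETD are omitted), using standard monoisotopic masses.

```python
# Illustrative sketch only; not the PepCalc code. Names are hypothetical.
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565    # H2O
PROTON = 1.007276    # mass of a proton
PHOSPHO = 79.966331  # HPO3 added to S, T, or Y


def fragment_ions(sequence, mods=None, charge=1):
    """Return (b_ions, y_ions) as lists of m/z values.

    mods maps a 1-based residue position to a mass shift in Da.
    b_ions[k-1] is b_k and y_ions[k-1] is y_k.
    """
    mods = mods or {}
    masses = [RESIDUE_MASS[aa] + mods.get(pos, 0.0)
              for pos, aa in enumerate(sequence, start=1)]
    b_ions, y_ions = [], []
    for k in range(1, len(masses)):
        b_neutral = sum(masses[:k])           # N-terminal fragment
        y_neutral = sum(masses[-k:]) + WATER  # C-terminal fragment
        b_ions.append((b_neutral + charge * PROTON) / charge)
        y_ions.append((y_neutral + charge * PROTON) / charge)
    return b_ions, y_ions


def site_determining_ions(sequence, site_a, site_b, tol=1e-6):
    """List the b/y ions whose m/z differs between two phosphosite placements."""
    b_a, y_a = fragment_ions(sequence, {site_a: PHOSPHO})
    b_b, y_b = fragment_ions(sequence, {site_b: PHOSPHO})
    labels = [f"b{k}" for k, (p, q) in enumerate(zip(b_a, b_b), 1)
              if abs(p - q) > tol]
    labels += [f"y{k}" for k, (p, q) in enumerate(zip(y_a, y_b), 1)
               if abs(p - q) > tol]
    return labels


if __name__ == "__main__":
    # Hypothetical peptide with candidate phosphosites at S2 and T4.
    print(site_determining_ions("ASPTK", 2, 4))
```

Only fragments that contain exactly one of the two candidate sites change mass when the phosphate is moved, so these are the ions worth inspecting in the annotated spectrum when judging site localization.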
Additional tools provide for dynamic re-evaluation of data and enable exploration of alternative hypotheses for peptide sequence, modification, or fragmentation behavior. For example, mzStudio can implement unbiased detection and visualization of MS1-based features, where each feature is an isotopic cluster over a certain time range with any associated MS/MS spectra. Once features are detected, they are directly mapped onto MS1 data. Clicking a feature tab opens a window allowing users to quickly browse to any MS1 or MS2 scan that corresponds to the feature. With this view of the data, unassigned features can be quickly identified and directly submitted for sequence assignment considering different modifications and protein databases; fragment ions assigned through each iterative search are automatically annotated within MS/MS spectra. Furthermore, mzStudio supports custom spectral processing algorithms (
Supplementary Figure S1 illustrates a custom processing routine written in Python); this capability enables in-depth exploration of surprising or novel gas-phase fragmentation behavior. We used these tools to significantly improve identification rates for peptides modified with cysteine-directed covalent drugs and other chemical probes [
17]. With mzStudio, researchers can add, refine, or create entirely new spectral pre-processing routines (see the example_processing_scripts folder in the GitHub repository), submit MS/MS data to multiple search algorithms, and assess the impact both qualitatively (improved utilization or accounting of fragment ions) and quantitatively (individual peptide score).
Figure 2 illustrates a general workflow utilizing these capabilities.
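As a rough illustration of what a custom spectral pre-processing routine might look like, the sketch below removes peaks that match a list of known interfering m/z values (for instance, ions specific to a covalent probe) within a ppm tolerance. The function name, the representation of a spectrum as a list of (m/z, intensity) pairs, and the default tolerance are assumptions; the script interface actually expected by mzStudio is documented in the example_processing_scripts folder and may differ.

```python
# Illustrative sketch only; the real mzStudio script interface may differ.
def remove_ions(spectrum, unwanted_mz, tol_ppm=10.0):
    """Return a copy of the spectrum without peaks near any unwanted m/z.

    spectrum    : list of (mz, intensity) pairs (assumed representation)
    unwanted_mz : m/z values of ions to strip before database search
    tol_ppm     : matching tolerance in parts per million
    """
    kept = []
    for mz, intensity in spectrum:
        matches_unwanted = any(
            abs(mz - target) / target * 1e6 <= tol_ppm
            for target in unwanted_mz
        )
        if not matches_unwanted:
            kept.append((mz, intensity))
    return kept


if __name__ == "__main__":
    # Hypothetical peak list; the peak at 200.0005 lies 2.5 ppm from 200.0.
    peaks = [(100.0, 5.0), (200.0005, 10.0), (300.0, 1.0)]
    print(remove_ions(peaks, unwanted_mz=[200.0]))
```

A filtered peak list of this kind could then be resubmitted for searching, and the resulting peptide score compared with that obtained from the unprocessed spectrum.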
It can be challenging to maintain informative, detailed records of new ideas and progress in sequence assignment when exploring novel peptide fragmentation pathways or the impact of spectral pre-processing algorithms (e.g., de-isotoping, charge-reduction, or removal of kinase inhibitor-specific ions). Similarly, it is difficult to test and evaluate the myriad combinations that arise when multiple post-translational modifications are thought to occur along a relatively short sequence of amino acids. For example, we recently utilized quantitative mass spectrometry to interrogate modifications on Olig2, a transcription factor that mediates fate choice of neural progenitor cells in the developing central nervous system and can contribute to the pathophysiology of human gliomas [
18]. A set of three protein kinases works in tandem to phosphorylate Olig2 at multiple sites within the first 20 N-terminal amino acids. Indeed, these and other data [
19,
20] highlight the critical role that phosphorylation of this region of Olig2 plays in its tumorigenic function. Mapping these phosphorylation sites and deciphering the kinetics to establish potential ‘priming’ phosphorylation events is an important first step toward identifying kinases that may represent therapeutic targets. Our work in this study required extensive analysis of MS/MS spectra to localize different, and even multiple, sites of phosphorylation on the same peptide. To better support our work in this and similar projects, we developed the companion spectral notebook application (SpecStylus,
Figure 3), which enables researchers to create a digital provenance of data analysis activities. Spectra, processed spectra, extracted ion chromatograms, or other data projections can be annotated using an associated text box or assorted drawing widgets to catalog evidence for fragmentation pathways, phosphorylation site localization, or other spectral features. These annotations are stored in the notebook for comparison with future experiments. In addition, processing scripts, search results, and other parameters can be linked to notebook entries, thereby creating a forensic ‘chain-of-custody’ for all evidence and procedures used to support a final sequence assignment. For added convenience and portability, all intermediary steps associated with a final result can be dynamically analyzed, or further extended, independent of the original native mass spectrometry data; this feature facilitates sharing results with colleagues and assembling supplemental files for scientific journals. Finally, SpecStylus images can be exported in .png, .pdf, .svg, and .ppt formats for preparation of slides or publication-quality figures, while peak lists can be output as .sdb files for use with NIST library search tools.
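The idea of linking scripts, search parameters, and annotations to a notebook entry can be sketched as a small data structure. This is not the SpecStylus file format, which is not described here; the class name, field names, file name, and the use of a SHA-256 digest to fingerprint the processing script are all assumptions chosen for illustration. Storing a copy of the peak list in the entry is what would allow it to be reviewed without the original native data file.

```python
# Illustrative sketch only; not the SpecStylus storage format.
import hashlib
import json
from dataclasses import dataclass, field, asdict


@dataclass
class NotebookEntry:
    """One hypothetical provenance record supporting a sequence assignment."""
    raw_file: str            # name of the native data file
    scan_number: int         # MS/MS scan the assignment is based on
    sequence: str            # assigned peptide sequence
    processing_script: str   # full source text of the script that was applied
    search_parameters: dict  # search engine settings used
    annotations: list = field(default_factory=list)  # free-text notes
    peaks: list = field(default_factory=list)        # [mz, intensity] pairs

    def script_digest(self):
        """Fingerprint the script so later edits to it are detectable."""
        return hashlib.sha256(self.processing_script.encode("utf-8")).hexdigest()

    def to_json(self):
        """Serialize the entry, including the script fingerprint."""
        record = asdict(self)
        record["script_sha256"] = self.script_digest()
        return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    # All values below are hypothetical.
    entry = NotebookEntry(
        raw_file="example_run.raw",
        scan_number=1234,
        sequence="ASPTK",
        processing_script="def process(spectrum):\n    return spectrum\n",
        search_parameters={"engine": "Mascot", "fragment_tol_da": 0.02},
        annotations=["y2 and y3 support phosphorylation at T4"],
        peaks=[[186.0162, 1500.0], [72.0444, 300.0]],
    )
    print(entry.to_json())
```

Because the serialized record carries the peak list, the script text, and its digest together, a colleague receiving only the JSON could inspect the evidence and confirm which version of the script produced it.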