2.2. Protein Processing
Specimens of human biological fluids generally require initial processing to eliminate non-protein sample components. As discussed in Section 3, depending on the type of biological fluid, these contaminants include residual cell debris, salts, lipids and other small-molecule components. Furthermore, in some fluids such as urine, CSF or BAL, protein is present at very low concentrations. Therefore, the first stage in the processing of raw biological fluid samples for phosphoproteomics commonly involves initial centrifugation to remove particulates and other material, followed by steps to remove contaminants and/or to concentrate the protein analyte. For sample cleanup and protein concentration, ultrafiltration with membrane filters with a specific MW cutoff (typically 3–5 kDa) may be employed; alternatively, protein precipitation with acetone or trichloroacetic acid may be used.
A major consideration associated with (phospho)proteome analyses in serum/plasma and other biological fluids is the wide range of protein amounts. Relative concentrations of serum/plasma proteins span more than 10 orders of magnitude. Several protein groups including albumin, serotransferrin, and immunoglobulins comprise more than 95 percent of the total protein mass in plasma [19]. High-abundance proteins or, more precisely, the peptides originating from these proteins, are overrepresented in proteome analyses of biological fluids and obscure access to proteins present at lower amounts. To address this challenge, strategies for removal of overabundant proteins and reduction of dynamic range have been developed.
To selectively remove specific high-abundance proteins, antibody-based affinity capture is widely employed. In this “negative chromatography” method, unwanted proteins are bound while the flow-through contains lower-abundance analytes of interest. A number of immunoaffinity columns in various formats have been developed and commercialized. Perhaps the most popular for biological fluid proteomics is the Multiple Affinity Removal System (MARS), which is available in several versions tailored for removal of a specific number of proteins [20]; for instance, MARS Hu-6 is designed to deplete the top six proteins from human biological fluid samples. A drawback associated with depletion of abundant proteins is the loss of proteins that remain bound to the captured carrier proteins such as albumin or that interact nonspecifically with the column [21]. The affinity-bound fraction can be analyzed to probe these proteins, at the cost of doubling the number of samples entering the downstream portion of a particular bioanalytical workflow.
Another approach to attenuate the levels of high-abundance proteins utilizes the principle of so-called dynamic range compression [22]. Reagents for this purpose are commercially available under the name ProteoMiner. The method uses combinatorial hexapeptide libraries synthesized on solid support (beads) to provide a pool of affinity ligands, each of them binding a specific protein partner via adsorption. A finite number of molecules from each protein are able to bind to the library to saturate available binding sites for that particular protein; excess protein molecules will remain unbound and be discarded in the flow-through. In this manner, abundant proteins that are present in large excess will be reduced in amount and lower-abundance proteins will be enriched. Thus, in principle this method allows equalization of all proteins within a protein mixture to the same concentration.
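The saturation principle described above can be illustrated with a small numerical sketch. It is an idealized model, not a description of the commercial reagent: it assumes every protein has the same binding capacity on the beads (the hypothetical `capacity` parameter), and the protein names and amounts are invented for illustration.

```python
import math


def compress_dynamic_range(amounts, capacity):
    """Return the amount of each protein retained on the beads.

    amounts  -- dict mapping protein name to amount loaded (arbitrary units)
    capacity -- assumed per-protein binding capacity of the library
                (same units; an illustrative simplification)
    """
    # A protein binds until its ligand sites are saturated; any excess
    # stays unbound and is lost in the flow-through.
    return {name: min(amount, capacity) for name, amount in amounts.items()}


def orders_of_magnitude(amounts):
    """Dynamic range of a set of positive amounts, in log10 units."""
    values = [v for v in amounts.values() if v > 0]
    return math.log10(max(values) / min(values))


# Hypothetical sample spanning eight orders of magnitude.
loaded = {
    "albumin": 1e9,
    "transferrin": 1e7,
    "mid_abundance": 1e4,
    "low_abundance": 1e1,
}
bound = compress_dynamic_range(loaded, capacity=1e3)

print(orders_of_magnitude(loaded))  # range before compression
print(orders_of_magnitude(bound))   # range after compression
```

In this toy model the low-abundance protein is retained in full while the abundant proteins are capped at the capacity, so its relative share of the bound fraction increases even though its absolute amount does not.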
2.5. Phosphopeptide Enrichment
Upon proteome digestion, the majority of peptides will be non-phosphorylated. This is because upon digestion of a whole proteome from a biological fluid, non-phosphorylated proteins (present in a wide range of abundances) will contribute peptides to the overall peptide mixture. Furthermore, phosphorylation is commonly of low abundance and hence a large fraction of a particular protein will be non-phosphorylated, yielding an excess of non-phosphorylated peptides compared to their phosphorylated counterparts. Finally, it has been indicated that in LC–MS/MS, a less effective ionization of phosphopeptides may result in unfavorable detection of phosphopeptides vs. non-phosphorylated peptides [29]. For these reasons, to effectively sample the phosphoproteome in the bottom-up approaches, enrichment of phosphorylated species at the peptide level is frequently incorporated in the chosen bioanalytical workflows. Several techniques for phosphopeptide enrichment have been developed; the effects of the various formats and experimental conditions on the specificity of phosphopeptide isolation and the size of phosphopeptide panels have been extensively studied [30], and efforts for further optimization of enrichment strategies for phosphoproteomics are ongoing. Two approaches have emerged as the most prevalent for phosphopeptide enrichment: immobilized metal ion affinity chromatography (IMAC) and metal oxide affinity chromatography (MOAC).
IMAC is an established methodology that utilizes transition metal cations such as Ga³⁺ or Fe³⁺ [30,31,32] immobilized on a solid support bearing chelating moieties such as iminodiacetic acid (IDA) or nitrilotriacetic acid (NTA) to capture peptides bearing negatively charged phosphate groups. Phosphopeptides are loaded onto IMAC at low pH, and after a series of washes elution is achieved at high pH. A variety of modifications in this basic sequence of binding/elution steps have been used with the goal of maximizing sensitivity and selectivity [30].
Phosphopeptide capture by MOAC uses the affinity of metal oxides for negatively charged phosphopeptides. The most popular MOAC incorporates titanium dioxide (TiO₂) as the capture matrix; as with IMAC, a variety of modifications of the basic experimental protocol have been introduced to optimize performance of the technique. Furthermore, phosphopeptide enrichment workflows with sequential or parallel combination of IMAC and/or TiO₂ have been employed. Development of new and improved approaches for phosphopeptide enrichment continues [30] and some of these methods could be adopted in biofluid phosphoproteomics, such as sequential elution from IMAC (SIMAC) [33], or affinity enrichment with metal ion-functionalized nanopolymers (PolyMAC) [34]. Collectively, the phosphoproteomics community has improved the different types of phosphopeptide enrichment strategies, but despite these efforts no single method provides optimum enrichment.
The IMAC and MOAC strategies provide enrichment of pSer-, pThr- and pTyr-containing peptides. For pTyr-specific enrichment, immunoaffinity-based methods have been developed [35], and reagents are commercially available.
Phosphospecies enrichment at the peptide level is the most widely applied strategy. Enrichment may also be performed at the protein level to isolate intact phosphoproteins. This less common approach has been used in the context of serum phosphoproteome discovery [15,16].
Alternatives to IMAC/MOAC chromatography are based on chemical removal of the phosphate moiety and subsequent derivatization with different chemistries that allow enrichment via affinity chromatography [30]. In biological fluid phosphoproteomics, the approach that has been used involves thiol-based derivatization and capture [36].
2.6. LC–MS/MS
LC–MS/MS is a key component and the common denominator of (phospho)proteomics workflows. Identification of peptides and proteins in bottom-up approaches is based on data generated by LC–MS/MS analysis of the peptide mixtures produced in proteolytic digestion of the proteome. Reversed-phase chromatography interfaced with high-end tandem mass spectrometers provides separation of the complex analyte mixtures prior to MS and MS/MS. To achieve the required high sensitivity, the LC configurations commonly feature capillary columns with 75-μm inner diameter and mobile phase flow rates of several hundred nanoliters/min. In an increasing number of LC–MS/MS instrument configurations, the LC is performed in the ultra-high pressure/performance (UPLC) regime, which utilizes sub-2 μm stationary phase particles for major improvements of column efficiencies to achieve high resolution and reproducibility of chromatographic separations.
The peptide analytes in the eluent from nanoLC are introduced into the mass spectrometer and ionized by nanoelectrospray to produce multi-protonated gas-phase molecular ions for MS and MS/MS analysis. The basic goal of the mass spectrometry measurement in the context of (qualitative) phosphopeptide analysis is to determine specific attributes that are then used in subsequent database searches to provide (1) the identity of the proteins present in the sample, and (2) location of the site(s) of phosphorylation in these proteins. Both pieces of information are derived from the mass of the peptide and, most importantly, from the gas-phase dissociation patterns that are diagnostic of the peptide’s amino acid sequence and phosphosite location.
Sensitivity, acquisition speed, and mass measurement accuracy are critical parameters for success of phosphopeptide characterization and site assignment. Several earlier studies of the biological fluid phosphoproteomes were performed with low-resolution LTQ linear ion trap instruments. More recently, as in other sub-fields of MS/MS-based proteomics, hybrid tandem mass spectrometers with configurations of analyzers such as the Orbitrap or time-of-flight (TOF) capable of high resolution/mass accuracy, high data acquisition speed, and increased flexibility in ion-dissociation modes, have been adopted for phosphoproteome discovery in biological fluids. Technological advancements in mass spectrometry instrumentation continue towards maximizing information obtained in a single LC–MS/MS analysis to eliminate the need for upstream fractionation of the analyte mixtures [37].
Gas-phase dissociation of phosphopeptide molecular ions is commonly performed with collision-induced dissociation (CID) to produce sequence-determining product ions of the b- and/or y-series. For protonated phosphopeptide ions, in particular in the low-energy CID regime such as in ion trap instruments, an energetically favored fragmentation channel generates a phosphate-diagnostic product ion [38]. This ion arises from beta-elimination of the elements of phosphoric acid, forming a dehydroalanine. Loss of H₃PO₄ (−98 u) from a doubly or triply charged precursor ion (n = 2 or 3) generates a non-sequence-specific product ion [M + nH − H₃PO₄]ⁿ⁺. This product ion can serve as a marker ion, indicating the presence of a phosphorylated peptide. However, this product ion often dominates the MS/MS spectrum, and not enough sequence-determining ions are observed for an unequivocal peptide sequence determination. To address this shortcoming, MS³ (i.e., another round of CID on the primary product ion from MS²) can be used for confirmation of site assignment on instruments capable of higher-order dissociation. MS³ can be triggered when intense primary product ions due to loss of H₃PO₄ are detected in the MS² scan. In this manner, the LC–MS/MS datasets contain collections of MS/MS (MS²) spectra plus neutral loss-triggered MS³ spectra, and both types of data are used for database searches. This strategy, originally developed for analyses on standalone ion trap instruments, was found to be less valuable in LC–MS/MS performed with high-accuracy hybrid instrumentation [39].
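The arithmetic behind the neutral-loss marker ion and the MS3 trigger can be sketched as follows. This is a minimal illustration rather than any instrument's actual logic: the peptide mass, the m/z tolerance `tol`, and the "among the `top_k` most intense peaks" trigger criterion are assumptions chosen for the example. The constants are the monoisotopic mass of H3PO4 and the proton mass.

```python
PROTON = 1.007276   # mass of a proton, Da
H3PO4 = 97.9769     # monoisotopic mass of phosphoric acid, Da


def precursor_mz(neutral_mass, n):
    """m/z of the [M + nH]n+ ion for a peptide of the given neutral mass."""
    return (neutral_mass + n * PROTON) / n


def neutral_loss_mz(neutral_mass, n):
    """m/z of the [M + nH - H3PO4]n+ product ion."""
    return (neutral_mass - H3PO4 + n * PROTON) / n


def should_trigger_ms3(ms2_peaks, neutral_mass, n, tol=0.5, top_k=3):
    """Decide whether a neutral-loss peak would trigger an MS3 scan.

    ms2_peaks -- list of (mz, intensity) tuples from the MS2 spectrum
    tol       -- m/z tolerance (assumed; ion-trap-like value)
    top_k     -- the neutral-loss ion must rank among the top_k most
                 intense peaks (assumed criterion)
    """
    target = neutral_loss_mz(neutral_mass, n)
    ranked = sorted(ms2_peaks, key=lambda p: p[1], reverse=True)[:top_k]
    return any(abs(mz - target) <= tol for mz, _ in ranked)


# Hypothetical phosphopeptide with neutral monoisotopic mass 1500.0 Da.
M = 1500.0
for n in (2, 3):
    shift = precursor_mz(M, n) - neutral_loss_mz(M, n)
    # The observed m/z shift is 97.9769 / n: about 48.99 for 2+, 32.66 for 3+.
    print(n, round(precursor_mz(M, n), 4), round(shift, 4))

spectrum = [(702.02, 1000.0), (400.2, 150.0), (850.4, 90.0)]
print(should_trigger_ms3(spectrum, M, 2))
```

Because the loss is observed on the m/z scale, the 98 u shift appears divided by the charge state, which is why the trigger has to be evaluated per precursor charge.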
An ion activation mode complementary to CID that has been adopted for MS/MS-based phosphoproteome analyses is Electron Transfer Dissociation (ETD) [40]. Upon ETD, dissociation of the activated precursor ions produces product ions of the z- and c-series, thus providing information complementary to low-energy CID where the b- and y-ion series usually dominate. Importantly, phosphorylation, which is labile under CID, is preserved in ETD, and the resulting spectra contain extensive sequence information. To maximize phosphopeptide identification and site localization, both CID and ETD may be incorporated in the phosphoproteomics bioanalytical workflow if an instrument possessing ETD capabilities is available to the investigators.
Another important aspect of LC–MS/MS in (phospho)proteomics concerns the methods of data acquisition. Traditionally, LC–MS/MS of complex proteolytic digests in the bottom-up approach has been performed using data-dependent acquisition (DDA). In DDA, for peptides eluting from LC at any given time, an MS survey scan is acquired to provide information on the masses and intensities of the molecular ions; the MS is then followed by sequential MS/MS scans on a fixed number of precursor ions. This cycle of MS and MS/MS is repeated throughout the whole LC–MS/MS run. Usually, previously interrogated precursor ions are excluded from MS/MS acquisition over a pre-set time window (dynamic exclusion). Real-time selection of molecular ions for MS/MS in each DDA cycle is based on user-set criteria, and is generally biased towards more abundant peptides. Nevertheless, consistent improvements of instrument sensitivity and data acquisition speed have brought enhanced DDA performance [41], and DDA with state-of-the-art mass spectrometry instrumentation continues to be a powerful method for large-scale profiling of complex (phospho)proteomes.
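The DDA cycle described above (survey scan, selection of a fixed number of the most intense precursors, dynamic exclusion) can be sketched in a few lines. This is a simplified illustration, not vendor acquisition software: the function name, the `top_n`, `exclusion_window` and `mz_tol` parameters, and the peak list are all assumptions made for the example.

```python
def select_precursors(survey_scan, scan_time, exclusion, top_n=3,
                      exclusion_window=30.0, mz_tol=0.01):
    """Pick up to top_n precursors from one MS survey scan.

    survey_scan      -- list of (mz, intensity) tuples
    scan_time        -- time of the survey scan, in seconds
    exclusion        -- list of (mz, expiry_time) entries, updated in place
    exclusion_window -- dynamic exclusion duration in seconds (assumed)
    mz_tol           -- m/z tolerance for matching excluded ions (assumed)
    """
    # Drop exclusion entries whose time window has elapsed.
    exclusion[:] = [(mz, t) for mz, t in exclusion if t > scan_time]

    selected = []
    # Most intense ions first: this is the source of the abundance bias.
    for mz, _ in sorted(survey_scan, key=lambda p: p[1], reverse=True):
        if len(selected) == top_n:
            break
        if any(abs(mz - ex_mz) <= mz_tol for ex_mz, _ in exclusion):
            continue  # recently fragmented, skip it
        selected.append(mz)
        exclusion.append((mz, scan_time + exclusion_window))
    return selected


# The same hypothetical peak list seen in three consecutive cycles.
scan = [(500.25, 1e6), (620.31, 5e5), (710.40, 2e5), (815.45, 1e5)]
exclusion_list = []
first = select_precursors(scan, 0.0, exclusion_list, top_n=2)
second = select_precursors(scan, 1.0, exclusion_list, top_n=2)
third = select_precursors(scan, 100.0, exclusion_list, top_n=2)
print(first, second, third)
```

In the second cycle the two most intense ions are still excluded, so the less abundant ions get sampled; once the exclusion window has elapsed, the most intense ions become eligible again.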
As an alternative to DDA, LC–MS/MS methods have been developed that avoid real-time sampling of individual precursor ions. These data-independent acquisition (DIA) approaches encompass an assortment of different strategies that involve acquisition of MS/MS data independent of precursor ion information [42,43,44,45]. Collectively, these DIA strategies do not involve mass selection of individual precursor ions as the first step in CID. Instead, multiple precursors are selected and dissociated concurrently, either all at once over a single wide m/z range (in the MSᴱ approach [42,44]) or sequentially over smaller windows spanning several tens of m/z units (in the SWATH method [45]). The MS/MS data generated in these analyses are a composite of CID dissociations of all co-selected precursors, and they must be deconvoluted post-acquisition to establish precursor–product ion connectivities. The DIA method applied in biological fluid phosphoproteome discovery is MSᴱ, which utilizes a quadrupole-TOF instrument to acquire LC–MS/MS data using alternating collision energy levels to obtain MS (low-energy) and MS/MS (high-energy) spectra; the precursor–product ion relationship is reconstructed based on exact overlap of chromatographic profiles for the precursor and the corresponding product ions [42,44].
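The idea of linking product ions to precursors through overlapping chromatographic profiles can be illustrated with a toy example. The sketch below uses Pearson correlation of elution traces as a stand-in for profile overlap; this is an assumption for illustration and not the algorithm of any vendor software. The trace names, intensities, and the `min_r` threshold are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length intensity traces."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat trace carries no elution-profile information
    return cov / sqrt(var_x * var_y)


def assign_products(precursors, products, min_r=0.95):
    """Assign each product-ion trace to the best-correlated precursor.

    precursors, products -- dicts mapping an ion label to its intensity
                            trace sampled at the same retention times
    min_r                -- minimum correlation to accept a match (assumed)
    """
    assignment = {}
    for prod_name, prod_trace in products.items():
        best_name, best_r = None, min_r
        for prec_name, prec_trace in precursors.items():
            r = pearson(prec_trace, prod_trace)
            if r >= best_r:
                best_name, best_r = prec_name, r
        assignment[prod_name] = best_name
    return assignment


# Two hypothetical precursors eluting at different times (low-energy scans)
# and three product-ion traces (high-energy scans).
precursors = {"A": [0, 2, 10, 2, 0, 0, 0], "B": [0, 0, 0, 2, 10, 2, 0]}
products = {
    "frag1": [0, 1, 5, 1, 0, 0, 0],
    "frag2": [0, 0, 0, 1, 5, 1, 0],
    "flat": [3, 3, 3, 3, 3, 3, 3],
}
assignment = assign_products(precursors, products)
print(assignment)
```

Product ions whose traces rise and fall together with a precursor are grouped with it, while a constant background trace is left unassigned.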
In most biological fluid phosphoproteomics studies published to date, the focus was on qualitative discovery. Nevertheless, quantitative examinations of the phosphoproteome in serum [16] and urine [17] have been carried out, and the corresponding workflows utilized mass spectrometry-based quantification methods, either label-free or based on stable isotope labeling. In the label-free approach, quantitative information is derived from the integrated peak area for the ion chromatogram of the phosphopeptide of interest. Label-free quantification is relatively simple and inexpensive, and it does not involve additional workflow steps. However, multiplexing, i.e., quantification of analytes of interest across multiple conditions in a single LC–MS/MS run, is not possible using the label-free method. Stable isotope labeling pertinent to biofluid phosphoproteomics involves chemical derivatization at the peptide level to introduce stable isotope-containing tags that shift the mass of the labeled phosphopeptide (or a specific product ion) by a known increment. Tags with different combinations of heavy and light isotopes may be used to label peptides in different samples. In this way, peptides in samples from different conditions (such as diseased vs. control) are distinguishable in MS or MS/MS, and relative quantification of the (phospho)peptides of interest in multiple samples may be performed in a single LC–MS/MS analysis. Depending on the composition of the label, quantification may be achieved in MS or in MS/MS (in the case of isobaric labeling). One example of a tagging strategy with quantification at the MS level is mTRAQ (mass differential tags for relative and absolute quantification), which involves non-isobaric labeling of primary amines in peptides; mTRAQ has been applied to phosphoproteome quantification in urine [17]. Commercially available tags designed for quantification at the MS/MS level using isobaric labeling include iTRAQ (isobaric tags for relative and absolute quantification) and TMT (tandem mass tags). These approaches utilize isobaric tags whose structure consists of a reporter moiety incorporating a different number/combination of stable heavy isotopes, a balance moiety, and a reactive group that serves to attach the tags to (phospho)peptides after proteolytic digestion. (Phospho)peptides in different samples, when derivatized with these tags, have the same precursor ion mass and thus are isolated and dissociated together. However, upon CID, the tagged phosphopeptides produce product ions (so-called reporter ions) that differ in their m/z. Phosphopeptides originating from different conditions are then quantified based on the relative intensities of these reporter ions; amino acid sequence information and phosphosite location are derived from dissociations of the phosphopeptide backbone.
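Relative quantification from reporter ions reduces to comparing intensities across channels within one MS/MS spectrum. The following is a minimal sketch under stated assumptions: the channel labels and intensities are hypothetical, and practical workflows typically add steps such as isotope-impurity correction and normalization that are not shown here.

```python
def reporter_ratios(reporter_intensities, reference):
    """Express each channel's reporter-ion intensity relative to a reference.

    reporter_intensities -- dict mapping channel label to reporter-ion
                            intensity from a single MS/MS spectrum
    reference            -- label of the channel used as the denominator
    """
    ref = reporter_intensities[reference]
    if ref <= 0:
        raise ValueError("reference channel has no signal")
    return {ch: inten / ref for ch, inten in reporter_intensities.items()}


# One hypothetical phosphopeptide spectrum with four labeled samples.
spectrum_reporters = {
    "control_1": 2000.0,
    "control_2": 2200.0,
    "disease_1": 6000.0,
    "disease_2": 5000.0,
}
ratios = reporter_ratios(spectrum_reporters, reference="control_1")
print(ratios)
```

Because all channels are measured in the same spectrum, the ratios compare the samples directly without requiring alignment between separate LC–MS/MS runs.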
Finally, the LC–MS/MS approaches aimed at global-scale (phospho)proteomics discussed above are complemented by targeted MS/MS. Targeted MS/MS focuses on acquisition of quantitative data for a smaller set of precursor ions selected a priori. A widely used targeted strategy is multiple reaction monitoring (MRM) in which dissociation of a mass-selected precursor to specific product ion(s) (termed transition) is monitored for quantitative measurements [46]. For MRM, the (phospho)peptides to be targeted must be known. Selection of the targets of interest may be based on prior knowledge such as that originating from previous discovery studies, and development of MRM assays has to be undertaken. Today’s LC–MS/MS systems permit high multiplexing of MRM, i.e., MRM data are obtained for many precursors in a single chromatographic run, and MRM acquisition for subsets of precursor–product transitions may be scheduled based on previously established retention times.
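Scheduling of MRM transitions by retention time can be sketched as a simple lookup: at any point in the run, only transitions whose expected elution falls within a window around the current time are monitored. The data structure, the peptide labels, the m/z values, and the 2-minute window below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    peptide: str          # label of the targeted (phospho)peptide
    precursor_mz: float   # m/z of the mass-selected precursor
    product_mz: float     # m/z of the monitored product ion
    expected_rt: float    # previously established retention time, minutes


def active_transitions(transitions, current_rt, window=2.0):
    """Return the transitions to monitor at the current retention time.

    window -- total width of the scheduling window in minutes (assumed)
    """
    half = window / 2.0
    return [t for t in transitions if abs(t.expected_rt - current_rt) <= half]


# Hypothetical assay: two transitions for one peptide, one for another.
assay = [
    Transition("peptide_A", 751.01, 702.02, 10.0),
    Transition("peptide_A", 751.01, 904.45, 10.0),
    Transition("peptide_B", 623.30, 815.40, 25.0),
]

print(len(active_transitions(assay, 10.5)))  # both peptide_A transitions
print(len(active_transitions(assay, 15.0)))  # nothing scheduled here
print(len(active_transitions(assay, 25.9)))  # the peptide_B transition
```

Restricting each transition to its elution window keeps the number of concurrently monitored transitions low, which is what allows many precursors to be covered in a single chromatographic run.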