3.1. Comparative Analysis of the Results for the Naja ashei Venom Proteome Obtained Using Different Research Workflows
In our previous 2DE-MS analysis of the Naja ashei venom proteome, we identified 19 proteins belonging to 7 different protein families [10]. Here, we applied two different data processing workflows to analyze data acquired in a shotgun LC-MS/MS proteomic experiment. Raw data were used as input for either PeptideShaker (PS) or MaxQuant (MQ) software to identify and quantify proteins in the venom. PeptideShaker, which applied the X! Tandem and MS-GF+ search engines, identified 37 proteins, while MaxQuant, operating with the Andromeda engine, yielded 39 protein hits. A cross-comparison of the numbers of proteins identified with the three research workflows is presented in Figure 1.
With the highest overall number of protein identifications, MaxQuant also provided the greatest number of unique hits (16 proteins). By comparison, PS contributed 13 unique identifications, while the 2DE-MS/MS strategy yielded only 8 exclusive hits. It should be noted, however, that in our previous proteomic approach the Mascot search was run only against the Swiss-Prot repository, whereas the database for shotgun proteomics was prepared from both the curated and non-curated sections of UniProtKB. In fact, a large proportion of the unique proteins reported in both shotgun workflows (PS and MQ) were identified in the TrEMBL database.
Although 21 proteins were present in the results of both the MQ and PS approaches, the high number of workflow-specific hits is also notable. However, we did not observe any tendency for particular groups of proteins to be detected preferentially by either workflow (Tables S1 and S2).
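The cross-comparison of identifications across workflows amounts to splitting the accession lists into exclusive Venn regions. The following is a minimal sketch of that bookkeeping, not the procedure actually used in the study; the function name `venn_regions` and the accession labels are hypothetical placeholders, and the toy sets do not reproduce the reported counts.

```python
def venn_regions(sets):
    """Split named sets of protein accessions into exclusive Venn regions.

    Returns a dict mapping a frozenset of workflow names to the accessions
    found in exactly those workflows and in no others.
    """
    regions = {}
    all_ids = set().union(*sets.values())
    for acc in all_ids:
        members = frozenset(name for name, ids in sets.items() if acc in ids)
        regions.setdefault(members, set()).add(acc)
    return regions


# Hypothetical accession lists, for illustration only (not the study's data).
hits = {
    "2DE-MS": {"P_A", "P_B", "P_C"},
    "PS": {"P_A", "P_B", "P_D", "P_E"},
    "MQ": {"P_A", "P_D", "P_F"},
}

regions = venn_regions(hits)

# Hits exclusive to a single workflow.
unique_counts = {
    name: len(regions.get(frozenset([name]), set())) for name in hits
}
# Hits shared by every workflow.
shared_by_all = regions.get(frozenset(hits), set())

print(unique_counts)
print(sorted(shared_by_all))
```

Applied to the real accession lists from the three workflows, the single-workflow regions would correspond to the unique hits and the pairwise and triple regions to the shared identifications.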
In the context of this comparison, it should be emphasized that the “protein inference problem”, which is inherent to the bottom-up strategy, makes conclusions based on individual proteins an approximation rather than a certainty. Hence, it is more reliable to compare the whole protein families detected with the different approaches.
Table 1 shows that MaxQuant detected proteins belonging to 13 protein families. This was two more families than in the analysis with PeptideShaker and six more than in the 2DE-MS approach. This difference, however, mainly concerned low-abundance protein families, whose share in the venom did not exceed 1%. Moreover, the small number of protein families detected with the 2DE-MS/MS strategy was certainly due in part to the database restriction mentioned above, but also in part to the insufficient resolution of the previous 2D protein maps.
However, the most notable differences between the results of the distinct research workflows concern the quantification of the venom proteins. In the publication from 2018, we quantified the proportions of individual protein families in the venom by densitometry of Coomassie-stained 2D gels. We reported that the venom composition is dominated by two protein families, which together constitute more than 95% of the venom (68.98% 3FTx, 27.06% PLA2). Here, we compare these data with the results of the shotgun LC-MS/MS experiment processed with two different software solutions. Although we confirmed the prevalence of three-finger toxins and phospholipases A2, comparison of the exact values clearly shows that the applied methodology has a very significant influence on the final quantitative results (Figure 2).
The quantitative differences are most pronounced in the dominant protein families and can reach 19.46 percentage points for three-finger toxins or 13.61 percentage points for phospholipases A2. Moreover, if we take any one of these datasets as the true composition, choosing the wrong method could over- or underestimate the true percentage of PLA2 almost twofold. It is worth mentioning that such differences exist even between datasets that differed only in the data processing protocol.
Considering that label-free absolute quantification in the two programs relies on completely different algorithms (NSAF+ in PeptideShaker; iBAQ in MaxQuant), the occurrence of differences in the results is not particularly surprising. Nonetheless, comparing absolute quantitative data between studies that used not only different analytical workflows but also different data processing protocols carries a high risk of error.
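To illustrate why the two quantification schemes can diverge on the same sample, the sketch below contrasts a spectral-count index with an intensity-based one. It is a simplified illustration under stated assumptions: plain NSAF (spectral count divided by protein length, then normalized) stands in for PeptideShaker's NSAF+, whose refinements are not reproduced here, and iBAQ is taken as summed peptide intensity divided by the number of theoretically observable peptides. All protein names and numbers are hypothetical.

```python
from collections import defaultdict


def nsaf_shares(proteins):
    """Percentage share per protein from spectral counts normalized by length."""
    saf = {p["id"]: p["spectral_count"] / p["length"] for p in proteins}
    total = sum(saf.values())
    return {pid: 100.0 * value / total for pid, value in saf.items()}


def ibaq_shares(proteins):
    """Percentage share per protein from intensity per theoretical peptide."""
    ibaq = {p["id"]: p["intensity"] / p["theoretical_peptides"] for p in proteins}
    total = sum(ibaq.values())
    return {pid: 100.0 * value / total for pid, value in ibaq.items()}


def family_shares(protein_shares, families):
    """Sum per-protein percentage shares into per-family shares."""
    out = defaultdict(float)
    for pid, share in protein_shares.items():
        out[families[pid]] += share
    return dict(out)


# Hypothetical input values, for illustration only (not the study's data).
proteins = [
    {"id": "toxin_a", "length": 60, "spectral_count": 120,
     "intensity": 9.0e9, "theoretical_peptides": 6},
    {"id": "pla2_a", "length": 120, "spectral_count": 100,
     "intensity": 4.0e9, "theoretical_peptides": 8},
    {"id": "other_a", "length": 600, "spectral_count": 30,
     "intensity": 1.0e9, "theoretical_peptides": 40},
]
families = {"toxin_a": "3FTx", "pla2_a": "PLA2", "other_a": "other"}

nsaf_by_family = family_shares(nsaf_shares(proteins), families)
ibaq_by_family = family_shares(ibaq_shares(proteins), families)

for family in nsaf_by_family:
    print(family, round(nsaf_by_family[family], 2), round(ibaq_by_family[family], 2))
```

Because one index counts spectra and the other sums intensities, and each uses a different normalization term, the family percentages need not agree even when both are computed from the same raw file.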
3.2. LC-MS/MS Analysis of Naja ashei Venom after Sample Decomplexation with the Use of Different Data Processing Software and an SDS-PAGE of Obtained Fractions
To reduce the complexity of the venom sample and increase the number of identifications, we applied a simple fractionation step using 30 kDa centrifuge filters. As a result, we obtained two fractions with protein concentrations of 11.792 μg/μL in the upper fraction and 0.430 μg/μL in the bottom fraction. The qualitative results of the LC-MS/MS analysis are presented in Figure 3 and Table 2.
As might be expected, decomplexation of the sample increased the total number of identifications in both data processing protocols compared to the analysis of unfractionated (crude) venom. Again, the total number of identified proteins across both fractions was higher for MQ (58 proteins) than for PS (53 proteins). This was mainly driven by the considerable difference in the number of hits identified in the bottom fraction. MaxQuant identified fewer proteins (34) in the upper fraction but reported 24 identifications for the bottom fraction, whereas PS identified 40 proteins in the upper fraction but only 13 in the bottom one. This time, however, the highest number of unique hits (17 proteins) was delivered by the combination of the X! Tandem and MS-GF+ search engines. Six proteins were included in all results, while there were no unique hits in the MaxQuant output for the bottom fraction. Again, no clear trends were observed in the identification of specific groups of proteins by the different software (Table S2).
At this point, it is worth noting that after filtration we identified 16 proteins in the PS analysis, but only 5 in the MQ analysis, that had not been detected in the crude sample. On the other hand, the number of proteins identified only in the unfiltered sample was 9 and 10 for the PS and MQ analyses, respectively. Thus, from this perspective, the combination of two search engines in the PS analysis provided more diversified results for the filtered samples (Table S2).
Interestingly, although decomplexation of the sample increased the overall number of identified proteins, it did not necessarily translate into a greater number of recorded peptides. This is particularly visible in the MQ analysis, where the number of unique peptides is markedly higher in the unfiltered sample (216 peptides) than the sum of non-redundant peptides in the samples after fractionation (154 peptides). This apparent contradiction can be explained, however, because the proteins detected in the unfiltered sample were, on average, identified from a larger number of peptides (Table S3). Moreover, as mentioned before, the MQ analysis showed higher variability of identified proteins in the unfiltered sample.
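The relationship between peptide and protein counts can be checked with a back-of-the-envelope ratio using only the MaxQuant counts quoted in the text. This is a rough sketch rather than the per-protein averages of Table S3: it assumes each peptide is assigned to a single protein, and it uses 58 as the protein count for the fractionated sample, which is the sum of the per-fraction hits (34 + 24) and may count a protein found in both fractions twice. The helper name `peptides_per_protein` is hypothetical.

```python
def peptides_per_protein(n_peptides, n_proteins):
    """Rough mean number of peptides supporting each protein identification."""
    if n_proteins <= 0:
        raise ValueError("n_proteins must be positive")
    return n_peptides / n_proteins


# Counts reported in the text for the MaxQuant analysis.
crude = peptides_per_protein(216, 39)         # unfiltered venom
fractionated = peptides_per_protein(154, 58)  # both fractions (34 + 24 hits)

print(round(crude, 2), round(fractionated, 2))  # prints 5.54 2.66
```

Under these assumptions, each identification in the crude sample is supported by roughly twice as many peptides as in the fractionated sample, which is consistent with fewer peptides overall despite more protein hits.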
Nevertheless, after sample decomplexation, we also observed an increase in the total number of identified protein families from 11 to 13 in the case of PeptideShaker. Moreover, only in this case was it possible to detect a TF-like protein (UniProt ID: A0A2H6N0F2), which was not detected under any other conditions (Table 2).
Furthermore, low-molecular-weight proteins prevailed in the results for the bottom fraction in the data processed by both software solutions. This is not particularly surprising given the 30 kDa filter membrane, but larger proteins also appeared in the list of proteins identified for this fraction. They may have reached the bottom fraction as a result of partial degradation of some proteins in the sample.
In this context, the dominance of many low-molecular-weight proteins in the upper fraction is more puzzling. It could, however, be explained by incomplete filtration or by nonspecific membrane-protein interactions leading to the adhesion of proteins to the membrane. Another possible explanation is the high concentration of the fractionated sample, resulting in clogging of the membrane pores.
Another unexpected finding concerns the quantitative results of the shotgun LC-MS/MS proteomic experiment after sample fractionation (Figure 4).
Originally, we expected to see a higher proportion of low-abundance proteins in the upper venom fraction after the initial sample decomplexation, partly because the two dominant protein families in Naja ashei venom have a lower molecular mass than the cut-off of the membranes used in the experiment. While the bottom fraction indeed contained almost exclusively proteins from the three-finger toxin family, quite surprisingly the percentage share of this group of proteins also increased in the upper fraction. Compared to the crude venom analysis, this increase was substantial: 18.08% in the PeptideShaker calculations and 14.71% in the MaxQuant calculations. This observation seems unusual because, after some of the 3FTx proteins have passed into the bottom fraction, we would expect their percentage in the upper fraction to decrease relative to their share in the whole venom. Based on these results, it could be assumed that the real amount of proteins from this family in this venom is much higher than initially reported. It cannot be ruled out that the stochastic nature of data-dependent acquisition strongly influences the data in this case, particularly for this family of proteins. In other words, it is possible that during MS analysis some of the 3FTx proteins mask the presence of other proteins from this family that are also present in considerable amounts.
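The expectation that the 3FTx share should fall in the upper fraction follows from a simple mass balance, sketched below. This is an idealized illustration, not a model fitted to the data: it assumes that shares are mass fractions, that nothing is lost on the membrane, and that the input values are known. The function name `retentate_share` and all numbers in the example are hypothetical.

```python
def retentate_share(initial_share, permeate_mass_fraction, permeate_share=1.0):
    """Share of a protein family remaining in the upper fraction.

    initial_share: family's mass share in crude venom (0..1)
    permeate_mass_fraction: fraction of total protein mass passing the filter (0..1)
    permeate_share: family's mass share within the bottom fraction (0..1)
    """
    family_left = initial_share - permeate_mass_fraction * permeate_share
    total_left = 1.0 - permeate_mass_fraction
    if total_left <= 0 or family_left < 0:
        raise ValueError("inconsistent mass balance")
    return family_left / total_left


# Hypothetical example: a family making up 60% of crude venom, with 5% of the
# total protein mass passing the filter as that family only.
upper = retentate_share(initial_share=0.60, permeate_mass_fraction=0.05)
print(round(upper, 4))
```

Whenever the bottom fraction is richer in a family than the crude venom, this balance gives a lower share in the upper fraction than in the crude venom, so an observed increase points to an effect on the measurement side rather than to the filtration itself.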
It is difficult to ignore that the bottom fraction consists almost exclusively of proteins from the 3FTx family. This could mean that a simple method based on centrifuge filters can be a very rapid and efficient way to isolate a certain share of three-finger toxin proteins from venom. To confirm these results, we performed an SDS-PAGE analysis of the obtained fractions under reducing and non-reducing conditions (Figure 5).
The obtained gels confirm the results of the LC-MS/MS experiment. The upper fraction largely reflects the composition of whole unfractionated venom, while the bottom fraction consists almost entirely of proteins with a mass of about 7 kDa. Electrophoresis under non-reducing conditions revealed that these proteins form larger aggregates with a mass close to 11 kDa. At the same time, the non-reducing electrophoresis ruled out the hypothesis that the low-molecular-weight proteins in the upper fraction formed large multimers with masses shifted above 30 kDa: even under non-reducing conditions, the upper fraction is still mostly composed of low-molecular-weight proteins below 30 kDa. Moreover, SDS-PAGE confirmed that a certain fraction of 3FTx proteins can be separated from Naja ashei venom with centrifuge filters. This method is likely to be effective for other venoms of similar composition as well.