Identification of Unknown Biological Toxin Proteins Using Mass Spectrometry: A Case Study on De Novo Sequencing of Ricin

Yubo Song; Hao Wang; Junjie Wen; Jiale Xu; Siyu Zhu; Fuli Wang; Yongqian Zhang

doi:10.3390/toxins17110564

¹ State Key Laboratory of Chemistry for NBC Hazards Protection, Beijing 102205, China

² School of Medical Technology, Beijing Institute of Technology, Beijing 100081, China

* Authors to whom correspondence should be addressed.

† These authors contributed equally to this work.

Toxins 2025, 17(11), 564; https://doi.org/10.3390/toxins17110564

This article belongs to the Special Issue Analytical Novelties and Challenges for the Detection of Natural Toxins

Abstract

Background: The rapid and reliable identification of unknown or highly variable biological toxin proteins, such as the potent Ricin toxin, remains a critical challenge in biodefense and public security. Methods: To address this, we developed a Heuristic De Novo Protein Sequencing (HDPS) strategy, which combines multiple enzymatic digestions with microwave-assisted acid hydrolysis to generate diverse peptides, followed by a two-stage assembly process that integrates de novo sequencing with homology-based database searching for robust error correction. Results: When applied to Ricin, this approach achieved 100% sequence coverage for both its A and B chains, with amino acid-level accuracies of 98.13% and 98.47%, respectively, and successfully corrected potential sequencing ambiguities. Conclusions: These results demonstrate that HDPS is a highly accurate and effective method for the de novo sequencing of full-length proteins, making it particularly valuable for characterizing unknown or mutated toxins in the absence of comprehensive reference databases.

Keywords:

ricin; heuristic de novo sequencing; mass spectrometry; biological toxin protein identification

Key Contribution:

This work introduces the Heuristic De Novo Protein Sequencing (HDPS) platform, which overcomes the error propagation of traditional methods by combining complementary digestion with a two-stage assembly to achieve complete coverage and over 98% accuracy in de novo sequencing of Ricin, all without relying on reference sequences.

1. Introduction

Ricin is a heterodimeric, type II ribosome-inactivating protein derived from Ricinus communis. Its A chain catalytically depurinates rRNA, thereby irreversibly inhibiting protein synthesis, whereas the B chain facilitates cellular entry through glycan-binding and endocytosis [1,2,3]. Following internalization, the toxin undergoes retrograde transport to the endoplasmic reticulum (ER), where the chains are separated, releasing the cytotoxic A chain into the cytosol [4,5]. Due to its high toxicity and relative accessibility, ricin is considered a potential bioterror agent [6]. Notably, its components also hold therapeutic promise: the A chain can be used in immunotoxins, and the B chain may serve as a targeting carrier for nanoparticles, illustrating a dual-use potential [7]. Consequently, sequencing and biologically analyzing such toxins are vital for advancing both countermeasures and biomedical applications [8,9].

Currently, detection technologies for protein-based biological toxins are broadly divided into two categories. The first category includes immunoassays that employ specific antibodies, such as enzyme-linked immunosorbent assay (ELISA), Western blot (WB), immunochromatographic assay (ICA), colloidal gold immunochromatographic strip (CGIS), and electrochemiluminescence-based methods [10,11,12,13]. Although these methods offer high sensitivity and operational simplicity, their performance heavily relies on the specificity and affinity of the antibodies used. The second category encompasses mass spectrometry (MS)-based techniques, which accurately determine amino acid sequences of toxin-specific peptides by interpreting MS data. For example, Sousa et al. first reported the use of accelerated solvent extraction (ASE) to process complex samples containing Ricin. Subsequent tryptic digestion and MS analysis led to the identification of 19 peptides, with the tandem mass spectrometry (MS/MS) spectra of three peptides confirming the uniqueness of Ricin [14]. Separately, Chen et al. developed a microwave-assisted acid hydrolysis (MAAH) method, which utilizes hot acid to selectively cleave aspartic acid residues, reducing hydrolysis time to just 15 min and significantly improving efficiency; subsequent MS analysis also successfully identified disulfide-bonded peptides within the Ricin sequence [15]. Despite the diversity of available detection technologies, a universally applicable standard reference method is still lacking [16].

MS-based de novo sequencing adopts a bottom-up strategy to determine the full-length sequences of unknown biological toxin proteins. The process begins with enzymatic digestion of the target protein into peptides, followed by MS/MS analysis of their fragmentation patterns. The resulting spectral data are then assembled de novo, typically utilizing a de Bruijn graph algorithm, which constructs contigs from overlapping peptide segments. This theoretical framework facilitates the accurate reconstruction of the primary protein structure. However, in practice, instrumental signal loss and background noise in MS data can mislead de novo algorithms, resulting in erroneous sequence assignments [17]. Furthermore, the de Bruijn graph assembly is highly dependent on the depth and accuracy of the peptide sequencing; low coverage often leads to fragmented assembly and prematurely truncated contigs [18]. The conventional single-step de novo assembly strategy is highly prone to the transmission and accumulation of such errors, ultimately leading to a reduction in the accuracy of the protein sequence.

To address these limitations, we developed a Heuristic De Novo Protein Sequencing (HDPS) technology. This technology combines microwave-assisted and enzymatic hydrolysis to significantly improve the diversity and yield of peptide sequences. After initial construction of sequence overlap scaffolds, HDPS implements a two-stage assembly and error-correction strategy that integrates graph theory with database searching, thereby achieving highly robust protein sequence assembly [19,20]. By employing Ricin as a model system, we validated the high accuracy and reliability of the HDPS platform for full-length toxin protein sequencing. Our method is designed to overcome persistent challenges in traditional MS-based identification, notably extended cycle times and assembly difficulties, thereby achieving over 95% accuracy at the amino acid level. It thus provides a powerful and versatile tool for characterizing protein toxins from novel or unknown sources [21,22,23,24].

2. Results and Discussion

MS-based peptide de novo sequencing and sequence assembly are recognized as pivotal technologies for determining the full-length sequence of proteins at the amino acid level. They play a crucial role in various fields, including identifying amino acid mutations, discovering biomarkers for disease treatment response, and screening novel therapeutic agents [24,25,26]. In this study, we integrated mass spectrometry-based de novo protein sequencing with database searching, naming the combined approach HDPS. The technical workflow is detailed in Figure 1. The process began with the digestion of the Ricin protein (treated as an unknown sequence) using multiple enzymatic digestions combined with microwave-assisted acid hydrolysis to ensure comprehensive sequence coverage. The resulting peptides were analyzed by MS/MS to acquire fragmentation spectra. Subsequently, de novo sequencing was performed on the MS/MS data using pNovo software, and an initial assembly round yielded preliminary sequence contigs. These contigs were then subjected to homology analysis to construct a custom sequence database, which was searched with pFind software to identify high-confidence peptide-spectrum matches. Finally, peptides from both the de novo and database search results were consolidated and subjected to a second, integrated assembly round, ultimately determining the complete amino acid sequence of the Ricin protein.

Figure 1. Workflow of the Heuristic De Novo Protein Sequencing (HDPS) technique.

2.1. Peptide De Novo Sequencing

The amino acid sequences of peptides were directly determined from the mass spectrometry data using de novo sequencing algorithms. The peptide lengths identified by the pNovo software were predominantly distributed between 6 and 18 amino acids, representing an optimal length range suitable for the subsequent sequence assembly process (Figure 2a) [27,28]. However, the accuracy of de novo sequencing is inherently constrained by factors such as instrumental noise and incomplete peptide fragmentation, often leading to errors including the misidentification of isobaric amino acids [29]. Therefore, when sequencing unknown proteins, rigorous quality control of the de novo sequencing results is essential to reduce the complexity of subsequent peptide assembly. Overall, the majority of the de novo sequenced peptides were associated with confidence scores below 60, underscoring the limited per-segment accuracy of the method and necessitating validation and correction via subsequent database searching to ensure the assembly of correct sequences (Figure 2b). Consequently, a threshold based on the de novo sequencing score was applied to filter and remove low-confidence peptide assignments initially, thereby enhancing the accuracy and robustness of the final sequence assembly.

Figure 2. Characterization of de novo sequencing results for Ricin toxin peptides using five hydrolysis methods. (a) Length distribution of de novo sequenced peptides from the five hydrolysis methods. (b) Score distribution of de novo sequenced peptides from the five hydrolysis methods. (c) Score distribution comparison between specific and nonspecific hydrolysis methods. (d) Length comparison of peptides from the Proteinase K and MAAH methods. (e) Score comparison of peptides hydrolyzed by Trypsin and Glu-C. (f) Combined score and length distribution of de novo sequenced peptides of Ricin toxin.

This study employed both specific and non-specific hydrolysis methods. Generally, the non-specific method yielded a greater quantity of peptides and a higher proportion of usable peptides, indicating its efficacy in enhancing peptide diversity and overlap, which is crucial for sequence assembly (Figure 2c). In contrast, specific enzymatic digestion cleaves at predictable sites and requires a relatively stable, controllable digestion time. Non-specific digestion, lacking fixed cleavage sites, necessitates strict control of digestion duration to prevent over-digestion, which could generate excessively short peptides detrimental to assembly [30]. Both Proteinase K digestion and microwave-assisted acid hydrolysis exhibited high non-specificity, generating peptides with substantial sequence overlap. Proteinase K yielded a higher number of peptides than MAAH, likely attributable to the challenges in precisely controlling the temperature during MAAH compared to the more stable enzymatic conditions (Figure 2d). Although peptides derived from specific enzymatic digestion generally exhibit higher per-residue accuracy and slightly elevated confidence scores in de novo sequencing, they are considerably fewer in number compared to those generated by non-specific digestion. Moreover, the sequence overlap achievable with specific-digestion peptides is inherently limited by the distribution of the enzyme’s cleavage sites along the protein sequence. Among specific proteases, trypsin outperformed Glu-C, yielding a greater number of peptides with higher overall scores (Figure 2e). This advantage can be attributed to the superior fragmentation efficiency of tryptic peptides during mass spectrometry, which produces more continuous b- and y-ion series, thereby significantly enhancing the accuracy of de novo sequencing [29,31].

To assess how peptide characteristics influence the sequencing results, the correlation between peptide length and confidence scores was examined. The de novo sequenced peptides for Ricin toxin exhibited a length distribution of 7–29 amino acids and confidence scores ranging from 0 to 100. A clear inverse correlation was observed between peptide length and confidence score, wherein longer peptides generally corresponded to lower scores (Figure 2f). This trend can be attributed to the higher probability of fragment ion loss in longer peptides, which compromises sequencing accuracy. In contrast, shorter peptides typically provide more complete b-/y-ion coverage and exhibit better spectral continuity, resulting in more reliable sequence inference. While longer peptides theoretically reduce the number of assembly steps, they become less favorable when sequencing accuracy is limited. Conversely, extremely short peptides increase the risk of random k-mer repetition and are also suboptimal for assembly. To balance these factors, a minimum k-value of 6 was adopted in this study to enhance the robustness and efficiency of the subsequent sequence assembly process.

2.2. First-Round Sequence Assembly

To mitigate errors inherent in de novo sequencing, further filtering of the peptide sequences was necessary. Each de novo sequenced peptide was segmented into K-mers, the credibility of which was evaluated based on their frequency of occurrence. A high-frequency K-mer typically indicates that its sequence is supported by abundant, mutually verifying fragment ion information across multiple MS/MS spectra, signifying high confidence. In contrast, K-mers derived from erroneous peptide sequences exhibit low frequencies. Filtering out K-mers below a specified frequency threshold effectively removes these error-prone sequences (Figure 3a). Multiple initiation points were used for assembly to reconstruct the complete protein sequence, and the sequences assembled from these different starting points were aligned. They showed high overall similarity but contained variations at specific local sites (Figure 3b). A majority voting principle was then applied at each amino acid position to identify and correct errors, such as isobaric residue combinations (e.g., GG=N and GA=Q), amino acid inversions (e.g., ES=SE), deamidation (e.g., Q/E, N/D), and the isomeric residues I and L [32]. The frequency score of each amino acid residue was calculated to assess the local confidence of the full-length sequence. Homology analysis of the initially assembled sequences confirmed that the protein consisted of the A and B chains of Ricin toxin. Because the de novo-derived peptides were randomly mixed and lacked positional information, we conducted homology analysis of the first-round sequence assembly to identify the N-terminal regions of the Ricin A and B chains, thereby establishing reliable initiation sites for sequence reconstruction. Figure 3c,d display representative MS/MS spectra of the N-terminal sequences of the A chain (IFPKQYPIINF) and the B chain (ADVCMDPEPIVR).
Despite the error correction procedures, the initially assembled A and B chain sequences remained incomplete. This was attributed to factors such as insufficient peptide overlap at specific amino acid sites, unexpected modifications, glycosylation sites, and enzyme cleavage biases, all of which can compromise the accuracy of de novo peptide sequencing.
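The ambiguities listed above (GG=N, GA=Q, ES=SE, I/L) can be illustrated with a short mass check. The following is a minimal sketch, not part of the HDPS software: it sums standard monoisotopic residue masses and compares them against the 0.02 Da fragment tolerance given in Section 4.3. The function names are hypothetical, and only the residues needed for the example are included.

```python
# Standard monoisotopic residue masses in Da (rounded to 5 decimals).
# Only the residues needed for this illustration are listed.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203,
    "D": 115.02694, "N": 114.04293,
    "E": 129.04259, "Q": 128.05858,
    "I": 113.08406, "L": 113.08406,
}


def residue_sum(seq: str) -> float:
    """Sum of residue masses for a short stretch of sequence."""
    return sum(RESIDUE_MASS[aa] for aa in seq)


def indistinguishable(a: str, b: str, tol_da: float = 0.02) -> bool:
    """True if two stretches have the same total mass within tol_da.

    Assumption: tol_da mirrors the 0.02 Da fragment tolerance from
    Section 4.3; equal total mass means only fragment ions that fall
    between the residues could tell the two stretches apart.
    """
    return abs(residue_sum(a) - residue_sum(b)) <= tol_da


for left, right in [("GG", "N"), ("GA", "Q"), ("ES", "SE"), ("I", "L"), ("Q", "E")]:
    print(left, right, indistinguishable(left, right))
```

The first four pairs come out as indistinguishable by total mass, whereas Q and E differ by about 0.98 Da, the shift introduced by deamidation. This is consistent with the text treating deamidation as a separate error class from isobaric substitutions.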

Figure 3. Evaluation during the splicing process. (a) Frequency distribution of short k-mers from de novo sequencing used to assess k-mer credibility. (b) The top three assembled sequences and site-specific confidence scores for the Ricin B chain after the first assembly round. (c) MS/MS spectrum of the N-terminal sequence of the A chain (IFPKQYPIINF). (d) MS/MS spectrum of the N-terminal sequence of the B chain (ADVCMDPEPIVR).

2.3. Second-Round Sequence Assembly

To enhance the accuracy of sequence assembly, homologous sequences of the Ricin toxin A and B chains were identified by BLAST analysis, and peptides obtained through homologous sequence database searching were utilized in the second-round assembly. A total of 193 homologous sequences were successfully matched. Figure 4a,b illustrate amino acids 110–420 of the A chain and 450–745 of the B chain, respectively, encompassing regions contributing to the final protein sequencing results. At the level of individual amino acid residues, the number of matched homologous sequences serves as an indicator of evolutionary conservation and structural stability. Alignment analysis revealed that the B chain exhibits a high degree of conservation, whereas the A chain displays pronounced variability among homologs. This alignment information was further employed to resolve ambiguous isobaric amino acid residues—such as leucine and isoleucine—that cannot be distinguished by de novo sequencing alone. It should be noted that residues in low-coverage regions of the homologous alignment may be unreliable owing to possible sequence mutations.

Figure 4. Comparative analysis of homologous sequences for the Ricin A and B chains. (a) Alignment of homologous sequences for the Ricin A chain (amino acids 110–420). (b) Alignment of homologous sequences for the Ricin B chain (amino acids 450–745).

Subsequently, peptide sequences from both de novo sequencing and the database search were merged for a second sequence assembly. The assembled sequences were then aligned with homologous references, yielding the complete Ricin A and B chains. The assembly quality was assessed by calculating sequence coverage, which was confirmed to be 100% for both chains, as presented in Figure 5. The contributions of different digestion types to sequence coverage were not uniform, with non-specific cleavages (including proteinase K and MAAH) providing a greater contribution to overall coverage. The peptide sequences used for the second-round assembly are listed in Table S5.

Figure 5. Sequence coverage map of the second-round assembly. (a) Ricin toxin A chain (267 amino acids), achieving complete (100%) sequence coverage. (b) Ricin toxin B chain (262 amino acids), achieving complete (100%) sequence coverage. The different colors of the covered peptides indicate the types of enzymatic digestions.
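Sequence coverage as used above can be computed by marking every residue of the assembled chain that is spanned by at least one supporting peptide. The sketch below is one plausible way to do this and is not taken from the authors' code; the function name and the toy input are assumptions, with the B chain N-terminal peptide from Figure 3d standing in for a full chain.

```python
def sequence_coverage(protein: str, peptides: list[str]) -> float:
    """Percentage of residues in `protein` covered by at least one peptide.

    Peptides are located by exact substring match; every occurrence counts.
    """
    if not protein:
        raise ValueError("protein sequence must not be empty")
    covered = [False] * len(protein)
    for pep in peptides:
        if not pep:
            continue  # ignore empty strings
        start = protein.find(pep)
        while start != -1:
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return 100.0 * sum(covered) / len(protein)


# Toy example: two overlapping peptides cover the whole 12-residue stretch.
toy_chain = "ADVCMDPEPIVR"
print(sequence_coverage(toy_chain, ["ADVCMD", "DPEPIVR"]))  # 100.0
print(sequence_coverage(toy_chain, ["ADVCMD"]))             # 50.0
```

Exact matching is a simplification: peptides carrying a residual sequencing error would need a tolerant alignment step before being counted.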

Figure 6 presents the alignment of the amino acid sequences of Ricin toxin obtained via the HDPS method against the corresponding reference sequence (UniProt P02879, NCBI GI:132567) [33]. The HDPS approach achieved full-length coverage for both chains, exhibiting an accuracy of 98.13% (262/267) for the A chain and 98.47% (258/262) for the B chain. Moreover, the method correctly distinguished isobaric amino acid residues (I/L) with an accuracy of 95.40% (83/87), further demonstrating its exceptional reliability and high precision in reconstructing complex protein sequences.
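The reported percentages follow directly from the residue counts given in parentheses. As a quick arithmetic check (the helper name is illustrative only):

```python
def percent(correct: int, total: int) -> float:
    """Percentage of correct residues, rounded to two decimals."""
    return round(100.0 * correct / total, 2)


a_chain = percent(262, 267)   # A chain: 5 mismatched residues
b_chain = percent(258, 262)   # B chain: 4 mismatched residues
il_calls = percent(83, 87)    # I/L assignments: 4 incorrect calls

print(a_chain, b_chain, il_calls)  # 98.13 98.47 95.4
```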

Figure 6. Alignment of the amino acid sequences of Ricin toxin obtained via HDPS with the reference sequence (UniProt P02879, NCBI GI:132567). (a) Ricin A chain alignment, achieving 98.13% sequence accuracy. (b) Ricin B chain alignment, achieving 98.47% sequence accuracy.

HDPS is a novel sequence assembly strategy that uniquely integrates homology-based database searches with a two-round sequence assembly, thereby enhancing the utilization of informative peptides derived from mass spectrometry data. A comparative evaluation was performed between the HDPS framework and the established ALPS software [24,34]. As shown in Table 1, both platforms achieved complete sequence coverage for the Ricin A and B chains. Nevertheless, HDPS exhibited higher sequencing accuracy, reaching 98.13% for the A chain and 98.47% for the B chain, compared with 95.88% and 95.80% obtained by ALPS. This systematic comparison demonstrates that HDPS enhances full-length protein sequencing coverage and accuracy while providing a robust approach for characterizing proteins with limited or unknown reference information.

Table 1. Assembly performance comparison between ALPS and HDPS using de novo sequenced peptides.

3. Conclusions

In this study, a combination of specific and nonspecific cleavage methods was used to increase the diversity of peptides derived from an unknown toxin protein, providing sufficient overlap for sequence assembly. Starting from de novo sequencing of the peptides, a second round of assembly was performed by integrating homology search and database identification, which restored the full-length amino acid sequence of the toxin. By incorporating peptide coverage information obtained at different PSM confidence levels from each digestion, the reliability of the assembled sequence could be evaluated intuitively, offering a novel strategy for identifying unknown biological toxins. The accuracy of de novo peptide sequencing remains a challenge and has room for improvement. Combining homologous sequences to identify novel proteins with extensive mutations is a useful strategy, as exemplified in antibody sequencing studies. Unexpected post-translational modifications, such as glycosylation, can also significantly affect the assignment of peptide sequences during de novo sequencing. Although we obtained supporting peptide evidence through homologous sequence searches, further optimization is needed to improve de novo sequencing accuracy in the presence of such unanticipated modifications. Overall, we have established a comprehensive and flexible full-length protein sequencing strategy capable of accurately characterizing multi-subunit toxins with unknown sequences, which is particularly valuable for identifying engineered or naturally occurring toxin variants as well as novel toxins not yet represented in existing databases.
Moreover, the HDPS strategy demonstrates broad applicability in protein identification and full-length protein sequencing, providing a reliable and scalable approach for addressing complex challenges in protein characterization.

4. Materials and Methods

4.1. Sample Preparation

The sterile Ricin protein was purchased from Beijing Hapten and Protein Biomedical Institute, Beijing, China. It was purified by affinity chromatography at a concentration of 2 mg/mL and subsequently processed via enzymatic digestion and MAAH. For enzymatic digestion, Ricin was digested using four proteases: Trypsin (Promega, V511A), Glu-C (Promega, V1651), Chymotrypsin (Promega, V1061), and Proteinase K (Promega, V3021). The sample was first loaded into a 10 kDa molecular weight cut-off ultrafiltration centrifuge tube (Sartorius, VN01H02) [35]. After centrifugation, denaturation was performed using 8 M urea, followed by reduction with dithiothreitol (DTT) and alkylation with chloroacetamide (CAA). The final concentrations of DTT and CAA were 10 mM and 50 mM, respectively. Enzymatic digestion was carried out under the following conditions: Trypsin and Glu-C digestions were performed in 50 mM ammonium bicarbonate (pH 7.8) at 37 °C for 18 h with an enzyme-to-substrate ratio of 1:50 (w/w). Chymotrypsin digestion was performed in 100 mM Tris-HCl (pH 8.0) at 25 °C for 18 h with an enzyme-to-substrate ratio of 1:50 (w/w). Proteinase K digestion was performed in 50 mM Tris-HCl (pH 8.0) at 37 °C for 5 min and 20 min at an enzyme concentration of 50 µg/mL [36]. All digestion reactions were quenched by adding formic acid. After digestion, the peptide solution in the ultrafiltration tube was collected by centrifugation and vacuum-dried at 60 °C. Prior to MAAH, Ricin was similarly subjected to denaturation, reduction, and alkylation. The buffer was then exchanged for pure water, and the sample was transferred to a glass vial. Hydrochloric acid was added to a final concentration of 3 M, and microwave irradiation was applied at 700 W for 4 min with ice insulation, with the ice replenished every minute. The hydrolyzed sample was then desalted and vacuum-dried.

4.2. Liquid Chromatography and Mass Spectrometry Analysis

The digested peptides were dissolved in 0.1% formic acid and analyzed online using an Easy-nLC 1200 HPLC system (Thermo Fisher Scientific, Waltham, MA, USA) coupled to an Orbitrap Q-Exactive HF mass spectrometer (Thermo Fisher Scientific), with separation on a self-packed C18 column (15 cm × 150 μm, 1.9 μm particle size) via gradient elution. A 60 min elution gradient was applied. Mobile phase A consisted of 0.1% formic acid in water, and mobile phase B consisted of 80% acetonitrile and 0.1% formic acid in water. The gradient program was as follows: 4% B at 0 min, 7% B at 1 min, 13% B at 5 min, 25% B at 35 min, 45% B at 53 min, 95% B at 54 min, and held at 95% B until 60 min. The flow rate was set to 600 nL/min. Mass spectrometry data were acquired in data-dependent acquisition (DDA) mode. Full-scan MS spectra (350–2000 m/z) were acquired at a resolution of 60,000, with an AGC target of 3 × 10⁶ and a maximum injection time of 30 ms. The top 20 most intense precursor ions were selected for fragmentation. MS/MS spectra were acquired at a resolution of 15,000, with an isolation window of 1.6 m/z, an AGC target of 5 × 10⁴, a maximum injection time of 45 ms, a normalized collision energy of 27%, a dynamic exclusion of 45.0 s, a charge exclusion of 1, 8, >8, and an intensity threshold of 1.8 × 10⁴.
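The gradient program above can be written down as a small lookup table. The sketch below assumes linear ramps between the listed time points, which the text does not state explicitly, and uses hypothetical function names; the acetonitrile figure relies on mobile phase B being 80% acetonitrile, as given above.

```python
from bisect import bisect_right

# (time in min, % mobile phase B) as listed in Section 4.2.
GRADIENT = [(0, 4), (1, 7), (5, 13), (35, 25), (53, 45), (54, 95), (60, 95)]


def percent_b(t_min: float) -> float:
    """% B at time t_min, assuming linear ramps between listed points."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time outside the 0-60 min gradient")
    times = [t for t, _ in GRADIENT]
    i = bisect_right(times, t_min) - 1
    if i == len(GRADIENT) - 1:
        return float(GRADIENT[-1][1])
    (t0, b0), (t1, b1) = GRADIENT[i], GRADIENT[i + 1]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


def acetonitrile_percent(t_min: float) -> float:
    """Overall acetonitrile content, since B is 80% acetonitrile."""
    return 0.8 * percent_b(t_min)


print(percent_b(20))             # 19.0, midway through the 13% -> 25% ramp
print(acetonitrile_percent(20))  # about 15.2
```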

4.3. Peptide De Novo Sequencing

Raw data files were converted with ProteoWizard’s msConvert (version 3.0.24026), with a binary encoding precision of 64 bits. For the MGF conversion, centroiding was performed using the vendor’s peak picking algorithm included in msConvert. De novo sequencing was performed using pNovo software (version 3.1.5). For data from Trypsin, Glu-C, and Chymotrypsin digestions, the corresponding specific enzymatic cleavage sites were selected. For data from Proteinase K digestion and MAAH, non-specific cleavage was selected. Higher-energy collisional dissociation (HCD) was selected as the fragmentation mode. The precursor mass tolerance was set to ±10 ppm, and the fragment ion mass tolerance was set to ±0.02 Da. Carbamidomethylation [C] was set as a fixed modification, while Oxidation [M], Deamidation [N] and Deamidation [Q] were set as variable modifications. De novo sequencing results for peptides were obtained by searching the MGF files corresponding to the five hydrolysis methods.
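The two tolerances are expressed in different units: the precursor window scales with m/z (ppm), while the fragment window is a fixed width (Da). The following sketch shows how each check is typically evaluated; it is a generic illustration with hypothetical function names, not a description of pNovo's internal matching.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def precursor_matches(observed_mz: float, theoretical_mz: float,
                      tol_ppm: float = 10.0) -> bool:
    """Precursor check using the +/-10 ppm tolerance from Section 4.3."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def fragment_matches(observed_mz: float, theoretical_mz: float,
                     tol_da: float = 0.02) -> bool:
    """Fragment check using the +/-0.02 Da tolerance from Section 4.3."""
    return abs(observed_mz - theoretical_mz) <= tol_da


# At m/z 1000, 10 ppm corresponds to a window of +/-0.01.
print(precursor_matches(1000.005, 1000.0))  # True  (about 5 ppm)
print(precursor_matches(1000.02, 1000.0))   # False (about 20 ppm)
print(fragment_matches(500.015, 500.0))     # True
```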

4.4. First-Round Sequence Assembly

Peptides with a PSM score greater than 20 from the de novo sequencing results were retained. Each peptide was segmented into shorter fragments of a fixed length K (K-mers) using a sliding window. The occurrence frequency (R) of each K-mer was calculated, and only K-mers with a frequency greater than 2 were retained for assembly. De novo sequencing results from multiple enzymatic digestions were merged. The top 3000 most frequent K-mers were selected as starting points for assembly. Extension was performed by searching for overlapping K-mers (K-1 overlap) from both ends. Only assembled sequences with lengths between 50 and 400 amino acids were retained.
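The first-round procedure can be sketched roughly as follows. This is a simplified illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, ties and branches are resolved greedily by picking the most frequent unused K-mer (the text does not say how branches are handled), and the toy call lowers K and the minimum contig length so that a 12-residue example can run.

```python
from collections import Counter


def count_kmers(peptides, k):
    """Count every length-k sliding window across all peptides."""
    counts = Counter()
    for pep in peptides:
        for i in range(len(pep) - k + 1):
            counts[pep[i:i + k]] += 1
    return counts


def _grow(contig, counts, k, used, max_len, rightward):
    """Extend one end of the contig through (k-1)-residue overlaps."""
    while len(contig) < max_len:
        anchor = contig[-(k - 1):] if rightward else contig[:k - 1]
        options = [
            (n, km) for km, n in counts.items()
            if km not in used
            and (km.startswith(anchor) if rightward else km.endswith(anchor))
        ]
        if not options:
            break
        _, best = max(options)  # assumption: greedy, most frequent first
        used.add(best)
        contig = contig + best[-1] if rightward else best[0] + contig
    return contig


def extend(seed, counts, k, max_len=400):
    """Grow a seed K-mer to the right, then to the left."""
    used = {seed}
    contig = _grow(seed, counts, k, used, max_len, rightward=True)
    return _grow(contig, counts, k, used, max_len, rightward=False)


def assemble(peptides, k=6, min_count=3, n_seeds=3000,
             min_len=50, max_len=400):
    """Keep K-mers seen more than twice, seed from the most frequent
    ones, and retain contigs within the stated length range."""
    counts = Counter({km: n for km, n in count_kmers(peptides, k).items()
                      if n >= min_count})
    seeds = [km for km, _ in counts.most_common(n_seeds)]
    contigs = {extend(seed, counts, k, max_len) for seed in seeds}
    return sorted(c for c in contigs if min_len <= len(c) <= max_len)


# Toy input: (sequence, score) pairs; each peptide observed three times.
scored = [("ADVCMDPE", 55), ("VCMDPEPI", 48),
          ("MDPEPIVR", 61), ("ADVXMDPE", 12)] * 3
peptides = [seq for seq, score in scored if score > 20]  # score filter
contigs = assemble(peptides, k=4, min_len=1)
print(contigs)  # ['ADVCMDPEPIVR']
```

In the toy run, the low-scoring peptide containing the erroneous residue is removed by the score filter before K-mer counting, and every seed converges on the same contig because the remaining K-mers form a single unbranched path.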

4.5. Homologous Sequences Database Search

The first-round assembled sequences were subjected to homology analysis using the Protein BLAST tool on the NCBI website (https://blast.ncbi.nlm.nih.gov/, accessed on 10 October 2024). The obtained protein sequences were compiled into a custom homologous sequence database. Database searching against the custom database was performed on the raw files from the multiple digestions using pFind software (version 3.2.0). Search parameters were consistent with those used for pNovo.

4.6. Second-Round Sequence Assembly

Peptide sequences identified from the database search offer higher accuracy but cannot detect sequence mutations or unknown proteins. Therefore, peptides from both de novo sequencing and database search results were combined for a second round of sequence assembly. Peptides with a PSM score > 20 from de novo results and peptides with an FDR < 1% from database search results were retained. The K-mer length was set to 7. The top 100 most frequent K-mers were used as assembly initiation points. K-mers with a frequency greater than 2 were used for assembly, and unique sequences with lengths between 50 and 400 amino acids were retained. Assembled sequences from different starting points showed high similarity, differing only at specific local sites. A sequence alignment method was employed to align the amino acids at each position. The frequency of occurrence of each amino acid at a given position was used as an assembly confidence score for that site. This information was utilized to correct errors such as isobaric amino acid assignments and amino acid rearrangements.
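The position-wise vote and site confidence described above can be sketched as follows. The example assumes the candidate sequences have already been aligned to equal length (the alignment method itself is not specified in the text), and the function name is hypothetical.

```python
from collections import Counter


def consensus(aligned):
    """Majority vote per column of equal-length aligned sequences.

    Returns the consensus string and, for each position, the fraction
    of sequences that agree with the chosen residue.
    """
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("need at least one sequence, all of equal length")
    residues, confidence = [], []
    for column in zip(*aligned):
        counts = Counter(column)
        # Highest count wins; ties are broken alphabetically (assumption).
        residue, n = max(counts.items(), key=lambda kv: (kv[1], kv[0]))
        residues.append(residue)
        confidence.append(n / len(aligned))
    return "".join(residues), confidence


# Three assemblies of the B chain N-terminus that differ at one I/L site.
seq, conf = consensus(["ADVCMDPEPIVR", "ADVCMDPEPLVR", "ADVCMDPEPIVR"])
print(seq)       # ADVCMDPEPIVR
print(conf[9])   # 2/3 agreement at the ambiguous position
```

A vote of this kind only reflects how often each residue was proposed. Because I and L have identical masses, the text relies on the homologous alignment rather than on spectral evidence to settle such positions.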

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/toxins17110564/s1, Table S1: Peptide De Novo Sequencing Results; Table S2: Score Distribution of De Novo Sequencing Results; Table S3: Length Distribution of De Novo Sequencing Results; Table S4: Database Search Peptides; Table S5: Peptide Sequences Used for the Second-Round Assembly; Table S6: Chromatography and Mass Spectrometry Parameters.

Author Contributions

Y.S. and H.W. contributed equally. Conceptualization, F.W. and Y.Z.; methodology, Y.S. and H.W.; investigation, J.X. and S.Z.; data curation, Y.S. and H.W.; writing—original draft preparation, Y.S., H.W. and J.W.; supervision, F.W. and Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the State Key Laboratory of Chemistry for NBC Hazards Protection (Grant No. SKLNBC2023-09).

Data Availability Statement

All the mass spectrometry proteomics data associated with this study have been deposited at the ProteomeXChange Consortium via the PRIDE repository with identifier PXD061213. You can access the dataset by logging in to the PRIDE website using the following account details. Username: reviewer_pxd061213@ebi.ac.uk, Password: UDBL8vKulBkt.

Acknowledgments

We gratefully acknowledge the technical support from the Analysis & Testing Center of Beijing Institute of Technology.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Workflow of the Heuristic De Novo Protein Sequencing (HDPS) technique.

Figure 2. Characterization of de novo sequencing results for Ricin toxin peptides using five hydrolysis methods. (a) Length distribution of de novo sequenced peptides for the five hydrolysis methods. (b) Score distribution of de novo sequenced peptides for the five hydrolysis methods. (c) Score distribution comparison between specific and nonspecific hydrolysis methods. (d) Length comparison of peptides from the Proteinase K and MAAH methods. (e) Score comparison of peptides hydrolyzed by Trypsin and Glu-C. (f) Combined score and length distribution of de novo sequenced peptides of Ricin toxin.

Figure 3. Evaluation during the splicing process. (a) Frequency distribution of short k-mers from de novo sequencing used to assess k-mer credibility. (b) The top three assembled sequences and site-specific confidence scores for the Ricin B chain after the first assembly round. (c) MS/MS spectrum of the N-terminal sequence of the A chain (IFPKQYPIINF). (d) MS/MS spectrum of the N-terminal sequence of the B chain (ADVCMDPEPIVR).

Figure 4. Comparative analysis of homologous sequences for the Ricin A and B chains. (a) Alignment of homologous sequences for the Ricin A chain (amino acids 110–420). (b) Alignment of homologous sequences for the Ricin B chain (amino acids 450–745).

Figure 5. Sequence coverage map of the second-round assembly. (a) Ricin toxin A chain (267 amino acids), achieving complete (100%) sequence coverage. (b) Ricin toxin B chain (262 amino acids), achieving complete (100%) sequence coverage. The different colors of the covered peptides indicate the types of enzymatic digestions.

Figure 6. Alignment of the amino acid sequences of Ricin toxin obtained via HDPS with the reference sequence (UniProt P02879, NCBI GI:132567). (a) Ricin A chain alignment, achieving 98.13% sequence accuracy. (b) Ricin B chain alignment, achieving 98.47% sequence accuracy.

Table 1. Assembly performance comparison between ALPS and HDPS using de novo sequenced peptides.

Metric	Ricin A chain (ALPS)	Ricin A chain (HDPS)	Ricin B chain (ALPS)	Ricin B chain (HDPS)
Coverage	100%	100%	100%	100%
Accuracy	95.88%	98.13%	95.80%	98.47%
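
The accuracy percentages in Table 1 can be related to residue counts with a short calculation. The sketch below is illustrative only: it assumes that accuracy is defined as the number of correctly called residues divided by the chain length, and it takes the chain lengths (267 and 262 amino acids) from the Figure 5 caption. The function name `implied_mismatches` is hypothetical and not part of the HDPS or ALPS software.

```python
# Chain lengths from the Figure 5 caption; accuracies from Table 1.
CHAIN_LENGTHS = {"A": 267, "B": 262}
REPORTED_ACCURACY = {
    ("A", "ALPS"): 95.88,
    ("A", "HDPS"): 98.13,
    ("B", "ALPS"): 95.80,
    ("B", "HDPS"): 98.47,
}


def implied_mismatches(chain_length: int, accuracy_percent: float) -> int:
    """Return the mismatch count that reproduces a reported accuracy.

    Assumption (not stated in the table): accuracy is the fraction of
    correctly called residues over the full chain length, reported to
    two decimal places.
    """
    for mismatches in range(chain_length + 1):
        accuracy = 100 * (chain_length - mismatches) / chain_length
        # Accept the first count that agrees within two-decimal rounding.
        if abs(accuracy - accuracy_percent) < 0.005:
            return mismatches
    raise ValueError("no residue count reproduces the reported accuracy")


if __name__ == "__main__":
    for (chain, method), accuracy in REPORTED_ACCURACY.items():
        n = implied_mismatches(CHAIN_LENGTHS[chain], accuracy)
        print(f"Ricin {chain} chain, {method}: {n} mismatched residues")
```

Under this assumed definition, the HDPS values correspond to 5 mismatched residues in the A chain and 4 in the B chain, compared with 11 in each chain for ALPS.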

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
