Identification of Novel Conotoxin Precursors from the Cone Snail Conus spurius by High-Throughput RNA Sequencing

Marine gastropods of the genus Conus, comprising more than 800 species, characteristically inject worms and other prey with venom. The conopeptide toxins in these venoms, highly diverse in structure and action, are potent and specific for their molecular targets (ion channels, receptors, and transporters of the prey's nervous system), and thus are important research tools and sources for drug discovery. Next-generation sequencing technologies are speeding up the discovery of novel conopeptides in many of these species, but only limited information is available for Conus spurius, which inhabits sandy mud. To search for new conopeptide precursors, we analyzed the transcriptome of the venom ducts of C. spurius and identified 55 putative conotoxins. Seven were selected for further study and confirmed by Sanger sequencing to belong to the M-superfamily (Sr3.M01 and Sr3.M02), A-superfamily (Sr1.A01 and Sr1.A02), O-superfamily (Sr15.O01), and Con-ikot-ikot (Sr21.CII01 and Sr22.CII02). Six of these have never been reported. To our knowledge, this report is the first to use high-throughput RNA sequencing to study the diversity of C. spurius conotoxins.


Introduction
Gastropods of the genus Conus are among the many marine invertebrates that produce important compounds with specific biological activity. More than 800 Conus species are recognized, all having a sophisticated system to inject a neurotoxic venom that rapidly paralyzes their prey [1]. These venoms are composed of a complex mixture of mostly disulfide-rich neurotoxic peptides of 10-30 residues, commonly known as conotoxins (or conopeptides), that affect the central and peripheral nervous systems [2].
Currently, more than 2000 nucleotide sequences and 8000 peptide sequences of conotoxins have been published, but to date, less than 0.1% have been characterized at the level of their molecular targets [3,4]. Based on the similarity of their signal peptide regions, conotoxins have been categorized into more than 30 gene superfamilies: A, B1, B2, B3, C, D, E, F, G, H, I, I1, I2, I3, J, K, L, M, N, O1, O2, O3, P, Q, R, S, T, U, V, Y, Con-ikot-ikots, ConoCAPs, Conopressins, Conkunitzins, and Conodipins [5,6]. Each gene superfamily can include toxins belonging to different pharmacological families, defined by their molecular targets and their pharmacological activities on them [4,5]; however, several distinct gene superfamilies have been shown to contain members belonging to one or more particular pharmacological families. Conotoxin structures and functions are highly diverse, and they primarily target membrane proteins, in particular ion channels, membrane receptors, and transporters [7,8]. The conotoxin open reading frame (ORF) generally consists of a signal sequence (the pre region), an intervening pro region (sometimes called the propeptide), the mature peptide region, and, in some cases, a region located after the mature peptide that is excised during maturation [5–9]. These peptide toxins have received considerable attention, including as molecular tools in physiology, largely because of their high potency and specificity on human ion channels [10,11]. These same properties give them potential for important clinical applications, either in their native form or as models for drug design. Two examples of their utility and potential for clinical application, respectively, are the conopeptide MVIIA (ziconotide), isolated from Conus magus, which is used to treat chronic pain in patients with severe cancer or AIDS [12,13], and the α-conotoxin Vc1.1 from Conus victoriae, a candidate to treat intense, chronic neuropathic pain [14].
Conus spurius is distributed along the coast of the Gulf of Mexico, and its diet is based on wandering polychaetes and hemichordates [15]. C. spurius has been reported to produce toxins in several gene superfamilies, such as I2 (κ-conotoxins) [16,17], A (α-conotoxins) [18], O1 [19], and T [20], as well as other conopeptides not yet classified into superfamilies, such as conorfamides [21,22]. Because next-generation sequencing approaches, such as transcriptomics, have proven useful for the rapid discovery of new conopeptide sequences in several Conus species [4], here we used RNA-Seq analysis to identify new conopeptides of C. spurius.
We identified 80 amino acid (aa) sequences, of which only 55 putative conotoxins were assigned to a known superfamily. Seven of these were selected to validate the bioinformatics analyses through RT-PCR and sequencing. This omics approach enabled the discovery of six novel conotoxin sequences with biotechnological potential.

Putative Conopeptide Precursors Predicted by ConoSorter
Around 156,215,000 raw reads were assembled using Trinity software, yielding 141,629 transcripts with a mean length of 588.31 base pairs (bp), which were analyzed with ConoSorter. From all possible translations of the assembled sequences in six reading frames, ConoSorter identified 52,457 putative conopeptide precursor protein sequences in the Regular Expression file and 3642 in the pHMM file. The total of 56,099 amino acid sequences obtained from ConoSorter was filtered according to the criteria of Prashanth and Lewis [23], resulting in 4310 putative conopeptide precursors. Subsequently, in a BlastX search using Blast2GO software, 318 amino acid sequences (7.3%) were annotated, of which only 80 peptide sequences had an average sequence identity >50% with conotoxins from a Conus species.
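The six-frame translation step mentioned above can be illustrated with a short Python sketch. This is not ConoSorter's code: the function names are hypothetical, the standard genetic code is assumed, and only ORFs that run from the first Met in a stop-delimited segment to the next stop codon are kept.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (NCBI table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_orfs(seq, min_len=1):
    """Collect Met-to-stop ORFs from all six reading frames (hypothetical helper)."""
    seq = seq.upper()
    orfs = []
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            protein = translate(strand[frame:])
            # Drop the last segment: it is not terminated by a stop codon.
            for segment in protein.split("*")[:-1]:
                start = segment.find("M")
                if start != -1 and len(segment) - start >= min_len:
                    orfs.append(segment[start:])
    return orfs


if __name__ == "__main__":
    # Toy transcript, not a real C. spurius sequence.
    print(six_frame_orfs("CCATGAAATGCTGCTAAGG"))
```

In practice a minimum ORF length (the `min_len` argument here) would be set to discard spurious short translations before any downstream classification.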
The A-superfamily conotoxins have been reported to comprise several Cys frameworks (I, II, IV, VI/VII, XIV, and XXII) and to affect at least one of these three targets: nicotinic acetylcholine receptor (nAChR) subtypes, the GABAB receptor, and the α1-adrenoceptor [3–5]. We identified two conopeptides (Sr1.A01 and Sr1.A02) with cysteine framework I (CC-C-C). The Sr1.A02 conopeptide is similar to the α-conotoxins SrIA and SrIB previously reported by López-Vera et al. [18]. The only difference in Sr1.A02 is the Ile residue in the third position of the signal peptide; this variant does not change the mature toxins (Figure 1B).
The O-superfamily conotoxins are composed of four Cys frameworks (XII, XV, VI, and VII) and classified as δ, µO, ω, κ, and γ families [3,4]. One O2-superfamily conopeptide (Sr15.O01) was identified with cysteine framework XV. The mature protein contains eight Cys residues; however, their arrangement differs from that of other O2-superfamily members from other Conus species, such as the Cerm_305 precursor from Conus ermineus [25] and the Lt15a precursor from C. litteratus [26], both of which have eight Cys residues in the mature toxin but at different positions (Figure 1C).
Con-ikot-ikot toxin (CII) was identified for the first time in Conus striatus, one of the most common species of piscivorous cone snails, and acts on AMPA receptors, inhibiting channel desensitization [27]. In C. spurius, we identified two Con-ikot-ikot precursors (Sr21.CII01 and Sr22.CII02). The Sr21.CII01 conopeptide precursor contains a mature toxin with 77 amino acid residues (aa) and 10 Cys residues. Alignment showed 50% identity with the cysteine frameworks of the G005_VD precursor from Conus geographus [28] and the ARCII16 precursor from Conus arenatus [29] (Figure 2A).
The conopeptide precursor Sr22.CII02 yields a mature toxin with 90 aa residues and eight Cys residues. The Blast search showed that precursor Sr22.CII02 shares 52% similarity with the Con-ikot-ikot precursors from two sister species, AMZ8.1II from Conus andremenezi and PS8.1 from Conus praecellens, which have 10-Cys frameworks [30]. The alignment of these three sequences shows that they share the MTMDMKMTFS sequence in the signal peptide; however, the precursor from C. spurius lacks a propeptide region (Figure 2B).
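The cysteine patterns used throughout this section (for example, CC-C-C for framework I) can be derived mechanically from a mature peptide sequence. The following Python sketch shows one way to do so; the function name and the toy peptide are hypothetical and do not correspond to any sequence reported here.

```python
import re


def cysteine_pattern(mature):
    """Collapse a mature peptide into its Cys pattern, e.g. 'CC-C-C'.

    Residues before the first Cys and after the last Cys are ignored, and
    every run of non-Cys residues between cysteines becomes a single dash.
    """
    mature = mature.upper()
    first, last = mature.find("C"), mature.rfind("C")
    if first == -1:
        return ""  # no cysteines at all
    core = mature[first:last + 1]
    return re.sub(r"[^C]+", "-", core)


if __name__ == "__main__":
    toy_peptide = "GCCSDPRCAWRC"  # invented sequence for illustration only
    print(cysteine_pattern(toy_peptide), toy_peptide.count("C"))
```

Mapping a pattern to a framework number would still require a lookup against the framework definitions curated in a resource such as ConoServer; that table is deliberately not reproduced here.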

Discussion
The ConoSorter algorithm has been used to identify conotoxin precursors from RNA-Seq analyses of the transcriptomes of several Conus species, after assembly by Trinity and other algorithms. For example, in C. marmoreus [31], 158 novel conopeptide precursors were identified, and 106 of these were validated by protein mass spectrometry and classified among 13 novel gene superfamilies. In an analysis of three venom transcriptome libraries of C. litteratus, 128 new putative conopeptides were identified and classified into 22 superfamilies [6].
In this work with C. spurius, we used the ConoSorter software and subsequently the same characterization pipeline as Prashanth and Lewis [23], which yielded 4310 conopeptide precursors. After these sequences were annotated with the Blast2GO software, only 55 putative conopeptides were assigned to a gene superfamily using the ConoPrec tool of the ConoServer website. We also found three peptide sequences (one neuropeptide FF receptor 2-like and two beta-defensin 50 sequences) that did not meet the characteristics of conotoxins, and 22 amino acid sequences that could not be assigned to a known superfamily and may therefore belong to novel superfamilies not yet reported. The number of identified putative conopeptides is low relative to other studies in which two or more transcriptomes were compared [6,32], probably because only one cDNA library of the venom duct was analyzed here. Generally, high-throughput sequencing analyses of Conus transcriptomes have reported only the peptide sequences that were assembled and subsequently classified with ConoSorter. In our work, we used Sanger sequencing for in vitro experimental validation of seven cDNAs to confirm their presence [33]. Complete cDNA sequencing eliminates assembly errors and thus leads to a reliable classification of the conopeptide precursors. The results reported here therefore allowed us to identify conotoxin sequences that have not previously been reported.
In this first approach using transcriptome analysis to explore the toxin diversity in C. spurius, we focused on describing the conopeptide precursors that were verified by Sanger sequencing. The remaining 70 conopeptide sequences are, for now, only hypothetical conotoxin precursors.
Two M-superfamily precursors (Sr3.M01 and Sr3.M02) have a mature toxin with the same Cys pattern as that of the Mi3-IP02 conopeptide precursor from C. miles. The mature toxin belongs to the MMSKL clade [24]. Toxins in this clade are found in Conus species that hunt fish, molluscs, and polychaetes, which have retained the common conotoxins of their ancestral Conus species [24]. Two A-superfamily precursors (Sr1.A01 and Sr1.A02) found in C. spurius correspond to the α-conotoxin group (α4/7) [9], conotoxins that preferentially target nAChRs, inhibit neuromuscular transmission, and cause paralysis [34]. α-Conotoxins have also been reported in other worm-hunting Conus species of the Eastern Pacific, such as Conus brunneus, Conus nux, and Conus princeps [35].
Con-ikot-ikot precursors with 10 Cys residues in the mature toxin, like the Sr21.CII01 precursor that we identified, have also been found in other Conus species, such as C. geographus [28], C. arenatus [28], and C. victoriae [37]. In our manual Blast search, Sr22.CII02 shared >54% similarity with Con-ikot-ikot precursors of C. praecellens and C. andremenezi [29], with which it shares the MTMDMKMTFS residues in the signal peptide. Sr22.CII02 is thus possibly a novel member of the Con-ikot-ikot conopeptides, which block desensitization of AMPA receptors in dendrites of the mammalian hippocampus [27] and which would then not be exclusive to fish-hunting Conus species.
Regarding conopeptides previously identified from this species at the protein and/or nucleic acid level, the results reported in this work (Table S1) confirmed the structure of peptides sr5a (Isolate Sr5.T.05) [20,38], a variant of sr7a that differs from it by one of 32 residues (Isolate Sr6.O.08) [19], α-SrIA/B (Isolate Sr1.A.02) [18], and κ-SrXIA (Isolate Sr11.I.02) [16,17,39]. However, we did not identify any of the conorfamides CNF-Sr1 [40], CNF-Sr2 [41], or CNF-Sr3 [21,22]. A tentative explanation is that these peptides were purified from specimens collected off the coast of the State of Yucatan, whereas the transcriptome was determined for individuals captured off the coast of the State of Veracruz, and intraspecies variation in the expression of conotoxins is well known [42].

Biological Material
Five specimens were collected off the coast at the port of Veracruz, in the Gulf of Mexico, in December 2015. The venom duct was excised from each living snail, immediately added to DNA/RNA Shield™ (Zymo Research, Tustin, CA, USA), incubated overnight at 6 °C, and then stored at −70 °C.

RNA Extraction and Library Preparation and Sequencing
RNA was isolated from a pool of venom ducts from five individuals of C. spurius, using Trizol Reagent according to the manufacturer's protocol (Invitrogen, Carlsbad, CA, USA). The RNA was treated with Turbo DNA-free (Ambion, Austin, TX, USA). RNA integrity number (RIN) values, measured using the Agilent 2100 Bioanalyzer system (Agilent Technologies, Santa Clara, CA, USA) with the RNA 6000 Nano chip, were >7.0. RNA samples were processed using the manufacturer's protocol for the NEBNext Ultra RNA Library Prep Kit for Illumina (New England Biolabs, Ipswich, MA, USA) with the NEBNext Poly(A) mRNA Magnetic Isolation Module and the NEBNext Multiplex Oligos for Illumina. Briefly, 10 µL of library (4 nM) was mixed with 10 µL of 0.1 N NaOH for 5 min, and the library was then diluted to 20 pM in HT1 buffer. Sequencing was performed on an Illumina NextSeq500 in a 150-cycle High Throughput run (2 × 75 cycles).

De Novo Transcriptome Sequencing and Putative Conopeptide Precursors Predicted by ConoSorter
The raw data obtained from the RNA-Seq were first filtered to remove adapters and low-quality reads using the NGS QC Toolkit v2.3.3 software [43] and its program IlluQC.pl for Illumina data, with default parameters. Subsequently, the filtered reads were assembled with the de novo assembly package Trinity v2.12.41 [44]. To classify the conopeptide superfamilies, query data were sorted initially using ConoSorter [31], which translates raw cDNA sequences in six reading frames and extracts sequences from the first start codon in each read to the first subsequent stop codon. This step generated two files: the Regex.tab file, containing 52,457 unambiguously identified amino acid sequences, and the pHMM.tab file, containing 3642 unclassified amino acid sequences considered to be novel peptides. The total of 56,099 amino acid sequences was filtered using the workflow of Prashanth and Lewis [23], with the parameters set to number of reads (n ≥ 1), sequence length (50 to 300 amino acids), number of Cys residues (>4), hydrophobicity of the signal region (>50), class score (≥2), and superfamily score (≥1). To eliminate false amino acid sequences in the pHMM.tab file, we applied an e-value cut-off (superfamily e-value < 0.001). Sequences that had no assignment to a superfamily were discarded. Only 4310 amino acid sequences met these criteria.
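The filtering thresholds listed above can be written as a single predicate. The Python sketch below is only an illustration of those thresholds: the `Candidate` record and its field names are hypothetical and do not reflect the actual column layout of the ConoSorter output files, and the signal-region hydrophobicity is assumed to be a precomputed percentage.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """One translated sequence with its scores (hypothetical fields)."""
    sequence: str
    reads: int
    signal_hydrophobicity: float       # assumed to be a percentage
    class_score: int
    superfamily_score: int
    superfamily: Optional[str]          # None if no superfamily was assigned
    superfamily_evalue: Optional[float] = None  # only set for pHMM hits


def passes_filters(c):
    """Apply the thresholds stated in the text to one candidate."""
    if c.superfamily is None:
        return False  # unassigned sequences are discarded
    if c.superfamily_evalue is not None and c.superfamily_evalue >= 1e-3:
        return False  # pHMM hits need a superfamily e-value < 0.001
    return (
        c.reads >= 1
        and 50 <= len(c.sequence) <= 300
        and c.sequence.count("C") > 4
        and c.signal_hydrophobicity > 50
        and c.class_score >= 2
        and c.superfamily_score >= 1
    )


if __name__ == "__main__":
    toy = Candidate("M" + "L" * 20 + "C" * 6 + "A" * 33, 3, 62.0, 2, 1, "A")
    print(passes_filters(toy))
```

Keeping all thresholds in one function makes it straightforward to report how many sequences each criterion removes, which helps when comparing pipelines across studies.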

Annotation of Conotoxins
The 4310 amino acid sequences classified into various superfamilies were used as queries to align against sequences in the NCBI non-redundant protein database (Nr), with an Expect (E) value ≤0.001 and a 20-hit maximum, using the Blast algorithm with Blast2GO in the package OmicsBox ver 1.1.164 (BioBam®, Valencia, Spain) [45]. The putative conopeptide sequences were predicted using a local reference database of known conopeptides from the ConoServer databases and then examined manually using the ConoPrec tool [46].
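As a rough illustration of the hit selection described above, the following Python sketch keeps, for each query, at most 20 hits that satisfy an E-value cut-off. It assumes the cut-off is an upper bound of 0.001 and that hits are available as simple (query, subject, percent identity, E-value) tuples; this layout and the function name are hypothetical and are not the Blast2GO interface.

```python
from collections import defaultdict


def select_hits(hits, max_evalue=1e-3, max_hits=20):
    """Keep up to `max_hits` best hits per query with E-value <= `max_evalue`.

    `hits` is an iterable of (query, subject, identity, evalue) tuples.
    Returns {query: [(evalue, subject, identity), ...]} sorted by E-value.
    """
    per_query = defaultdict(list)
    for query, subject, identity, evalue in hits:
        if evalue <= max_evalue:
            per_query[query].append((evalue, subject, identity))
    return {q: sorted(h)[:max_hits] for q, h in per_query.items()}


if __name__ == "__main__":
    # Invented hits for illustration only.
    toy_hits = [
        ("q1", "s1", 80.0, 1e-10),
        ("q1", "s2", 40.0, 0.5),
        ("q2", "s3", 55.0, 1e-4),
    ]
    print(select_hits(toy_hits))
```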

Confirmation by RT-PCR
To validate the integrity of the sequences assembled by Trinity v2.12.41, the nucleotide sequences of seven of these putative conotoxins were selected, and primers were designed for the regions flanking the ORFs (Table 2). Polymerase chain reactions (PCRs) were carried out in 50-µL reaction volumes using standard PCR reagents in a mixture containing 20 ng cDNA (remainder of the library), 1× Reaction Buffer, 2 mM MgCl2, 0.3 of each dNTP, 3 µM of each primer, and 1 U of Taq DNA polymerase (Invitrogen™, Carlsbad, CA, USA). The thermocycling conditions in the C1000 Touch™ thermocycler (Bio-Rad, Hercules, CA, USA) were 5 min at 95 °C for initial denaturation; 35 cycles of 94 °C for 40 s, 60 °C for 40 s, and 72 °C for 45 s; and a final extension at 72 °C for 5 min.
Table 2. Primer sequences used to amplify putative conopeptide genes by RT-PCR (columns: ID-Trinity, Primer Sequence).
The PCR products were cloned into pCR®-TOPO® vectors (Invitrogen™, Carlsbad, CA, USA) via TA cloning and transformed into electrocompetent E. coli DH5α cells (Invitrogen™, Carlsbad, CA, USA). Clones were randomly selected for cDNA purification and sequencing; plasmids were purified using the ZR Plasmid Miniprep kit (Zymo Research, Irvine, CA, USA) and sequenced in both directions using the dideoxy chain termination method on a 3730xl DNA Analyzer (Applied Biosystems, Foster City, CA, USA) at the Laboratorio Nacional de Genómica para la Biodiversidad, CINVESTAV-Irapuato (Irapuato, Gto, Mexico).

Conclusions
Using high-throughput sequencing analysis and subsequent in vitro validation, we identified 55 new conopeptides from C. spurius, distributed in 11 superfamilies (A, I, L, M, O, P, Q, S, T, W, and Z) and four groups (con-ikot-ikot, conoinsulin, conophysin-conopressin, and conotoxin-specific protein disulfide isomerase). We also reported the presence of other peptides, such as beta-defensin 50, which shares 100% similarity with the sequence reported from rat (Rattus norvegicus), and the neuropeptide FF receptor 2-like peptide reported from the snail Biomphalaria glabrata. Twenty-two of the new conotoxins have not been assigned to a particular superfamily, either because information on their corresponding signal peptide sequences is lacking or because they belong to new superfamilies not yet reported in the databases. This study demonstrated the usefulness of applying a transcriptomic approach and molecular assays to discover novel conopeptides in a poorly studied species. This is the first time that these conopeptide sequences have been reported, which expands the knowledge of C. spurius conotoxins.