An Integrated Proteomic and Transcriptomic Analysis Reveals the Venom Complexity of the Bullet Ant Paraponera clavata

A critical hurdle in ant venom proteomic investigations is the lack of databases to comprehensively and specifically identify the sequence and function of venom proteins and peptides. To resolve this, we used venom gland transcriptomics to generate a sequence database that was used to assign the tandem mass spectrometry (MS) fragmentation spectra of venom peptides and proteins to specific transcripts. This was performed alongside a shotgun liquid chromatography–mass spectrometry (LC-MS/MS) analysis of the venom to confirm that these assigned transcripts were expressed as proteins. Through the combined transcriptomic and proteomic investigation of Paraponera clavata venom, we identified four times the number of proteins previously identified using 2D-PAGE alone. In addition to this, by mining the transcriptomic data, we identified several novel peptide sequences for future pharmacological investigations, some of which conform with inhibitor cysteine knot motifs. These types of peptides have the potential to be developed into pharmaceutical or bioinsecticide peptides.


Introduction
Animal venoms are an increasingly popular source of drug leads, as they are rich in bioactive peptides. These peptides have evolved over millions of years, and they target a wide variety of receptors and other biological processes [1,2]. Currently there are six venom-derived peptides on the therapeutic drug market and one insecticide [2]. However, there is potential for many more, as only a small fraction of venom-derived peptides has been investigated. New developments in sensitivity and accuracy in mass spectrometry and transcriptomic sequencing are likely to accelerate this process. One of the hurdles of venom proteomics investigations has been the lack of appropriate databases to assign peptide sequence information using bottom-up mass spectrometry. The use of venom gland mRNA to deduce protein sequence information is becoming increasingly common, particularly in conjunction with venom proteomics analysis, as this overcomes some of the problems of predicting open reading frames (ORFs) or de novo assembly.
Transcriptomics is a useful tool for de novo generation of comprehensive venom gland mRNA profiles, and it requires only a small amount of tissue. Transcriptomic approaches are relatively unbiased as they capture almost all the diversity present in the venom gland at the time of tissue harvesting, as opposed to proteomics where ion suppression can significantly confound analyte ionisation and detection for certain types of instrumentation. However, using transcriptomics alone is insufficient. For example, transcriptomics involves prediction of the peptide/protein sequence from the thousands of transcripts in the transcriptome, the majority of which do not encode venom peptides/proteins but are instead part of the cellular machinery [3]. Furthermore, post-translational modifications (PTMs), including proteolytic cleavage, are not detected by transcriptomics, unlike proteomic approaches. Combining proteomics and transcriptomics allows both synthesised proteins and PTMs to be identified and enables confirmation of the mature peptide/protein sequence. In addition, the transcriptome permits expression estimation for genes of interest.
In this study, we investigated the venom profile of P. clavata using an integrated proteomic/transcriptomic approach. Paraponera clavata is a large ant species (>2 cm in length) found in the New World tropics and renowned for its potent and painful sting [4,5]. The venom of P. clavata has received significant attention at a proteomic level, with seven publications to date [4][5][6][7][8][9][10][11]. All of these previously published studies focused on poneratoxin, a 25-residue peptide, and the most abundant component of the venom. Poneratoxin elicits neurotoxic activity in cockroaches by prolonging action potentials and generating slow repetitive activity by inducing a voltage-activated sodium channel (NaV1.7) current with slowed inactivation at negative potentials [6,10]. Nevertheless, there are several other components in this venom that remain uncharacterised [12].
In this investigation, we identified novel components of P. clavata venom and highlight the advantages of using combined proteomics and transcriptomics approaches for a holistic overview of the complexity of ant venom arsenals.

P. clavata Venom Gland Transcriptome Profile
In order to investigate the diversity and expression of venom toxins and toxin-like sequences in P. clavata, transcriptome sequencing was performed on venom glands and sacs. The transcriptome assembly was used as a database for exploring the P. clavata venom proteome.

Illumina Sequencing
Illumina HiSeq 2500 sequencing of the cDNA library from P. clavata venom glands yielded over 17 million paired-end reads after quality control (QC). Briefly, trimmed reads were de novo assembled using Trinity [13], producing a total of 54,242 contigs. Reads were mapped to the contigs using Bowtie, and RSEM was used to estimate contig expression levels as transcripts per million (TPM). Assembly statistics are provided in Table 1.

Table 1. Assembly statistics and downstream metrics from the P. clavata venom gland transcriptome analysis.

After QC Counts
Total number of reads (paired end) 17
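
The TPM normalisation reported by RSEM can be sketched as follows. This is a minimal illustration of the final normalisation step only, not RSEM's implementation: RSEM estimates expected counts and effective lengths with a statistical model that accounts for multi-mapping reads. The function name `tpm` and the toy numbers are assumptions made here for illustration.

```python
def tpm(counts, effective_lengths):
    """Convert per-contig read counts to transcripts per million (TPM).

    counts and effective_lengths are parallel sequences, one entry per contig.
    """
    if len(counts) != len(effective_lengths):
        raise ValueError("counts and effective_lengths must have the same length")
    # Reads per base for each contig removes the bias towards long contigs.
    rates = [c / length for c, length in zip(counts, effective_lengths)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    # Scale so that the values of all contigs sum to one million.
    return [rate / total * 1e6 for rate in rates]


# Toy example (hypothetical numbers): three contigs.
example = tpm([100, 200, 300], [1000, 1000, 3000])
```

Because TPM values always sum to one million within a sample, a contig at 39,657 TPM accounts for roughly 4% of the estimated transcript population.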

P. clavata Transcriptome Annotation
To identify the types of proteins and toxins expressed in the venom gland of P. clavata, the contig library was used to interrogate the NCBInr database using the BLASTx algorithm [14] with an e-value cut-off of e−4. The search generated 37,140 hits matching proteins in the database, which represent 68% of the transcriptome. Twenty of the top 35 hits were to other ant species and the remainder were to other hymenopteran species. The highest number of hits recorded were to the ant Harpegnathos saltator (Figure S1), reflecting the close phylogenetic relationship between these species.
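
The e-value filter and the annotated fraction quoted above can be illustrated with a short sketch. It assumes that the BLASTx results were exported in BLAST's 12-column tabular format (query ID in column 1, e-value in column 11) and that the cut-off "e−4" means 1e-4; neither detail is stated in the text, and the function names are hypothetical.

```python
def annotated_queries(blast_lines, evalue_cutoff=1e-4):
    """Return the set of query contigs with at least one hit at or below the cut-off.

    Assumes tab-separated BLAST tabular output with the e-value in column 11.
    """
    hits = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query_id, evalue = fields[0], float(fields[10])
        if evalue <= evalue_cutoff:
            hits.add(query_id)
    return hits


def percent_annotated(n_hits, n_contigs):
    """Percentage of contigs that returned a significant hit."""
    return 100.0 * n_hits / n_contigs


# Figures taken from the text: 37,140 contigs with hits out of 54,242 contigs.
print(round(percent_annotated(37140, 54242)))  # 68
```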

Identification and Classification of P. clavata Toxins
We identified 354 toxin-like sequences that were classified into 17 different families (Figure 1). These were identified through a manual search of the BLASTx output using over 50 different keywords such as "phospholipase" and "toxin" (Table S1). A total of 17,102 contigs (32%) had no significant BLASTx hit. Figure 1 also includes hits from the output of the Tox|Blast module [15], which recovered other toxin-like transcripts not observed in the BLASTx output.
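
A keyword search of this kind can be sketched as a case-insensitive substring match over the hit descriptions. Only the keywords "phospholipase" and "toxin" come from the text; the contig names, descriptions and function name below are hypothetical, and the actual search was performed manually against the full list in Table S1.

```python
def classify_hits(descriptions, keywords):
    """Map each contig to the keywords found in its best-hit description.

    descriptions: dict of contig name -> hit description.
    keywords: iterable of search terms, matched case-insensitively.
    """
    matches = {}
    for contig, description in descriptions.items():
        lowered = description.lower()
        found = [kw for kw in keywords if kw.lower() in lowered]
        if found:
            matches[contig] = found
    return matches


# Hypothetical hit descriptions for three contigs.
hits = {
    "contig_1": "Phospholipase A2",
    "contig_2": "actin",
    "contig_3": "omega-conotoxin-like protein",
}
print(classify_hits(hits, ["phospholipase", "toxin"]))
```

Substring matching also flags entries such as "omega-conotoxin-like protein", which is one reason keyword hits of this type still need manual review.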
The venom gland transcriptome revealed that the majority of transcripts were proteases, e.g., serine proteases and metalloproteinases (Figure 1). However, neurotoxins were the most highly expressed contigs, based on TPM values. These were subsequently identified, through proteomic analyses, as poneratoxin isoforms (Section 2.3.1). Contigs in the "toxin-like" category included peptides such as ant "ω-conotoxin-like" proteins which, despite having an assigned mode of action to inhibit voltage-activated calcium channels, have not been proven to have any toxin activity. The full list of toxin and toxin-like sequences identified through BLASTx and Tox|Note can be found in Supplementary File S1. Among the transcripts, a poneratoxin-encoding contig was the second most abundantly expressed at 39,657 TPM, with the structural protein actin being the most highly expressed protein (55,551 TPM). Other highly expressed toxin-like transcripts included phospholipases, serine proteases, arginine kinases and metalloproteinases. The remainder of the toxin-like transcripts had relatively lower TPM values such as defensin-2 (89 TPM) and cathepsin (77 TPM). The most highly expressed toxin and toxin-like transcripts, identified through BLASTx, are summarized in Table 2. All P. clavata venom gland contigs shown in Table 2 were subsequently named using the nomenclature proposed in the Methods section.
Although the majority of hits from Tox|Note were to toxins with proven toxicity, several unusual hits were observed e.g., the 130 kDa toxin α-latrotoxin. α-Latrotoxin is composed of several ankyrin repeats and only the full-length protein, presently only detected in the venom of theridiid spiders, is neurotoxic [1,16]. Transcripts from P. clavata only matched to the ankyrin domains of the 130 kDa toxin, but not other regions important for activity. Ankyrin domains are also present in other non-toxic proteins [1,16]. Calglandulin was also identified by Tox|Note as one of the highly expressed toxin-like transcripts in P. clavata (196 TPM), but it is involved in processing and releasing venom proteins, rather than being a toxin per se [17]. Neither of these hits were included in the toxin-like transcript list generated by Tox|Note.

P. clavata Venom Proteome
The assembled contigs were translated in six frames using TransDecoder [13], generating 34,586 ORFs. This was then uploaded into PEAKS Studio v8.5 as a database to search the shotgun proteomic LC-MS/MS data. The mass spectra search against the transcriptome assembly was performed using PEAKS Studio v8.5 and generated peptide matches to 438 contigs with a minimum −10lgP score of 15. The hits were manually classified into 11 functional categories based on their BLASTx ID using an e-value cut-off of e−4 (Figure 2). If a contig had no BLASTx hit, the matched peptides were searched on BLASTp to confirm that the protein was indeed uncharacterised. This approach ensured contigs such as poneratoxin (9279 TPM) were identified, as the BLASTx search identified it as an uncharacterised protein whereas BLASTp confirmed it as poneratoxin. Nevertheless, the majority of proteins detected in the venom were uncharacterised (Figure 2), followed by proteins with structural/motor or cellular function, which is comparable to the GO results from the transcriptome. Among other protein categories were toxin-like proteins, and proteins involved in metabolic processes and regulation, as well as chaperones, oxidoreductases, kinases and miscellaneous proteins. Positive matches between proteins identified by proteomics and transcriptomics allowed for the confirmation of proteomic data as well as complete predicted full-length protein sequences. For a full list of the proteins present in Figure 2, see Supplementary File S2. The toxin-like category contained a total of 44 contigs, shown in Table 3. The best match, based on a PEAKS score of −10lgP, was to hyaluronidase followed by dipeptidyl peptidase, poneratoxin and phospholipase A2. All contigs in Table 3 were also re-named based on BLASTx or BLASTp hits according to the proposed nomenclature systems and queried for signal peptides using SignalP v4.1 (DTU Bioinformatics, Lyngby, Denmark; http://www.cbs.dtu.dk/services/SignalP-4.0/) [18].
Many contigs had no predicted signal peptide despite the fact that they were detected in the venom proteome indicating the presence of non-secreted proteins in our venom sample. These were excluded from our list of toxins.
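
The six-frame translation step mentioned at the start of this section can be sketched as below. This is an illustration of the general idea only, assuming the standard genetic code; TransDecoder additionally selects and scores candidate ORFs, which is not reproduced here, and all function names are hypothetical.

```python
# Standard genetic code, with codons ordered by the bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Translate a contig in three forward and three reverse frames."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


print(six_frame_translation("ATGGCC"))
```

Searching spectra against all six frames avoids committing to a single predicted reading frame before the proteomic evidence is considered.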

P. clavata Toxins
Many of the toxin sequences identified in this study have not been previously reported in P. clavata. Therefore, we undertook a comparative study with homologous sequences to determine potential pharmacological targets. This was achieved through comparison of critical residues for activity between sets of homologous sequences. Proteins chosen for alignment are detailed below.
The mature toxin present in this transcriptome also has an alanine to valine substitution at position 61 (A23V in the mature toxin), and additional GK residues at the C-terminus, compared to the original Brazilian δ-paraponeritoxin-Pc1a [4]. It seems to be most similar to the recently published "glycl-PoTx" [10], re-named here as δ-paraponeritoxin-Pc1b, as it also contains the additional C-terminal glycine but not the lysine residues. Nevertheless, the shotgun proteomics data revealed a mature δ-paraponeritoxin-Pc1e mass of 2783 Da, consistent with the terminal GK residues undergoing cleavage and a loss of 185.1 Da (Figure S2). If these two C-terminal residues are removed from the alignment, δ-paraponeritoxin-Pc1e has 96% identity and 100% similarity to the Peruvian isoform Pc1d (Figure 3). RP-HPLC of P. clavata venom indicated a relatively simple venom composition (Figure 4A), consistent with previous publications [4,5,11]. The RP-HPLC profile is characterised by the majority of peaks eluting at high acetonitrile concentrations, dominated by a large late-eluting peak. Purification of this component (Figure 4B) and sequencing by MS/MS (Figure S3) identified this as poneratoxin (δ-paraponeritoxin-Pc1e).

Figure 4. (B) δ-paraponeritoxin-Pc1e was purified from the venom (Figure S3 illustrates the match to the δ-paraponeritoxin-Pc1e contig). (C) Addition of whole P. clavata venom (100 µg/mL) to dissociated mouse DRG cells produced an increase in [Ca2+]i in all DRG cells, which was particularly pronounced in neuronal cells, and was followed by some dye leakage from a small proportion of cells (see the right hand panels in C representing a snapshot at 180 s). An equivalent amount of isolated δ-paraponeritoxin-Pc1e produced an increase in [Ca2+]i specifically in excitable cells. Each trace represents a single DRG cell in the field of view. Snapshots shown of the recording are at 0 s (baseline) and 33 s (3 s after addition of venom or δ-paraponeritoxin-Pc1e). Scale bar: 100 µm.
We investigated the effects of both the whole venom of P. clavata and isolated δ-paraponeritoxin-Pc1e on mammalian sensory neurons using a high-content calcium imaging assay. Application of whole P. clavata venom (100 µg/mL) to dissociated mouse DRG cells caused an immediate sharp and sustained (over the 3 min course of the experiment) increase in intracellular calcium concentration ([Ca2+]i) in all cells (Figure 4C). The largest increases were in neuronal cells, while in a small proportion of cells some dye leakage into the extracellular medium was observed (indicative of mild cytolytic activity). The large increase in [Ca2+]i observed in DRG cells is consistent with cellular depolarisation and ultimately pain. Isolated δ-paraponeritoxin-Pc1e caused a similar immediate and sustained increase in [Ca2+]i, but was selective for neuronal cells and was not accompanied by any dye leakage (Figure 4C). These observations show that δ-paraponeritoxin-Pc1e is capable of causing a rapid depolarisation of sensory neurons, which is consistent with δ-paraponeritoxin-Pc1e being a major, if not the major, algogenic component in the venom of P. clavata.

Inhibitor Cysteine Knot Peptide
A transcript encoding a peptide highly similar to ant "ω-conotoxin-like" peptide was found in the P. clavata venom gland transcriptome (Figure S4). The sequence of the P. clavata "ω-conotoxin-like" peptide was 90% identical to that from Camponotus floridanus, and 90% homologous to the ω-conotoxin-like protein from the ant Trachymyrmex cornetzi. Interestingly, identity notably decreased when compared to the same peptide from the bee Apis florea (63%). Manual BLASTx results did not produce significant hits to any marine cone snail peptides. We therefore re-searched with the organism filter restricted to "Conus" in order to confirm the similarity of these ant "conotoxin-like-proteins" to that of cone snails. As shown in Figure S4, the highest match was to a peptide from Conus lividus, and had a very different sequence to all the ant sequences, with an identity of only 15% and similarity of 31% to that of P. clavata. The alignment also highlighted that the main conserved amino acids between the sequences were the cysteine residues, which are normally conserved due to their role in the structural framework. Therefore, we did not name the P. clavata contig using the same nomenclature as the ant ω-conotoxin-like proteins. Instead, we named it U1-paraponeritoxin-Pc1a because the main activity associated with ω-conotoxins, an action to inhibit voltage-gated calcium channels [21], is very unlikely given the low sequence similarity.

Phospholipase A2 (PLA2)
The second most highly expressed toxin transcript in the P. clavata venom gland transcriptome, after poneratoxin, was a 202-residue phospholipase A2 (PLA2) (Table 2) (6328 TPM). These proteins have previously been shown to be abundant in P. clavata venom [11]. PLA2 proteins are not unique to ant venoms, but are found in numerous animal venoms [22][23][24][25]. The size of the P. clavata PLA2 and the presence of a His48/Asp49 pair suggest that this is a group III PLA2, which is also present in other hymenopteran venoms including bees and ants [26,27]. The presence of PLA2 in the venom proteome (Table 3) also confirms its secretion in high enough abundance for detection by mass spectrometry.
BLASTx analysis identified a number of putatively homologous insect PLA2 sequences (Figure 5). The PLA2 from C. floridanus shared 68% identity while other ant species shared 60% identity. Interestingly, identity was not very high to any of the ant species, with the highest being 52% and the next highest identity to the beetle Tribolium castaneum with an identity of 36%. Although the identities were not high, it should be noted that all critical residues were found in the P. clavata contig. For example, almost all cysteine residues were present and the previously described calcium binding loop and active site were found across the sequences, particularly the critical His48, Asp49 and Gly32 residues.

Figure 5. The predicted signal peptide (green tube) is indicated above the sequences. Residues identical to the P. clavata peptide sequences are boxed in yellow, while conservative substitutions are shown in red italic text. Cysteines are boxed in black. Gaps were introduced to optimize the alignments. The blue triangle indicates the predicted N-terminus [23]. The underlined coloured regions indicate the Ca2+-binding loop (blue), active site (orange) and conserved region (green) [23,26,28]. The red triangles indicate residues critical for calcium binding and the green triangles indicate the active site residues [23,26,29,30]. Percentage identity (%I) is relative to the first peptide of each family, while percentage similarity (%S) includes conservatively substituted residues.
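
The percentage identity (%I) and similarity (%S) values used in the alignments can be computed along the following lines. This is a minimal sketch under two assumptions that the text does not specify: percentages are taken over the residues of the reference (first) sequence, and the conservative substitution groups listed in the code are illustrative choices rather than the scheme used by the authors.

```python
# Illustrative conservative-substitution groups (an assumption, not the authors' scheme).
CONSERVATIVE_GROUPS = ["ILVM", "FWY", "KRH", "DE", "ST", "NQ", "AG"]


def identity_similarity(reference, other):
    """Return (%identity, %similarity) for two pre-aligned sequences.

    Both sequences must have equal length, with '-' marking gaps.
    Percentages are calculated over the non-gap positions of the reference.
    """
    if len(reference) != len(other):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = similar = 0
    for ref_res, other_res in zip(reference, other):
        if ref_res == "-":
            continue  # position absent from the reference
        compared += 1
        if ref_res == other_res:
            identical += 1
            similar += 1
        elif other_res != "-" and any(
            ref_res in group and other_res in group for group in CONSERVATIVE_GROUPS
        ):
            similar += 1
    if compared == 0:
        raise ValueError("reference contains no residues")
    return 100.0 * identical / compared, 100.0 * similar / compared


# Hypothetical five-residue alignment with one conservative change (D to E).
print(identity_similarity("ACDEK", "ACEEK"))  # (80.0, 100.0)
```

Under these assumptions, a single conservative substitution in a 25-residue peptide gives 24/25 = 96% identity and 100% similarity, the same figures reported for δ-paraponeritoxin-Pc1e against Pc1d.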

Hyaluronidase
A transcript composed of 357 residues with sequence similarity to hyaluronidase was identified by a BLASTx search of the NCBInr database. Its presence and expression in the venom was confirmed through the shotgun proteomics, and showed the highest −10lgP score of 333. There were a total of eight contigs (1a_1 to 1a_8) for hyaluronidase-1a-P-clavata (Figure S5). The translated protein sequence had the highest similarity to a hyaluronidase-like sequence from the ant Pogonomyrmex barbatus (85% similarity; Figure S5). All cysteines were conserved across matched proteins with the majority of the differences located near the N-terminus of the sequences. All ant hyaluronidases contain the conserved active site residues previously described in bee venom hyaluronidases (Asp111, Phe112, Glu113, and Glu247; numbered according to the bee venom hyaluronidase) [31].

Icarapin
The protein icarapin was detected in both the proteome and the transcriptome. The transcript icarapin-1-P-clavata is a 163-residue protein that had relatively low similarity to known proteins, with the highest match to icarapin from the ant Linepithema humile (58% identity; Figure S6). Interestingly, from residues 73 to 118 (numbering from icarapin-1-P-clavata) all sequences share very high similarity, which is where the consensus icarapin sequence is located [32]. However, the P. clavata transcript was not 100% identical to the published sequence [4], as there were a few substitutions (positions 87, 100, 108 and 111) and a threonine insertion at position 92. The role of icarapin in venoms is unclear.

Arginine Kinase
Of all the proteins described here, the most conserved of all matches was that of the 355-residue arginine kinase, with the lowest similarity of 97% (Figure S7). This protein was detected initially by BLASTx homology searching; it was then confirmed through proteomic analysis of the venom. Importantly, all residues critical for arginine binding, ATP binding and catalysis are found amongst all sequences [33]. Additionally, the two specificity loops implicated in the arginine kinase activity and the phosphagen kinase site were also conserved. No signal peptide was predicted by SignalP v4.1 for the P. clavata transcript or the other matched proteins. For this reason, we cannot rule out the possibility that this protein serves an endogenous function in the venom gland of P. clavata, rather than as a toxin.

Serine Proteases
Serine proteases were the most diverse of the toxin-like transcripts detected in the BLASTx search (Figure S8), although only one serine protease isoform was detected in our venom sample. The longest transcript recovered from our transcriptome translated to 256 amino acid residues and was named serine-protease-5a-P-clavata. Interestingly, we also found transcripts that appear to be truncated versions of isoform 6a, namely isoforms 5a and 8a. These isoforms also had a few non-conserved and conserved substitutions. The BLASTx match with the highest identity was to a serine proteinase stubble protein from the ant Ooceraea biroi (81% identity). However, it must be noted that the O. biroi protein has 173 amino acid residues preceding the ones shown in Figure S8, which suggests that the P. clavata proteins are all truncated versions of this full-length proteinase. An important feature of serine proteinases is the catalytic triad (His37, Asp87 and Ser190, naming according to the P. clavata serine proteinase). This was conserved across all the ant serine proteinases (Figure S8) apart from the serine-protease-8a-P-clavata, which has a glycine instead of a serine at position 190. The residues important for the specificity pocket were also present in all of the P. clavata isoforms, except serine-protease-5a-P-clavata which had a conservatively substituted alanine instead of a glycine. These proteins may serve an endogenous function e.g., in toxin maturation, in the P. clavata venom gland or as toxins themselves.

Novel Toxin-Like Proteins
In order to identify novel toxin-like peptides in the venom gland transcriptome of P. clavata, the transcriptome was analysed with Tox|Note, a pipeline developed mainly for the identification of novel spider-venom toxins [15]. This platform was chosen due to the lack of ant- or hymenopteran-venom specific platforms. The output contains the predicted toxin-like sequences and information regarding the predicted prepropeptides and mature toxin. Predicted peptide toxins were then sorted manually to identify those with a "toxin-like" cysteine framework, defined as peptides with an even number of four or more cysteines, and then double checked for novelty by BLASTp. As seen in Figure 6, there were a total of 190 peptides conforming to these conditions, with the majority (72%) having four cysteines and the highest number of cysteines being 12, which would indicate six disulfide bonds. The disulfide-bond pairs were predicted according to the conotoxin frameworks (www.conoserver.org).

Figure 6. Distribution of novel P. clavata toxin-like peptides with four or more cysteines, plotted by number of cysteines. Predicted peptide toxins were identified by Tox|Note.

Upon closer inspection of the sequences that contained six cysteines, seven peptides were found that adhered to a canonical inhibitor cysteine knot (ICK) framework similar to the conotoxin VI/VII framework as shown in Table 4. There were also other peptides conforming to the ICK framework, but these had eight and 10 cysteines. However, peptides containing eight or more cysteines that conform to the ICK framework typically have four cysteines after the "CC" doublet, where two of the cysteines turn the fourth loop into a disulfide-stabilised hairpin. P. clavata peptides with eight or more cysteines may therefore represent putatively novel cysteine-rich scaffolds.

Table 4. Predicted P. clavata peptides conforming to the ICK or conotoxin cysteine framework I. This table shows peptides identified using Tox|Note and their corresponding expression level indicated by TPM.
The majority of the peptides with cysteine patterns consistent with an ICK structural framework (containing six or more cysteines conforming to "C-C-CC-C-C") showed very low expression (Table 4). The highest had a TPM of 44, while conotoxin cysteine framework I toxins (containing four cysteines conforming to "CC-C-C") had higher transcript levels overall, with the highest TPM value of 771 for one of the contigs. Given their low abundance, or possible masking by more highly expressed peptides, we did not find any evidence for these disulfide-rich peptides in the venom proteome. This suggests that while they were annotated by Tox|Note as toxin-like, it is unclear if they represent a large proportion of the venom components in P. clavata.
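
The cysteine-framework sorting described in this section was done manually; the following sketch shows how the same rules could be expressed in code. It assumes that a dash in the patterns "C-C-CC-C-C" and "CC-C-C" stands for one or more non-cysteine residues and that a "toxin-like" framework requires an even number of at least four cysteines. The function names and example sequences are hypothetical.

```python
def cysteine_pattern(sequence):
    """Collapse a peptide to its cysteine pattern, e.g. 'C-C-CC-C-C'.

    Each run of non-cysteine residues between cysteines becomes a single '-';
    residues before the first and after the last cysteine are ignored.
    """
    pattern = []
    for residue in sequence.upper():
        if residue == "C":
            pattern.append("C")
        elif pattern and pattern[-1] != "-":
            pattern.append("-")
    return "".join(pattern).rstrip("-")


def classify_framework(sequence):
    """Assign a mature peptide to a cysteine framework category."""
    n_cys = sequence.upper().count("C")
    if n_cys < 4 or n_cys % 2 != 0:
        return "not toxin-like"
    pattern = cysteine_pattern(sequence)
    if pattern == "C-C-CC-C-C":
        return "ICK-like (framework VI/VII)"
    if pattern == "CC-C-C":
        return "framework I"
    return "other even-cysteine framework"


# Hypothetical mature peptide sequences, for illustration only.
print(classify_framework("GCAKCSSCCPGCRKCG"))  # ICK-like (framework VI/VII)
print(classify_framework("GCCSDPRCAWRC"))      # framework I
```

A pattern match of this kind only shows that the cysteine spacing is compatible with a framework; disulfide connectivity and fold would still need experimental confirmation.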

Discussion
Here, we present the transcriptome and proteome of the bullet ant, P. clavata, which is infamous for inflicting the most painful sting among the Hymenoptera [34,35]. Investigating P. clavata venom is critical for understanding the basis for its painful venom and also as a source of potential bioactive compounds for development of insecticides or therapeutics [10,36]. We reveal the identification of a novel paraponeritoxin and the first report of the signal and propeptide sequences for this family of toxins. In addition, we have identified the presence of a wide range of toxins that no doubt contribute to the overall toxicity of the whole venom and provide for novel pharmacological lead compounds.

P. clavata Transcriptome
In order to annotate the venom gland transcriptome of P. clavata, a BLASTx search of the NCBInr database was performed and generated over 37,000 significant hits (68%) which were queried for GO terms to determine the broad functions of the peptides and proteins. It was found that overall GO terms relating to molecular functions predominated, which was also seen with the T. bicarinatum venom gland transcriptome [27] and the sea nettle, Chrysaora fuscescens, transcriptome [37]. The most common predicted molecular functions of P. clavata venom gland transcripts were binding and catalytic activity and the most common biological processes were related to metabolic and cellular processes, similar to that previously reported for the ant T. bicarinatum [27]. Such proteins may be involved in the folding of venom peptides, their cleavage or secretion in the venom gland.
The majority of contigs with BLASTx hits matched transcripts of the ants H. saltator and D. quadriceps which was not surprising, given the availability of a D. quadriceps transcriptome and an H. saltator genome [38]. Both of these ants also belong to the subfamily Ponerinae, one of the phylogenetically closest subfamilies to Paraponerinae [39,40]. This is further supported by the lack of matches to T. bicarinatum, an ant belonging to the subfamily Myrmicinae, a phylogenetically distant subfamily to Paraponerinae.
The expression levels of poneratoxin and PLA2 (up to 39,657 TPM and 6328 TPM, respectively) were consistent with their detection in the venom here as well as in previous 2D-PAGE proteomic studies of P. clavata venom [11]. Proteases, which were expressed at relatively lower levels (all <834 TPM), were the most diverse group of toxin-like transcripts, in terms of contig number, with 149 different contigs. Other ant venom gland transcriptomes report allergens and phospholipases as the common proteins [27,36]. The following section summarises the most important characteristics of a selected group of five highly expressed toxin and toxin-like families.

Neurotoxins
P. clavata is one of only a few ant species to have had its venom studied extensively and from which a peptide neurotoxin has been sequenced and its pharmacology determined [4,5,10,19]. The pain inflicted by the sting of P. clavata has been assumed to derive from poneratoxin (here renamed δ-paraponeritoxin-Pc1a), first identified by Piek [4,5,19]. Surprisingly, our initial BLASTx search of contigs against the NCBInr database did not identify δ-paraponeritoxin-Pc1a as a hit. However, a BLASTp search of peptides identified by LC-MS/MS analysis identified the toxin and matched this sequence to the contigs δ-paraponeritoxin-Pc1e in the translated database generated in this work. One of these contigs (δ-paraponeritoxin-Pc1e_1) displayed very high expression levels (39,657 TPM) and revealed that poneratoxin was indeed the most abundant toxin transcript expressed in the venom gland of P. clavata. Importantly, we report, for the first time, a signal peptide and propeptide for a δ-paraponeritoxin. The propeptide comprises the 14-residue segment (EAVAKPSAEAVSEA) following the 24-residue signal peptide (MRIGKLILISVAIIAIMISDPVKS), and the present proteomic experiments as well as previous studies indicate that Phe38 is the N-terminus of the mature toxin [4][5][6]10,19]. This signal peptide seems to be unique to ant venoms, with no sequence similarity reported to date. There appears to be some similarity to the precursor sequence (signal and propeptide) for the aculeatoxins, however this is insufficient to suggest that these sequences arise from an aculeatoxin gene superfamily [41]. Unexpectedly, the Tox|Note BLASTx search did not identify a hit to δ-paraponeritoxin-Pc1a either, which might be due to the algorithm's generation of incomplete alignments in some cases resulting in an incorrect annotation [15]. 
Interestingly, Tox|Note then classified the majority of the δ-paraponeritoxin-Pc1e contig as a propeptide, perhaps because the Spider|ProHMM predictive tool used by Tox|Note to predict propeptide cleaving sites was designed to detect spider venom toxins. Most notably, the propeptide in δ-paraponeritoxin-Pc1e seems to be unusual and completely different to the propeptides reported in spiders and cone snails, where a conserved arginine is usually located at position −1 from the N-terminus of the mature peptide [42,43]. Only once there is sufficient transcriptomic and proteomic data for ant venom peptides would it be possible to develop a similar script, capable of predicting propeptides in ants, and perhaps other hymenopterans.
As seen in Figure 3, the alignment of P. clavata transcripts with other homologous δ-paraponeritoxins shows that the isoforms detected as part of this study were similar, but not identical, to any of the previously described sequences, differing by only one or two amino acids. This subtle difference between isoforms may cause changes in the activity and specificity of venom peptides. For example, natural mutations in specific regions of δ-paraponeritoxins showed that a decrease in net charge and hydrophobicity at the C-terminus decreased the activity of the toxin while an increase in net charge and hydrophobicity increased the activity [10]. These results highlight the fact that even a subtle change in the primary sequence can result in altered activity, and therefore correct assignment of function can only be determined by activity testing. The additional Gly and Lys residues at the end of the δ-paraponeritoxin-Pc1e transcript form an amidation signal that is cleaved during the maturation of the peptide [44]. However, the mass measured by MALDI-TOF MS (2783.4 Da), from venom also collected in French Guiana, was consistent with the calculated mass of the P. clavata δ-paraponeritoxin-Pc1e transcripts without the Gly and Lys residues at the C-terminus (2783.6 Da), indicating C-terminal cleavage of these residues and no amidation, or perhaps that these residues are absent in the precursor.
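
The mass arithmetic behind this argument can be checked with a short calculation. The sketch below uses standard monoisotopic residue masses for Gly and Lys and the standard mass shift for C-terminal amidation; it assumes that the measured and calculated masses quoted in the text are on the same basis, which the text does not state, and the variable and function names are hypothetical.

```python
# Monoisotopic residue masses in Da (amino acid minus one water).
RESIDUE_MASS = {"G": 57.02146, "K": 128.09496}

# Mass change on C-terminal amidation (-OH replaced by -NH2), in Da.
AMIDATION_SHIFT = -0.98402


def residues_mass(residues):
    """Sum of residue masses for a string of one-letter amino acid codes."""
    return sum(RESIDUE_MASS[aa] for aa in residues)


# Loss expected when the C-terminal Gly-Lys pair is removed.
gk_loss = residues_mass("GK")

# Values quoted in the text.
observed_mass = 2783.4             # MALDI-TOF MS
calculated_free_acid = 2783.6      # transcript sequence without C-terminal GK
calculated_amidated = calculated_free_acid + AMIDATION_SHIFT

print(f"GK loss: {gk_loss:.1f} Da")
print(f"observed - free acid: {observed_mass - calculated_free_acid:+.1f} Da")
print(f"observed - amidated:  {observed_mass - calculated_amidated:+.1f} Da")
```

Under these assumptions the observed mass lies about 0.2 Da from the free-acid form and about 0.8 Da from the amidated form, which agrees with the interpretation that the GK residues are removed without amidation.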
Although the algesic activity seen following P. clavata envenomation has been attributed to δ-paraponeritoxin [4], the present investigation suggests that there are other proteins in this venom that could contribute to the pain-producing activity during envenomation. Nevertheless, the majority of the pain signal portrayed by the sensory neuron assay does implicate δ-paraponeritoxin as the primary agent.
The majority of the algesic activity observed in the sensory neuron assay suggests δ-paraponeritoxin as the primary pain-producing agent, which is consistent with previous studies [4]. Nevertheless, the present holistic proteomic/transcriptomic approach has expanded the range of proteins identified in ant venom and suggests that there are other proteins in this venom that could contribute to the pain-producing activity during envenomation. For example, the protein arginine kinase has not been previously identified and characterised in ant venoms. Arginine kinases are major components of wasp venoms where they have been associated with the induction of pain and have been shown to be paralytic to spiders in some cases [33,45]. Similar to wasp venoms, P. clavata venom seems to have significant levels of this protein as it was found both proteomically and with moderate expression levels in the transcriptome (382 TPM). Importantly, there was a 97% similarity of arginine-kinase-1a-P-clavata to the arginine kinase from the venom of the wasp Cyphononyx dorsalis, the first venom in which arginine kinase was identified and found to be paralytic [46]. Additionally, all residues critical for arginine binding, catalysis and ATP binding [33] were conserved in arginine-kinase-1a-P-clavata. It should be noted that although this protein was detected proteomically, no signal peptide was identified, suggesting the ORF is not complete despite the detection of a start (ATG) codon. This suggests that perhaps there is a start codon further upstream or that the protein is not secreted.

Proteases
Proteases are major components of some venoms where they are involved in digestion [47], haemostasis and thrombosis [48,49] and have also been found to be allergenic [27,50]. They are commonly found in the venoms of wasps [51], bees [47], ants [27] and snakes [48] where they have also been identified to act as factors that facilitate dispersion of other venom components and normal intracellular degradation of proteins [27]. In the present study, proteases were the most common toxin-like transcript in the venom gland transcriptome and isoforms were also detected in the venom. The major proteases were serine proteases, which have been implicated in many biological processes in arthropods such as activation of the Toll signalling cascade and regulation of the polyphenol oxidase activation that affects immune defence mechanisms and inhibition of melanisation [51,52]. Serine proteases found in snake venoms have been implicated in thrombin-like activity, formation of fibrin clots, kininogenase activity and activation of coagulation factor V, platelets or plasminogen [48]. The alignment of the P. clavata serine protease sequence with other homologous proteins (see Figure S8) showed that all residues critical for catalytic activity (His37, Asp87, Ser190) corresponding to the "catalytic triad" [47,53] are conserved in all isoforms except isoform 8, indicating that the majority are likely to retain their catalytic activity. The presence of a glycine instead of a serine residue at position 190 in isoform 8 indicates that this serine protease may belong to the PPAF-II family of serine proteases. PPAF-II serine proteases lack catalytic activity but still have the three disulphide bonds, referred to as a "clip" domain [54], and have been found in solitary wasp venom and more recently in the D. quadriceps venom gland transcriptome [36,51]. Interestingly, two truncated versions of this same group of proteins were also identified that may, or may not, be functional. These truncations may also represent an active form of a zymogen, which has been previously reported in snakes and bees [47].

Phospholipases
Among the enzymes with the highest levels of expression in the P. clavata venom gland transcriptome were isoforms of PLA2, which are commonly seen in other ant venom gland transcriptomes [27,36,55]. This is consistent with the large number of spots identified as PLA2 proteins in a 2D-PAGE analysis of P. clavata venom [11]. Phospholipases are not exclusive to ants, and are found in a wide variety of animal venoms including scorpions [56], snakes [29,57] and bees [23]. Amongst hymenopterans, the transcriptomic data from the ants T. bicarinatum and O. monticola have been previously reported to have low levels of phospholipase expression, with levels being significantly higher in wasps [27] and the ant D. quadriceps, which is consistent with our data [27,36].
To date, there are at least 15 classes of secreted PLA2s that have been described in the literature including several subtypes within each class [22]. BLASTx searches revealed high similarity of two P. clavata transcripts to a PLA2 found mainly in hymenopteran venoms (Figure 5). Nevertheless, it can be assumed that this isoform is not myotoxic due to the presence of an aspartate residue instead of a lysine at position 117. P. clavata PLA2 is also likely to be a group III phospholipase [24] which has been reported in other ant venom transcriptomes [27,36,55]. In support, group III PLA2s contain highly conserved active site residues: His48 immediately preceding the calcium binding residue Asp49 as well as a conserved Gly32 [23,25].

Hyaluronidases
Hyaluronidases are a class of highly conserved proteins found in almost all venoms and are believed to aid in the distribution and spread of other venom components by hydrolysing hyaluronic acid, a key component of the extracellular matrix of vertebrates [58]. Eight transcripts from the P. clavata venom transcriptome had significant similarity to hyaluronidase-like proteins from other ants and their expression was confirmed by the proteomics investigation. Alignment of these sequences with ClustalW confirmed that all residues critical for the tertiary structure (four cysteine residues) and the active site (Asp135, Phe136, Glu137) were conserved [58]. The high sequence conservation across hyaluronidases may play a role in the cross-reactivity seen in allergic reactions to hymenopterans where patients can be allergic to both bee and wasp venoms [58,59]. Ants, however, seem to have less hyaluronidase expression, with the highest level reported from the harvester ant Pogonomyrmex from which allergic reactions have been reported [60]. Interestingly, the transcripts with similarity to known hyaluronidases in P. clavata were highly expressed; therefore, they may induce similar activity in their victims. However, a biochemical characterisation is necessary to understand its true role as no allergic reactions have been reported following P. clavata envenomation [34].

Allergenic Proteins
There were also hits to proteins such as icarapin and pilosulin 4, both of which are allergenic proteins found in ant venoms. Icarapin is a protein first isolated from the honeybee, A. mellifera, and is involved in IgE binding [32]. Recently, icarapin was reported in the transcriptome of the ant O. monticola, which is the first report of this peptide in ant venoms. The high similarity between the transcripts identified in this transcriptome with that of the bee A. mellifera suggests that it might have a similar role in this venom. Pilosulin 4, an ant venom allergenic peptide originally isolated from the ant Myrmecia pilosula, was also identified in the P. clavata venom gland transcriptome (Supplementary File S1). However, the low TPM value of this transcript (8 TPM) suggests that this peptide may not play a major role in the venom of P. clavata.

Toxin-Like Proteins
Several transcripts not detected in the venom proteome displayed similarity to toxins without a known activity such as U8-agatoxin-Ao1a-like protein from the spider Agelena orientalis and a number of "ω-conotoxin-like" peptides from other ants. These peptides are yet to be investigated for their pharmacological activity. However, alignment of these other ant peptides, and those from P. clavata (Figure S4), with an actual ω-conotoxin peptide from the marine cone snail Conus lividus revealed that there was very little similarity to that of the cone snail aside from the cysteine framework (15% identity). ω-Conotoxins are known to block voltage-gated calcium channels and result in analgesic effects [21,61]. However, the activity of the P. clavata transcript will need to be tested in order to identify its target, particularly because P. clavata is known for its algesic, not analgesic, effects [34,35]. Therefore, the transcripts isolated from P. clavata have been named U1-paraponeritoxin-Pc1a, based on the King (2008) nomenclature system for toxins with unknown activity [62].

Insecticidal Proteins
Three transcripts with similarity to sphingomyelinase phosphodiesterase proteins (highest: 27 TPM) were identified in this investigation but were not found in the venom proteome by MS analysis. Sphingomyelinase D is a spider venom toxin that has been identified as a potent dermonecrotic protein as well as an insecticidal protein, as it has a role in immobilising prey [63]. This is highly interesting as this is the first report of this protein in a venom gland transcriptome other than that of the Loxosceles spider [64]. This finding suggests that the insecticidal activity seen in P. clavata might not be due to poneratoxin alone, as previously suggested [8], but rather a combination of several insecticidal components. However, the sting from P. clavata has never been reported to cause dermonecrosis.

Other Predicted P. clavata Toxin-Like Peptides
An interesting finding of the present study was the discovery of several peptides that have the potential of being novel bioactive molecules. Tox|Note identified hundreds of transcripts as encoding novel predicted toxins. Cysteine-rich peptides were of particular interest due to their high diversity, stability and likelihood of bioactivity [65]. Further analysis of these sequences revealed several peptides conforming to the inhibitor cystine knot (ICK) framework. This is a highly stable protein fold with a characteristic "pseudo-knot" formed by an α-helix, β-sheet or random coil connected by two disulphide bonds and interlaced with a third disulphide bond [66]. ICK peptides are yet to be identified in ant venoms, and were only recently identified in the venom gland transcriptome of T. bicarinatum. That study also reported only a single peptide conforming to this framework and with only low expression levels [36]. The present study identified six peptides with cysteines conforming to the general ICK framework. However, these had low expression levels, with TPM values ranging from 3 to 44, and were not detected in the venom proteome.
One of the highly expressed toxin-like peptides identified by Tox|Note (771 TPM) was a peptide conforming to conotoxin cysteine framework I, also observed with α-conotoxins [67]. We found nine transcripts with this framework which is the first report of this type of peptide in ant venom transcriptomes. It would be interesting to test the activity of these toxins, as α-conotoxins are known for their activity on nicotinic acetylcholine receptors [61]. Medical applications of these peptides include potential use in neuropsychiatric disorders such as Parkinson's and Alzheimer's as this receptor has been implicated in their pathophysiology [67].
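A cysteine-framework screen of this kind can be sketched with regular expressions. The example below is illustrative only: the ICK loop lengths (C x3-7 C x3-6 C x0-5 C x1-4 C x4-13 C) are an assumed consensus taken from the general ICK literature rather than the criteria used in this study, framework I is written as CC-C-C, and both test peptides are hypothetical.

```python
import re

# Assumed ICK spacing: six cysteines separated by loops of 3-7, 3-6, 0-5, 1-4 and 4-13 residues.
ICK_PATTERN = re.compile(
    r"C[^C]{3,7}C[^C]{3,6}C[^C]{0,5}C[^C]{1,4}C[^C]{4,13}C"
)
# Conotoxin cysteine framework I: two adjacent cysteines followed by two single cysteines.
FRAMEWORK_I_PATTERN = re.compile(r"CC[^C]+C[^C]+C")


def cysteine_framework(mature_peptide):
    """Classify a mature peptide by cysteine count and spacing; None if neither fits."""
    n_cys = mature_peptide.count("C")
    if n_cys == 6 and ICK_PATTERN.search(mature_peptide):
        return "ICK-like"
    if n_cys == 4 and FRAMEWORK_I_PATTERN.search(mature_peptide):
        return "framework I-like"
    return None


# Hypothetical peptides used only to exercise the patterns.
print(cysteine_framework("GCAAACGGGCCSCTTTTCK"))  # six cysteines with ICK-like spacing
print(cysteine_framework("GCCSAPACGKHYSC"))       # four cysteines arranged CC-C-C
```

A pattern match only shows that the cysteine spacing is compatible with a framework; the disulphide connectivity and fold would still need to be confirmed experimentally.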

Combined Proteomic/Transcriptomic Approach
The identification of individual peptides from complex mixtures such as venom remains a challenge despite recent advances in technologies. In fact, there is no single method capable of identifying all components of a given sample. For example, RNAseq alone would not have detected poneratoxin, which was identified through proteomic approaches. Several groups are therefore now moving to a combined proteomic/transcriptomic methodology in an attempt to overcome this issue [37,55]. Unsurprisingly, minimal overlap has often been obtained between the two techniques [37]. This was also the case in our investigation, as only 44 of the predicted 453 toxin-like transcripts identified in the transcriptome were identified in the proteome. However, this may be a taxa-dependent issue or a technical issue as approximately 90% overlap was obtained between the peptidome and transcriptome of the cone snail Conus marmoreus when using triple quadrupole mass spectrometry and high-depth pyrosequencing [68]. Therefore, in the future, high resolution mass spectrometry coupled to high resolution chromatography should be employed to try and maximise this overlap. The low overlap might be due to the inherent limitations in resolution of either technique. Furthermore, the fact that venoms are a mixture of hundreds of components, often with similar molecular size and isoelectric point, and with closely related isoforms at different expression levels, makes HPLC separation difficult. The poor overlap may also be due to the many potential PTMs in the peptides/proteins.
There are many advantages to this holistic approach, as it allows for confirmation of translation of certain transcripts found within the transcriptome. It also allows for better protein identification by proteomics with a bottom-up approach [3]. This is exemplified in the present study where four times the number of proteins were detected compared to our previously published P. clavata proteome [11]. In the previous study we found a total of 95 proteins, whereas here we report 495 using the shotgun approach. There are many reasons for this discrepancy; one is no doubt the improved sensitivity of the mass spectrometer employed in the present investigation. It is also noteworthy that according to the transcriptome, there should be over 54,000 expressed proteins with differing expression levels. However, a transcriptome is representative of the transcripts being transcribed at the point in time at which the sample was collected, and one cannot assume that all are expressed at the protein level. Nevertheless, there is no clear explanation as to why the correlation remains so imperfect, and further work needs to be undertaken in order to improve the match between venom proteome and transcriptome.

Conclusions
This study presents the first holistic investigation of the venom of the bullet ant P. clavata, with a focus on the toxin proteins and peptides which could be responsible for the extreme pain elicited during envenomation. Venom gland transcriptomics and proteomics resulted in an in-depth coverage of the venom profile of P. clavata and revealed a number of highly expressed toxins including neurotoxins, proteases, PLAs and hyaluronidase that likely contribute to the overall toxicity of the venom. We confirmed δ-paraponeritoxin (formerly poneratoxin) as the major toxin in this venom that produces significant activation of primary afferent neurons along with the presence of high levels of PLA2. Additionally, δ-paraponeritoxin-Pc1e is a novel paralog not previously identified in the venoms of P. clavata from Peru, Brazil, Panama or Costa Rica.

Venom Gland Transcriptomics
Paraponera clavata worker ants were collected in the locality of "la Montagne de Kaw" in French Guiana (N4°38′25″ W52°17′33″). Both venom glands and venom sacs of 52 worker ants were dissected in distilled water and immediately placed into 1 mL of RNAlater [69]. The samples were stored at −80 °C prior to RNA extraction. The RNAlater was removed with a glass Pasteur pipette and the glands disrupted with a TissueLyser II (Qiagen, Germantown, MD, USA) in RLT buffer containing 10% (v/v) of 2-mercaptoethanol (RNeasy Mini Kit, Qiagen). RNA was first isolated with a phenol-chloroform (5:1) solution followed by washing with a solution of chloroform-isoamyl alcohol (25:1) to remove any traces of phenol. The RNA was then bound to a Qiagen column and washed as per the manufacturer's instructions. DNAse I (Roche Diagnostics GmbH, Mannheim, Germany) was added in order to remove any remaining fragments of DNA. The RNA was eluted in sterile water and total RNA was determined using a Qubit 3.0 fluorometer (Thermo Fisher Scientific, Waltham, MA, USA) with the RNA HS assay kit (Life Technologies Corp., Carlsbad, CA, USA). A NanoDrop 2000 UV-Vis spectrophotometer (Thermo Fisher Scientific) was used to determine 260/280 and 260/230 nm ratios. Finally, RNAstable™ LD (Biomatrica, San Diego, CA, USA) was added to the purified RNA and the sample dried using a Speed Vac (RC1010, Jouan, Saint Herblain, France).
Total RNA was dried with RNAstable® reagent (Biomatrica, San Diego, CA, USA) and shipped to the Department of Biological Sciences, National University of Singapore for transcriptomics analysis. The dried sample was resuspended in 41 µL Molecular Biology Grade water and RNA quality and quantity were further assessed using an Agilent 2100 Bioanalyzer. Using approximately 850 ng of total RNA, bead-based poly-A tail selection was performed to purify mRNA. A cDNA library was constructed using the NEBNext Ultra Directional Library Prep Kit according to the manufacturer's protocol. Fragment size distribution of the library was verified using an Agilent 2100 Bioanalyzer. The 250 bp paired-end library was sequenced on an Illumina HiSeq 2500 sequencer on 1/14th of a lane. Due to the small amount of venom gland material from ants, no replicates were made. A summary of the combined proteomic and transcriptomic investigation of the bullet ant P. clavata venom is shown in Figure S9.
Bowtie was used to assess the assembly quality by mapping the reads to the contigs [72]. Expression levels were computed in transcripts per million (TPM) using the RSEM package [73].
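For readers unfamiliar with the TPM unit, the normalisation can be sketched as follows. This is a simplified illustration with hypothetical contig names and counts: RSEM itself estimates expected read counts with an expectation-maximisation procedure and uses effective transcript lengths, neither of which is reproduced here.

```python
def tpm(read_counts, lengths):
    """Transcripts per million from per-transcript read counts and lengths (bp)."""
    # Length-normalised read rate for each transcript.
    rates = {name: read_counts[name] / lengths[name] for name in read_counts}
    total = sum(rates.values())
    # Scale so that all values sum to one million.
    return {name: rate / total * 1e6 for name, rate in rates.items()}


# Hypothetical contigs: equal read counts, but contig_b is three times longer.
counts = {"contig_a": 100, "contig_b": 100}
lengths = {"contig_a": 1000, "contig_b": 3000}
print(tpm(counts, lengths))
```

Because the values always sum to one million within a sample, a TPM value describes the share of the transcript pool taken by a contig rather than an absolute abundance.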

BLASTx
In order to annotate the P. clavata venom gland transcriptome, all assembled contigs were queried against the NCBI non-redundant online database (National Center for Biotechnology Information, August 2017) using a BLASTx algorithm. All sequences with hits to the database with an e-value below 1e−4 were considered for further analysis. Translated protein sequences were assigned Gene Ontology (GO) terms using UniProtKB (Figure S10) [74,75]. Toxins were identified by a manual search of ca. 50 keywords, including toxin, phospholipase, metalloproteinase, acid phosphatase and dipeptidyl peptidase (see Table S1 for a list of all terms). The top 20 expressed toxin and toxin-like transcripts, based on TPM value, were annotated based on their matched proteins and re-named according to the nomenclature system in Section 5.3.6.
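The e-value filter and keyword search described above could be sketched as below. The hit records are hypothetical and the keyword list contains only the five terms quoted in the text (the full list is in Table S1); the function name `toxin_like_hits` and the record layout are assumptions made for illustration, not the scripts used in this study.

```python
E_VALUE_CUTOFF = 1e-4
# Only the keywords quoted in the text; the complete list of ca. 50 terms is in Table S1.
KEYWORDS = ["toxin", "phospholipase", "metalloproteinase",
            "acid phosphatase", "dipeptidyl peptidase"]


def toxin_like_hits(hits, keywords=KEYWORDS, cutoff=E_VALUE_CUTOFF):
    """Keep hits below the e-value cutoff whose description contains a keyword."""
    selected = []
    for contig, description, e_value in hits:
        if e_value >= cutoff:
            continue  # not significant enough to be considered
        text = description.lower()
        matched = [word for word in keywords if word in text]
        if matched:
            selected.append((contig, description, matched))
    return selected


# Hypothetical BLASTx results: (contig id, hit description, e-value).
hits = [
    ("contig_1", "Phospholipase A2 [Apis mellifera]", 3e-40),
    ("contig_2", "Venom acid phosphatase [Solenopsis invicta]", 2e-3),
    ("contig_3", "Ribosomal protein L3", 1e-80),
]
print(toxin_like_hits(hits))
```

A substring search of this kind is deliberately permissive, which is why the matches were subsequently curated by hand.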

TransDecoder
The assembled and annotated venom gland transcriptome was translated using TransDecoder (version 3.0.1) [13,76] using the parameters for six-frame translation with a minimum ORF of 50 amino acids. The resulting ORFs were uploaded onto PEAKS 8.5 as a database (Bioinformatics Solutions, ON, Canada) [77]. This transcriptome database was used to query the LC-MS/MS data obtained from the shotgun mass spectrometry experiment. The peptides from LC-MS/MS which matched to contigs from the translated transcriptome were also searched on BLASTp to confirm the BLASTx match assigned to the contig [14].
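The six-frame translation step can be illustrated with a short, self-contained sketch. It is not TransDecoder: it uses the standard genetic code, reports every methionine-initiated stretch between stop codons (including stretches that run off the end of the contig without a stop), and applies only the 50-residue minimum length mentioned above. The function names are introduced here for illustration.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code with codons enumerated in TCAG order.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def find_orfs(dna, min_aa=50):
    """Return (strand, frame, protein) for Met-initiated ORFs of at least min_aa residues."""
    dna = dna.upper()
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            # Stop codons ('*') split each frame into candidate segments.
            for segment in translate(seq[frame:]).split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start >= min_aa:
                    orfs.append((strand, frame, segment[start:]))
    return orfs


# Toy contig with a short ORF in the third forward frame (min_aa lowered for the demo).
print(find_orfs("CCATGGCTGCTGCTTAACC", min_aa=4))
```

Searching all six frames matters here because the strand and reading frame of an assembled contig are not known in advance.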

Signal Peptide Prediction
The signal peptide cleavage site was predicted using the SignalP server (http://www.cbs.dtu.dk/services/SignalP/) [18]. Propeptides could not be predicted, as there is no available predictor for ant venom peptides. These were therefore determined only if proteomic data was available for that transcript.

Prediction of Toxin-Like Peptides
To predict novel toxin-like peptides, the assembled transcriptome fasta file was uploaded onto the Arachnoserver's Tox|Note pipeline [15] since a similar pipeline for ants, or hymenopterans in general, is unavailable. The pipeline annotates transcripts by similarity-based putative homology or de novo using BLASTx or the Tox|Seek tool, respectively. The Tox|Note output also predicts the transcript cleavage sites for the signal peptide and a propeptide using a combination of the SignalP and Spider|ProHMM tools, respectively [15]. A toxin was considered novel if it had no BLASTx or Tox|Note match.

Toxin Identification
Toxin matches from BLASTx and Tox|Note were compiled and sorted based on their respective expression TPMs. A manual curation of these proteins was performed by checking the NCBI BLASTx results and through a ClustalW multiple alignment on homologous proteins with MEGA7 [20].

5.3.6. Nomenclature
Throughout this paper, peptide toxins (<10 kDa) have been named according to the proposed rational nomenclature system for peptide toxins from animal venoms of King et al. [62]. This same nomenclature was used to re-name known ant venom peptide toxins in Touchard et al. [78]. A brief summary of the naming system is provided in Figure S11. However, as this rational nomenclature system did not consider proteins, we adapted it to include putative protein toxins. The proposed protein toxin nomenclature method begins with the generic protein name (if known), which is based on a previously assigned name based on homology to other proteins of the same sequence and activity, for example hyaluronidase or phospholipase. If the protein has not been characterised, then the prefix is designated as "Uncharacterised" (U). This generic toxin name is followed by a number that designates a particular family of paralogous toxins with the same activity assigned to the generic protein name. This designator was introduced because often there is more than one group of proteins from the same species that act on the same molecular target. This designator is simply incremented as new groups of toxins are discovered. The toxin-family designator is followed by a lowercase letter that is used to distinguish isoforms as many venomous animals diversify their toxin repertoire through post-translational modifications that alter the original toxin by a few amino acids. However, if the isoform from a transcript is identical at the protein level with a different nucleotide sequence, a numeral is added after the isoform letter preceded by an underscore. The last part of the protein name is an uppercase letter that identifies the genus of origin and the lowercase species name. Both genus and full species name are required to avoid confusion because of the large number of ant species. An example of a protein isolated from the present investigation is provided in Figure S12.
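The protein naming rules can be condensed into a small helper. The hyphen separators are inferred from the name arginine-kinase-1a-P-clavata used in this paper; the function `protein_toxin_name` and the placement of the optional `_n` suffix are an interpretation of the description above, so Figure S12 remains the authoritative example.

```python
def protein_toxin_name(generic_name, family, isoform, genus, species, copy=None):
    """Assemble a protein toxin name: generic name, family number, isoform letter,
    optional _n suffix for identical proteins from different transcripts,
    genus initial and full species name."""
    prefix = generic_name if generic_name else "U"  # U = uncharacterised
    suffix = f"_{copy}" if copy is not None else ""
    return f"{prefix}-{family}{isoform.lower()}{suffix}-{genus[0].upper()}-{species.lower()}"


print(protein_toxin_name("arginine-kinase", 1, "a", "Paraponera", "clavata"))
# A second transcript encoding the same protein sequence (hypothetical case).
print(protein_toxin_name("arginine-kinase", 1, "a", "Paraponera", "clavata", copy=2))
```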

Identification of Potential Pharmacological Targets
In order to assess the extent of similarity of contigs to the matched proteins and to assess whether the critical residues were present, several potential toxin contigs were manually searched using BLASTx. These were then aligned using the MEGA's ClustalW alignment feature [20].
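Checking critical residues in an alignment amounts to comparing specific columns against the expected amino acids. The sketch below assumes the sequences have already been aligned (for example with ClustalW) so that positions are shared alignment columns; the toy sequences and positions are hypothetical and do not correspond to the serine protease numbering (His37, Asp87, Ser190) reported in this study.

```python
def missing_critical_residues(aligned_sequence, critical):
    """Return {position: observed residue} for every critical position that differs.

    `critical` maps 1-based alignment positions to the expected residue.
    Positions beyond the end of the sequence are reported as None.
    """
    mismatches = {}
    for position, expected in critical.items():
        if position <= len(aligned_sequence):
            observed = aligned_sequence[position - 1]
        else:
            observed = None
        if observed != expected:
            mismatches[position] = observed
    return mismatches


# Hypothetical catalytic triad at alignment columns 3, 6 and 9.
triad = {3: "H", 6: "D", 9: "S"}
isoforms = {
    "isoform_1": "AAHAADAASA",  # triad intact
    "isoform_8": "AAHAADAAGA",  # Ser replaced by Gly, as described for isoform 8
}
for name, sequence in isoforms.items():
    print(name, missing_critical_residues(sequence, triad))
```

An empty result means all listed residues are conserved; any entry flags a substitution that may affect activity and merits closer inspection.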

Transcriptome Data Submission
All annotated transcripts and sequence data are being deposited into the EMBL-EBI European Nucleotide Archive (ENA).

Venom Collection for Proteomics and Neuron Cell Assay
After collection, ants were stored at −20 °C prior to manual dissection of the venom reservoirs. After dissection, the reservoirs were pooled in 10% (v/v) acetonitrile (ACN)/water. Samples were then centrifuged for 5 min at 14,400 rpm (12,000 g(av)) and the supernatant was collected and lyophilised prior to storage at −20 °C.
To identify proteins present in the venom reservoir, we used a bottom-up proteomics approach. Crude venom was re-suspended in MilliQ water and digested using trypsin (Promega, Madison, WI, USA). Peptides were then analysed using an Eksigent 415 autosampler connected to a 415 nanoLC system (Eksigent, Dublin, CA, USA). Five µL of the sample was loaded at 300 nL/min with MS buffer A consisting of 2% ACN + 0.2% formic acid (FA) by direct injection onto a PicoFrit column (75 µm × 150 mm; New Objective, USA) packed with C18AQ resin (1.9 µm, 200 Å, Dr. Maisch, Germany). Peptides were eluted from the column and into the source of a 6600 TripleTOF hybrid quadrupole-time-of-flight mass spectrometer (Sciex, Redwood City, CA, USA) using the following gradient: 2%-35% MS buffer B (80% ACN + 0.2% FA) over 90 min, 35%-95% MS buffer B over 9 min, 95% MS buffer B for 9 min, 95%-2% for 2 min. The eluting peptides were ionised at 2300 V. An Intelligent Data Acquisition (IDA) experiment was performed, with a mass range of 350-1500 Da continuously scanned for peptides of charge state 2+ to 5+ with an intensity of more than 400 counts/s. Up to 50 candidate peptide ions per cycle were selected and fragmented and the product ion fragment masses measured over a mass range of 100-2000 Da. The mass of the precursor peptide was then excluded for 15 s.
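The elution gradient can be written as a piecewise-linear programme, which makes the total run time (110 min) and the buffer composition at any time point explicit. This is a sketch of the stated gradient only; the function name `percent_b` is introduced here, and dwell volume and column re-equilibration are ignored.

```python
# (time in min, % MS buffer B) breakpoints taken from the gradient described above.
GRADIENT = [(0, 2), (90, 35), (99, 95), (108, 95), (110, 2)]


def percent_b(time_min, gradient=GRADIENT):
    """Linearly interpolate the % buffer B at a given time within the programme."""
    for (t0, b0), (t1, b1) in zip(gradient, gradient[1:]):
        if t0 <= time_min <= t1:
            return b0 + (b1 - b0) * (time_min - t0) / (t1 - t0)
    raise ValueError("time is outside the gradient programme")


for minute in (0, 45, 90, 99, 104, 110):
    print(minute, percent_b(minute))
```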

Protein Identification
Proteins present in the venom were identified by mapping the mass spectra to the translated transcriptome assembly combined with a contaminants database using the software PEAKS Studio v8.5 (BSI, Waterloo, ON, Canada). Since there is a known variability of toxins post-translationally, several parameters were employed in order to maximise the identified proteins. These parameters included a semi-tryptic peptide with biological modification of deamidation and oxidation and a parent mass error tolerance of 50 ppm. They also included a fragment mass error tolerance of 0.1 Da and use of the enzyme trypsin, with a maximum number of three missed cleavages. The results of the search were then filtered to include peptides with a -log10P score that was determined by the false discovery rate (FDR) of <2%, the score being that where decoy database search matches were <2% of the total matches to call a positive hit. Sequences with less than 95% confidence were excluded. At least two peptides from the shotgun MS/MS analysis were required for protein identification using PEAKS.
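The two numerical criteria above, the parent mass tolerance in ppm and the score cut-off derived from decoy matches, can be sketched as follows. This is a generic target-decoy illustration and not the PEAKS algorithm: the FDR is estimated as the fraction of decoy matches among all matches at or above a score, as described in the text, and the example matches are hypothetical.

```python
def ppm_error(observed_mass, theoretical_mass):
    """Mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def score_threshold(matches, max_fdr=0.02):
    """Lowest score at which decoys make up less than max_fdr of accepted matches.

    `matches` is an iterable of (score, is_decoy) pairs; returns None if no
    score satisfies the criterion.
    """
    ranked = sorted(matches, key=lambda match: match[0], reverse=True)
    decoys = 0
    threshold = None
    for accepted, (score, is_decoy) in enumerate(ranked, start=1):
        decoys += int(is_decoy)
        if decoys / accepted < max_fdr:
            threshold = score
    return threshold


# A 0.05 Da error on a 1000 Da precursor sits at the 50 ppm tolerance.
print(round(ppm_error(1000.05, 1000.0), 3))

# Hypothetical matches; a loose 25% limit is used so the toy list gives a visible cut-off.
toy = [(10, False), (9, False), (8, False), (7, True), (6, False), (5, True), (4, True)]
print(score_threshold(toy, max_fdr=0.25))
```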

Purification of δ-paraponeritoxin-Pc1e by RP-HPLC
Whole venom (0.5 mg) was separated on a Phenomenex Gemini NX-C18 column (250 × 4.6 mm, 3 µm particle size, 110 Å pore size) using a gradient of 15%-45% solvent B (90% ACN, 0.05% TFA) over 30 min at a flowrate of 1 mL/min. Fractions were collected based on absorbance at 214 nm, and the identity of the major peak confirmed by mass spectrometry as δ-paraponeritoxin-Pc1e. An aliquot of the purified fraction was rerun under the same conditions to assess purity. Purified δ-paraponeritoxin-Pc1e was dried by vacuum concentration and resuspended in 100 µL pure water from which a 6 µL aliquot was used for LC-MS/MS analysis, and 7 µL aliquot was used for calcium imaging experiments.

LC-MS/MS
An aliquot of purified δ-paraponeritoxin-Pc1e was reconstituted in 0.5% FA, and separated on a Shimadzu (Tokyo, Japan) Nexera uHPLC with an Agilent Zorbax StableBond C18 column (2.1 × 100 mm, 1.8 µm particle size, 300 Å pore size). A flow rate of 180 µL/min was used with a gradient of 1%-40% solvent B (90% ACN, 0.1% FA) in 0.1% FA over 45 min and analysed on an AB Sciex 5600 TripleTOF mass spectrometer equipped with a Turbo-V source heated to 550 °C. MS1 survey scans were acquired at 300-1800 m/z over 250 ms, and the 20 most intense ions with a charge of +2 to +5 and an intensity of at least 120 counts were selected for MS2. The unit mass precursor ion inclusion window was mass ± 0.7 Da, and isotopes within ±2 Da were excluded from MS2, with scans acquired at 80-1400 m/z over 100 ms and optimized for high resolution.

Calcium Imaging of Sensory Neurons
Dorsal root ganglia (DRG) from a 50-day-old C57BL/6 mouse were dissociated, plated in DMEM (Gibco, MD, USA), 10% (v/v) foetal bovine serum (Assaymatrix, Melbourne, VIC, Australia), 1× penicillin/streptomycin (Gibco, Gaithersburg, MD, USA) on a 96-well poly-D-lysine-coated culture plate (Corning, ME, USA) and maintained overnight. Cells were loaded with Fluo-4 AM calcium indicator, according to the manufacturer's instructions (Thermo Fisher Scientific). After loading (1 h), the dye-containing solution was replaced with an assay solution (1× Hanks' balanced salt solution, 20 mM HEPES). Fluorescence corresponding to intracellular calcium concentration ([Ca2+]i) of 100-150 DRG cells was monitored in parallel using a Nikon Ti-E Deconvolution inverted microscope, equipped with a Lumencor Spectra LED Lightsource. Images were acquired at 20× objective at 1 fps (excitation 485 nm, emission 521 nm). Baseline fluorescence was monitored for 20 s, and at 30 s the assay solution was replaced with an assay solution containing whole venom (100 µg/mL) or purified δ-paraponeritoxin-Pc1e (1:10 dilution of resuspended fraction as described above). Experiments involving the use of mouse tissue were approved by the University of Queensland animal ethics committee, approval code: TRI/IMB/093/17, approval date: 31 March 2017.
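Fluorescence traces from such recordings are commonly expressed relative to baseline (ΔF/F0). The analysis procedure is not described in this section, so the sketch below is an assumption about a typical normalisation rather than the method used here; it takes the first 20 frames (20 s at 1 fps) as baseline and uses a hypothetical single-cell trace.

```python
def delta_f_over_f0(trace, baseline_frames=20):
    """Normalise a fluorescence trace to the mean of its baseline frames."""
    if len(trace) < baseline_frames:
        raise ValueError("trace is shorter than the baseline window")
    f0 = sum(trace[:baseline_frames]) / baseline_frames
    return [(value - f0) / f0 for value in trace]


# Hypothetical cell: flat baseline for 20 s, then a 50% rise in fluorescence.
trace = [100.0] * 20 + [150.0] * 5
normalised = delta_f_over_f0(trace)
print(normalised[0], normalised[-1])
```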
Supplementary Materials: The following are available online at http://www.mdpi.com/2072-6651/12/5/324/s1, Figure S1: Distribution of protein hits to different hymenopteran species, Figure S2: δ-Paraponeritoxin-Pc1e LC-MS/MS coverage, Figure S3: Isolated P. clavata poneratoxin MSMS match to δ-paraponeritoxin-Pc1e_1 contig, Figure S4: Amino acid sequence alignment of ω-conotoxin-like contigs, Figure S5: Alignment of hyaluronidase-like proteins from P. clavata and other ant species, Figure S6: Alignment of icarapin-like proteins, Figure S7: Amino acid alignment of arginine kinase transcripts, Figure S8: Alignment of serine proteases from P. clavata and other insect species, Figure S9: Summary of the P. clavata combined proteome/transcriptome methodology, Figure S10: Gene ontology classification of contigs with BLASTx hits, Figure S11: Peptide toxin nomenclature system using a spider venom peptide example, Figure S12: Proposed protein toxin nomenclature system using an ant venom protein example, Table S1: Toxin keyword search list. Supplementary file S1: toxins identified by BLASTx along with the TPM values, Supplementary file S2: an excel worksheet containing all protein hits from the proteome investigation using the transcriptome database.