Venom of the Annulated Sea Snake Hydrophis cyanocinctus: A Biochemically Simple but Genetically Complex Weapon

Given that the venom system in sea snakes has a role in enhancing their secondary adaption to the marine environment, it follows that elucidating the diversity and function of venom toxins will help to understand the adaptive radiation of sea snakes. We performed proteomic and de novo NGS analyses to explore the diversity of venom toxins in the annulated sea snake (Hydrophis cyanocinctus) and estimated the adaptive molecular evolution of the toxin-coding unigenes and the toxicity of the major components. We found three-finger toxins (3-FTxs), phospholipase A2 (PLA2) and cysteine-rich secretory protein (CRISP) in the venom proteome and 59 toxin-coding unigenes belonging to 24 protein families in the venom-gland transcriptome; 3-FTx and PLA2 were the most abundant families. Nearly half of the toxin-coding unigenes had undergone positive selection. The short- (i.p. 0.09 μg/g) and long-chain neurotoxin (i.p. 0.14 μg/g) presented fairly high toxicity, whereas both basic and acidic PLA2s expressed low toxicity. The toxicity of H. cyanocinctus venom was largely determined by the 3-FTxs. Our data show the venom is used by H. cyanocinctus as a biochemically simple but genetically complex weapon and venom evolution in H. cyanocinctus is presumably driven by natural selection to deal with fast-moving prey and enemies in the marine environment.


Introduction
Snakebite envenomation, as a neglected tropical disease, affects at least 1.8-2.7 million people, with upper estimates of 81,000-138,000 deaths annually in the world [1]. Many people are impressed with snakebite envenomation occurring on land, where most of the venomous snakes are, but they know little about the envenomation that occurs in the sea, largely due to low encounter rates with sea snakes and their less aggressive performance [2][3][4]. However, the snakebite envenomation caused by sea snakes should

Venomic Profile
Twenty protein fractions (chromatographic peaks) were distinguished in venom by RP-HPLC and 26 protein bands from SDS-PAGE were identified by MS analysis ( Figure 1A and Table 1). These protein bands could be classified into three toxin families: 3-FTx (58.09%), PLA 2 (40.11%) and CRISP (1.80%) ( Figure 1B and Table 1). Moreover, chromatographic peaks 1-9 were identified as 3-FTx and accounted for 54.94% of the total venom protein, peaks 10-19 were comprised of a high abundance of PLA 2 (40.11%) and a relatively low abundance of 3-FTx (3.15%) and peak 20 contained only CRISP. Collectively, longneurotoxin (LNX, 38.90%) was the predominant component in the 3-FTx family and shortneurotoxin (SNX, 19.19%) and acidic (17.53%) and basic (22.58%) PLA 2 s differed slightly from each other in expression abundance. Among four RP-HPLC factions of H. cyanocinctus venom, the SNX (i.p. 0.09 µg/g) and LNX (i.p. 0.14 µg/g) both presented fairly low LD 50 s for mice, whereas the basic and acidic PLA 2 s did not cause any death to the mice at the upper doses of 0.6 and 2.0 µg/g, respectively (Table 2). It is thus clear that the SNX is the most toxic component in H. cyanocinctus venom and that both SNX and LNX are far more toxic than PLA 2 s. Due to the high abundance of neurotoxins (SNX and LNX), H. cyanocinctus venom displayed relatively low LD 50 for ICR mice (crude venom i.p. 0.26 µg/g and whole venom protein i.p. 0.16 µg/g). Another three major fractions (peaks 9, 16 and 17) were also determined to not cause death at a dose of 0.2 µg/g in mice; however, they were not further employed to determine the lethality with any other doses and thus are not displayed in the final results.
protein, peaks 10-19 were comprised of a high abundance of PLA2 (40.11%) and a relatively low abundance of 3-FTx (3.15%) and peak 20 contained only CRISP. Collectively, long-neurotoxin (LNX, 38.90%) was the predominant component in the 3-FTx family and short-neurotoxin (SNX, 19.19%) and acidic (17.53%) and basic (22.58%) PLA2s differed slightly from each other in expression abundance. Among four RP-HPLC factions of H. cyanocinctus venom, the SNX (i.p. 0.09 μg/g) and LNX (i.p. 0.14 μg/g) both presented fairly low LD50s for mice, whereas the basic and acidic PLA2s did not cause any death to the mice at the upper doses of 0.6 and 2.0 μg/g, respectively (Table 2). It is thus clear that the SNX is the most toxic component in H. cyanocinctus venom and that both SNX and LNX are far more toxic than PLA2s. Due to the high abundance of neurotoxins (SNX and LNX), H. cyanocinctus venom displayed relatively low LD50 for ICR mice (crude venom i.p. 0.26 μg/g and whole venom protein i.p. 0.16 μg/g). Another three major fractions (peaks 9, 16 and 17) were also determined to not cause death at a dose of 0.2 μg/g in mice; however, they were not further employed to determine the lethality with any other doses and thus are not displayed in the final results.  Table 1.  Long neurotoxin 0.14 (0.09-0.21) 11 Basic PLA 2 >0. 6 18 Acidic PLA 2 >2.0 *: values in parentheses are 95% confidence limits. **: Whole venom protein was defined as the protein in crude venom and quantified by the crude venom after determination of the protein concentration.

Correlation between Translational and Transcriptional Abundances of Toxins
To evaluate the correlation between translational and transcriptional abundances of toxins, the MS data of protein bands were further assigned to 11 genes from an in-house database constructed from toxin-coding transcripts ( Table 1). Considering that each of the  , Tan et al. [29] and Durban et al. [27], respectively. NC, non-conventional neurotoxin; SVSP, snake venom serine proteinase; AchE, acetylcholinesterase. Relative abundances less than 0.01% are not indicated in the toxin families.

Correlation between Translational and Transcriptional Abundances of Toxins
To evaluate the correlation between translational and transcriptional abundances of toxins, the MS data of protein bands were further assigned to 11 genes from an in-house database constructed from toxin-coding transcripts ( Table 1). Considering that each of the  [29] and Durban et al. [27], respectively. NC, non-conventional neurotoxin; SVSP, snake venom serine proteinase; AchE, acetylcholinesterase. Relative abundances less than 0.01% are not indicated in the toxin families.

Correlation between Translational and Transcriptional Abundances of Toxins
To evaluate the correlation between translational and transcriptional abundances of toxins, the MS data of protein bands were further assigned to 11 genes from an in-house database constructed from toxin-coding transcripts ( Table 1). Considering that each of the five protein bands, including three 3-FTxs, one PLA 2 and one CRISP, could be matched with 2-3 toxin-coding transcripts due to having the same scores, the abundance of each Toxins 2021, 13, 548 7 of 16 toxin at the protein level was divided equally among these transcripts, although this may have arbitrarily reflected the protein abundance translated by these transcripts. The results indicated that the abundance of these 11 toxins at the protein level was strongly correlated with that at the mRNA level (ρ = 0.73, p = 0.01) according to Spearman's rank correlation analysis, and this was validated by the linear regression analysis with a relatively high Pearson's correlation coefficient (R = 0.84, p = 0.001). Furthermore, the transcript abundances could explain the majority of variation in protein abundance (R 2 = 0.71, p = 0.001) (Figure 4).
Toxins 2021, 13, x FOR PEER REVIEW five protein bands, including three 3-FTxs, one PLA2 and one CRISP, could be m with 2-3 toxin-coding transcripts due to having the same scores, the abundance toxin at the protein level was divided equally among these transcripts, although t have arbitrarily reflected the protein abundance translated by these transcripts. sults indicated that the abundance of these 11 toxins at the protein level was stron related with that at the mRNA level (ρ = 0.73, p = 0.01) according to Spearman's ra relation analysis, and this was validated by the linear regression analysis with a re high Pearson's correlation coefficient (R = 0.84, p = 0.001). Furthermore, the tr abundances could explain the majority of variation in protein abundance (R 2 = 0 0.001) (Figure 4).

Positive Selection in Evolutionary Adaption
Either the codeml or yn00 programs in PAML 4.8 were used to evaluate the selection in evolutionary adaption according to the full CDS homologs aligned t coding unigenes from H. cyanocinctus. Eighteen of the fifty-nine unigenes were e from the analysis due to a lack of full CDS homologs. Twenty-two codeml tests w formed on 27 full-length unigenes from 18 toxin families, with a significance level following Bonferroni correction (Tables 3 and S3). Furthermore, the selection of length unigenes from 10 toxin families together with their homologs was then analysed using yn00 (Table S4).

Positive Selection in Evolutionary Adaption
Either the codeml or yn00 programs in PAML 4.8 were used to evaluate the positive selection in evolutionary adaption according to the full CDS homologs aligned to toxincoding unigenes from H. cyanocinctus. Eighteen of the fifty-nine unigenes were excluded from the analysis due to a lack of full CDS homologs. Twenty-two codeml tests were performed on 27 full-length unigenes from 18 toxin families, with a significance level of 0.002 following Bonferroni correction (Table 3 and Table S3). Furthermore, the selection of 14 full-length unigenes from 10 toxin families together with their homologs was then directly analysed using yn00 (Table S4). The 3-FTx (SNX), CRISP and NGF families were comprised of 2-3 unigenes and the sequence divergence in each family was lower than 10%. Thus, the unigenes from the same family were combined for codeml analysis with their homologs. The results rejected M1 (null hypothesis model) in favour of M2 (positive selection model), with all p < 0.05 after Bonferroni correction. For these tests, 28% (SNX), 22% (CRISP) and 15% (NGF) of codon sites exhibited positive selection of 4.74 ≤ ω ≤ 9.84. M0 showed that all sequence sites and branches in these unigenes exhibited an average strength of selection of 1.30 ≤ ω ≤ 2.79. The nucleotide sequence divergence between two PLA 2 unigenes was >10%, and these unigenes were therefore analysed separately with their homologs. M1 could be easily rejected in favour of M2 with p < 0.001 following Bonferroni correction. Moreover, 32-38% of codon sites exhibited positive selection of 4.16 ≤ ω ≤ 8.00. M0 indicated that all the sequence sites and branches in these two PLA 2 unigenes had an average strength of selection of 1.57 ≤ ω ≤ 1.95.
The nucleotide sequence divergence between two 5 NT unigenes was <10%, and these unigenes were therefore combined for positive selection analysis with their homologs. M1 could not be rejected with p = 0.12. M0 indicated that all the sequence sites and branches in Toxins 2021, 13, 548 9 of 16 these two unigenes had an average strength of selection of ω = 0.40. Three CTL unigenes presented relatively large divergences (>10%) in their nucleotide sequences; therefore, these unigenes were analysed separately with their homologs. However, M1 could not be rejected in these unigenes, with p > 0.05 in all cases. M0 indicated that all sequence sites and branches in three CTLs had an average strength of selection of 0.52 ≤ ω ≤ 0.62. Two kunitz unigenes were divided into two groups due to the large nucleotide sequence divergence. M1 could only be rejected in kunitz (2) in favour of M2, with p < 0.001 after Bonferroni correction. Thirty-eight percent of codon sites exhibited positive selection of ω = 7.95. M1 could not be rejected in kunitz (1), with p > 0.05 after Bonferroni correction. M0 showed that all the sequence sites and branches in these two unigenes had an average strength of selection of 0.33 ≤ ω ≤ 3.14.
For those unigenes subjected to condeml tests, six (cystatin, DPP IV, LAAO, PLB, SVMP and VF families) easily rejected M1, and they accepted M2 with p < 0.01 in all cases after Bonferroni correction. Additionally, 2-19% of codon sites were estimated to have positive selection of 3.82 ≤ ω ≤ 11.02. M0 indicated that all the sites and branches of the six unigenes had selection strengths of 0.43 ≤ ω ≤ 1.38. In contrast, tests for the AP, HA, PLA 2 inhibitor, QC and VEGF unigenes could not reject M1, with all p > 0.05 after Bonferroni correction; M0 indicated a value for ω of 0.21-0.82.
Of the 14 remaining unigenes from 10 protein families, only one homolog with high similarity could be assigned to each sequence. In each pair of sequences, we found dN of 0-0.137 and dS of 0.009-0.347 for these unigenes from the venom-gland transcriptome. We calculated that dN/dS = 6.540, 3.673 and 4.251 for three LNX and 1.032 for one SVMP unigenes, from which we inferred that these sequences probably underwent positive selection. Moreover, 10 unigenes from eight protein families, including 5 NT, aminopeptidase, CREGF, cystatin, PDE, PLA 2 inhibitor, MP inhibitor and waprin, were considered to exhibit purifying selection with dN/dS ranging from 0-0.529.

Discussion
As much attention has been paid to the composition of snake venom at the omics level, the strategy combining venomic analysis with de novo NGS analysis has been developed as an important means to unravel variation in snake venom and also provided an important reference for investigating the function of snake venom, the clinical symptoms of snakebites and the development of novel drugs. Here, we deployed a classic snake venomic workflow combined with RP-HPLC, SDA-PAGE and MS analyses [34] to decomplex the venomic profile of a pooled crude venom sample of H. cyanocinctus. With regard to the relative abundance of toxins at the protein level, 3-FTx and PLA 2 could be defined as predominant toxin families, and this was similar to two previous venomic studies using samples from populations from Hara, Iran [19], and Haikou, Hainan, China [21]. However, the Xisha population exhibited the lowest abundance of 3-FTx but the highest abundance of PLA 2 among the three populations and a comparatively low divergence between these two toxin families. Specifically, the expression of acidic and basic PLA 2 s was approximately 1.9-and 1.8-fold higher in the Xisha population than in the Haikou population, respectively. Furthermore, the ratio of SNX to LNX (1:2) in the Xisha population was opposite to that in the Hara (1.8:1) and Haikou (2.5:1) populations, and such an inconsistent divergence has also been found in H. curtus venoms [20,22,24]. This suggests that the potential symptoms of snakebites caused by H. cyanocinctus might vary among populations. Moreover, the lethality analysis indicated that the toxicity of H. cyanocinctus venom was largely determined by the 3-FTx family. The neurotoxins of H. cyanocinctus venom were less abundant in this study (58.09%) than in an earlier one using samples from the Hara population (81.1%; [19]).This finding explains why the venom from the Xisha population is less toxic than that from the Hara population (i.p. 0.172 µg/g). Importantly, the venom components with strong neurotoxicity in sea snakes have always been inferred to play an important role in paralyzing and killing fast-moving prey [18]; they may have evolved under the selective pressure imposed by such a diet in the marine environment.
Thus, compared with the Hara population, the Xisha population might have experienced a regional diet with a relatively lower percentage of fast-moving prey and further evolved to express less abundant neurotoxins in its venom. However, this inference should be verified in further studies.
Concordance and discordance in venom compositions and abundance at the protein and mRNA levels always receive much scientific attention due to their role in uncovering the regulatory mechanisms at the protein/mRNA levels and the factors likely to affect the evolution and function of snake venom [37][38][39][40][41][42][43][44][45][46]. Our data show that H. cyanocinctus presents an apparent discordance in venom composition between protein (three major families) and mRNA (24 families). Due to the low abundance, some toxin families can easily be neglected at the protein level. Moreover, some toxin families with extremely low abundance might contribute little to the adaptive radiation of sea snakes and have even never been reported in closely related species. These toxin families can very likely be considered as pseudogenes that cannot be detected at the protein level. However, as our gene annotation was not based on a species-specific reference genome, whether the toxincoding unigenes with low abundance and which are undetected at the protein level can be simply judged as pseudogenes or functional genes still needs to be verified. Concordance was verified in the abundance of 11 toxins at the protein and mRNA levels, suggesting that the post-transcriptional regulation might contribute little to the abundance of these toxins in H. cyanocinctus.
As a complicated biochemical arsenal, snake venom is believed to evolve to capture prey and attack potential enemies, thus comprising an ideal model to discuss the strength of natural selection exhibited by each toxin. Positive selection in the adaptive evolution of snake venom has been detected at both the individual toxin gene and venom-gland transcriptomic levels, with most studies conducted using site model analysis [28,[47][48][49][50][51][52][53]. Here, considering that the power of the likelihood-ratio test might be reduced by the excessive sequence divergence [54], we divided the unigenes from the same toxin family into several groups at a threshold of 10% nucleotide sequence divergence and conducted codeml test separately with their homologs. Briefly, 17 of 27 toxin-coding unigenes from H. cyanocinctus were found to exhibit positive selection according to the likelihood ratio between M1 and M2, and this was verified by that between M7 and M8 (the results are not shown as they are nearly identical to those for M1 and M2). The percentage (63%, 17/27) of toxin-coding unigenes that underwent positive selection in H. cyanocictus was significantly lower than that for Crotalus adamanteus (89%, 24/27; [50]) and H. curtus (73%, 24/33; [24]), and it would even be much lower if the sequences analysed by yn00 were taken into account. Notably, all of the unigenes from the 3-FTx, PLA 2 and CRISP families in H. cyanocictus, which could be detected in high abundance at the protein level and might have a practical function in predation and defence, were found to undergo positive selection. This finding suggests that the positive selection of toxin-coding unigenes in H. cyanocictus might be strongly driven by the fast-moving prey and enemies in the sea environment. Toxin-coding unigenes experiencing no positive selection might play no substantive role because H. cyanocictus has evolved to prefer a simplified diet consisting mainly of fish.

Conclusions
Here, we used an integrated omics strategy to investigate the diversity of venom toxins at the protein and mRNA levels in H. cyanocinctus from the South China Sea. We found an apparent discordance in venom composition between protein (three major families) and mRNA (24 families) and thus concluded that H. cyanocinctus uses venom as a biochemically simple but genetically complex weapon. The abundance of 11 toxins at the protein level was strongly correlated with that at the mRNA level, indicating that the post-transcriptional regulation contributes little to the abundance of these toxins. Nearly 51.2% (21 unigenes) of the toxin-coding unigenes with full lengths in H. cyanocinctus were found to have undergone positive selection, with the proportion being much lower than reported for other venomous snakes so far studied. Among four major venom components, the shortchain neurotoxin (SNX: i.p. 0.09 µg/g) and long-chain neurotoxin (LNX: i.p. 0.14 µg/g) of 3-FTx had fairly high toxicity (LD 50 ) towards mice, whereas both basic and acidic PLA 2 s were not lethal to mice at the upper doses of 0.6 and 2.0 µg/g, respectively. Moreover, the crude venom and whole venom protein separately expressed toxicities of 0.26 and 0.16 µg/g toward mice. It is thus clear that the toxicity of H. cyanocinctus venom is largely determined by the 3-FTx family. For further recognition of the contribution of each venom component to the whole venom toxicity, the remaining RP-HPLC fractions should be employed to determine the lethality, especially for those with high abundance (e.g., peaks 9, 16 and 17).

Animals and Ethics
We collected six adults of the H. cyanocinctus species in the waters of the Xisha Islands in the South China Sea and transferred them to the Herpetological Research Center at Hainan Tropical Ocean University where they were maintained in a circulatory sea water system. The collection of snakes was approved by Hainan Tropical Ocean University (12 September 2017), and the experimental scheme was approved by the Animal Research Ethics Committee of Hangzhou Normal University (AREC2019109).

Isolation of Venom Proteins by RP-HPLC and SDS-PAGE
Fresh venom was extracted from each snake using a 100 µL plastic pipette, then centrifuged to remove impurities for 15 min at 10,000× g 4 • C, lyophilized, equally pooled and stored at −80 • C until use. Three milligrams of venom powder was re-dissolved in 0.1% TFA and centrifuged for 15 min at 10,000× g, 4 • C, and the supernatant was automatically loaded onto a Kromasil C18 column (250 × 4.6 mm, 5 µm particle size, 300 Å pore size; AkzoNobel, Bohus, Sweden) and separated at a flow rate of 1 mL/min using a Waters E600 HPLC system (Waters, Milford, MA, USA). The whole process was performed with a linear gradient of mobile phase A (0.1% TFA in water) and B (100% ACN): 0-15% B for 30 min, followed by 15-45% B for 120 min and 45-70% B for 20 min. Protein detection was monitored at 215 nm. The fractions were collected manually and concentrated in a Labconco CentriVap ® Centrifugal Concentrator (Labconco, Kansas, MO, USA). Protein concentration was determined according to Bradford [55]. The proteins of each fraction were separated by 18% SDS-PAGE under reduced conditions, and the gels were stained in 0.2% Coomassie Brilliant Blue R-250 and imaged using a Tanon Imaging system (Tanon Science & Technology, Shanghai, China).

Identification and Relative Abundance Estimation of Venom Proteins
Protein bands in the gels were excised and split into pieces with sizes of 1 × 1 mm, destained with 50 mM NH 4 HCO 3 in 50% ACN and rinsed with 100% ACN in a Ther-moMixer (ThermoFisher Scientific, Waltham, MA, USA). The proteins in gels were further treated with 50 mM DTT in 50 mM NH 4 HCO 3 and alkylated with 0.14 M IAA in 50 mM NH 4 HCO 3 , rinsed again with 100% ACN in a ThermoMixer, and then were digested with trypsin (Promega, Madison, WI, USA) for 16 h at 37 • C. The digests (peptide mixture) were collected, lyophilized and re-dissolved in 0.1% TFA, desalted and enriched using an Acclaim™ PepMap™ 100 C18 column (Trap Cartridge; 5 × 0.3 mm, 5 µm; ThermoFisher Scientific), then separated by capillary RP-HPLC using an Acclaim™ PepMap™ RSLC 100 C18 column (NanoViper; 75 µm × 15 cm, 2 µm; ThermoFisher Scientific) with mobile phase A of 0.1% FA in water and B of 20% ACN in 0.1% FA as follows: 4% B for 3 min, 4-50% B for 47 min, 50-99% B for 4 min and 99% B for 6 min. Peptide eluents were subjected to a Q Exactive Orbitrap platform (ThermoFisher Scientific) according to the manufacturer's instructions. The original MS/MS spectra were processed using Xcalibur software, and the sequence similarity was conducted using PEAKS X based on the UniProt database (strictly limited to "Serpentes") or an in-house database (toxin transcripts from H. cyanocinctus venom-gland transcriptome). The mass tolerance was set at 0.1 Da; carbamidomethyl (C) and oxidation (M) were set as fixed and variable modifications, respectively.
The integration of the fractions in a chromatogram and the densitometry of the protein bands in an electropherogram were used to estimate the relative abundance of the venom composition [34,56]. Briefly, the relative abundance of each fraction was defined as the peak area measured by Empower software. When the fraction contained only one protein band, the relative abundance was directly defined from the peak area measurement, whereas for fractions containing more than one protein band, the relative abundance of each band was estimated by densitometry using Tanon-3500R software (Tanon Science & Technology, Shanghai, China).

Venom Gland cDNA Synthesis and Sequencing
After extraction of the venom, the snakes were kept in reptile pet terrariums and allowed to recover for four days to maximize the transcription in the venom glands, then sacrificed by injection of sodium pentobarbital (i.p. 100 mg/kg). Venom glands from both sides in each specimen were removed and dissected into pieces with sizes of < 2 mm × 2 mm, pooled and permeated with RNAlater (Qiagen, Hilden, Germany) at 4 • C overnight, and then stored at −80 • C until further use. Venom gland RNA of each specimen was extracted using TRIzol (Life Technologies, Carslbad, CA, USA), purified, concentrated and resuspended in 100 µL THE Ambion RNA storage solution (Life Technologies). The purity, integrity and concentration of RNA were evaluated using an Implen NanoPhotometer (Implen, München, Germany), Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA) and Qubit 2.0 fluorometer (ThermoFisher Scientific), respectively. The mixture of total RNA for sequencing was prepared by equally mixing the RNA from the above six samples, in which the RNA integrity number (RIN) ranged from 7.4 to 8.0. Subsequently, mRNA was purified and enriched from the total RNA mixture using magnetic beads attached to oligo (dT), fragmented and then used as the template to synthesize a first-strand cDNA with random hexamer primers and reverse transcriptase (RNase H). After synthesis of the second-strand cDNA with dNTPs, RNase H and DNA polymerase I, the double-stranded cDNA was absorbed and purified using AMPure XP beads (Beckman Coulter), which was conducted with end repair and ligation of a poly (A) tail and adapters. The adapter-ligated fragments of 250-300 bp in length were preferentially screened, amplified by PCR and purified using AMPure XP beads to generate a final cDNA library. After assessment of the quality using an Agilent 2100 Bioanalyzer, the cDNA library was transferred to the Illumina HiSeq TM 2500 platform (Illumina, San Diego, CA, USA) for high throughout sequencing.

Transcriptome Assembly, Annotation and Quantification
Prior to sequence assembly, the raw reads containing adapters and expressing low quality (Q ≤ 20) were eliminated. The clean reads were then assembled into contigs using Trinity [57] with the following steps: a k-mer dictionary (k = 25) was initially assembled from the different clean read sets and developed into a collection of linear contigs greedily searching using Inchworm; de Bruijn graphs were constructed using Chrysalis based on the contigs that shared at least one k-1-mer and the reads spanned the junction between contigs; the de Bruijn graphs with clean reads and paired-end reads were finally reconciled and the full-length transcripts for spliced isoforms and paralogs were reconstructed and arranged using Butterfly. The longest sequences with no redundancy derived from the transcripts in each gene locus by Corset [58] and CAP3 [59] were defined as unigenes and used as references for further analyses. Gene annotation was executed by searching against the NCBI NT/NR and UniProt protein databases (strictly limited to "Serpentes"). All clean reads were assigned to the reference unigenes using RSEM, and the number of reads matched to a given unigene was defined as the readcount and converted into FPKM for estimating the abundance of unigenes [60,61].

Detection of Positive Selection
Positive selection was detected on 42 transcripts with full-length CDS from 22 toxin families. Prior to analysis, the homologs of each transcript were gathered from the NCBI nucleotide database with a threshold of 10% divergence in sequences. The sequences were aligned by MAFFT 7.313 on the basis of the AA sequence. Signal peptides, gaps and stop codons were excluded from all analyses. Then, the best-fitting model for partitions was evaluated by PartitionFinder 2.1.1 [62] based on the Akaike information criterion (AIC) and a complete search. A maximum-likelihood phylogeny was constructed by IQ-TREE 1.6.8 in PhyloSuite 1.2.2 [63]; each operation was conducted by performing 5000 ultrafast bootstraps, and the SH-aLRT branch test was conducted by performing 1000 replicates. After confirmation that the chains were converged and mixed adequately, the maximum clade credibility tree was collected as the target tree.
Positive selection in nucleotide sites based on a likelihood-ratio test was evaluated using the codeml program in PAML 4.8 [64]. Generally, to evaluate whether the sites exhibited positive selection, the difference in log-likelihoods between models M1 (the null model refers to neutral selection with dN/dS = 1 and purifying selection with dN/dS < 1) and M2 (the positive model refers to positive selection with dN/dS > 1) was determined and compared to a χ 2 distribution with two degrees of freedom. Then, the initial results were further verified by comparing models M7 and M8. A single ratio for all sites with average dN/dS was calculated by the M0 model. If only one homolog with complete CDS could be matched to our transcript, then dN and dS were directly calculated with the yn00 program in PAML 4.8, then the dN/dS value was calculated manually.

Lethality
Determination of the median lethal doses (LD 50 ) was conducted by intraperitoneal injection of H. cyanocinctus venom and the major RP-HPLC fractions into ICR mice (22-26 g, N = 4; Laboratory Animal Center of Hangzhou Normal University) of either sex as previously described [65]. The mortality was recorded over 24 h, and LD 50 was calculated using the Spearman-Karber method.

Statistical Analyses
The abundance of each toxin at the protein and mRNA levels was transformed by centred log-ratio (clr) as described by Rokyta et al. [66], and the correlation between the abundances of each toxin at both levels was assessed by three coefficients (Spearman's rank correlation coefficient, Pearson's correlation coefficient and determination coefficient), which were calculated through non-parametric correlation and linear regression analyses using Statistica 8.0 (StatSoft, Tulsa, OK, USA). The significance level was set at α = 0.05.

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/ 10.3390/toxins13080548/s1, Table S1. Amino acid sequences translated from toxin unigenes in the venom-gland transcriptome of the annulated sea snake (Hydrophis cyanocinctus) from the South China Sea. Sequences with partial CDS are indicated in italics; Table S2. FPKM% of protein family in the venom-gland transcriptome of the annulated sea snake (Hydrophis cyanocinctus) from the South China Sea; Table S3. Sequences used for positive selection detection in the codeml test; Table S4. Sequences used for positive selection detection in the yn00 test.