De Novo Venom-Gland Transcriptomics of Spine-Bellied Sea Snake (Hydrophis curtus) from Penang, Malaysia—Next-Generation Sequencing, Functional Annotation and Toxinological Correlation

Envenomation resulted from sea snake bite is a highly lethal health hazard in Southeast Asia. Although commonly caused by sea snakes of Hydrophiinae, each species is evolutionarily distinct and thus, unveiling the toxin gene diversity within individual species is important. Applying next-generation sequencing, this study investigated the venom-gland transcriptome of Hydrophis curtus (spine-bellied sea snake) from Penang, West Malaysia. The transcriptome was de novo assembled, followed by gene annotation and sequence analyses. Transcripts with toxin annotation were only 96 in number but highly expressed, constituting 48.18% of total FPKM in the overall transcriptome. Of the 21 toxin families, three-finger toxins (3FTX) were the most abundantly expressed and functionally diverse, followed by phospholipases A2. Lh_FTX001 (short neurotoxin) and Lh_FTX013 (long neurotoxin) were the most dominant 3FTXs expressed, consistent with the pathophysiology of envenomation. Lh_FTX001 and Lh_FTX013 were variable in amino acid compositions and predicted epitopes, while Lh_FTX001 showed high sequence similarity with the short neurotoxin from Hydrophis schistosus, supporting cross-neutralization effect of Sea Snake Antivenom. Other toxins of low gene expression, for example, snake venom metalloproteinases and L-amino acid oxidases not commonly studied in sea snake venom were also identified, enriching the knowledgebase of sea snake toxins for future study.


Introduction
Snakebite envenomation is a World Health Organisation (WHO)-classified neglected tropical disease that heavily affects many impoverished populations in the tropics and subtropics [1]. Each year, it causes 81,000 to 138,000 deaths across the world and approximately three times as many permanent disabilities and psychological trauma in those who survive [2,3]. Snakebite cases are mostly reported from inland areas where agricultural activities are predominant, whereas sea snake bite remains an under-estimated, fatal occupational hazard to fishermen and coastal communities [4,5]. In recent years, the risk of sea snake bite increases due to various environmental and anthropogenic factors, for example,

Sequencing and De Novo Transcriptome Assembly
Sequencing of the cDNA libraries of H. curtus venom-gland tissue yielded a total of 54,140,326 clean reads. De novo assembly of the reads resulted in 126,790 contigs (N50 = 921), which were further clustered and streamlined into 82,209 unigenes (N50 = 2073). Of these, a total of 70,564 transcripts were identified at FPKM ≥ 1, a cut-off for gene expression of the current study ( Table 1). The length distributions of contigs and unigenes from the de novo assembly of the venom-gland transcriptome were shown in Figure 1. Q20 percentage is the proportion of nucleotides with a quality value larger than 20; N percentage is the proportion of unknown nucleotides in clean reads; GC percentage is the proportion of guanidine and cytosine nucleotides among total nucleotides; N50 is the shortest contig length needed to cover 50% of the transcriptome; FPKM stands for Fragments Per Kilobase of transcript per Million mapped reads; Redundancy refers to the abundance of expression per gene transcript.  Q20 percentage is the proportion of nucleotides with a quality value larger than 20; N percentage is the proportion of unknown nucleotides in clean reads; GC percentage is the proportion of guanidine and cytosine nucleotides among total nucleotides; N50 is the shortest contig length needed to cover 50% of the transcriptome; FPKM stands for Fragments Per Kilobase of transcript per Million mapped reads; Redundancy refers to the abundance of expression per gene transcript. Based on BLASTx search, the transcripts were assigned into three categories: (a) "Toxin"; (b) "Non-toxin"; and (c) "Unidentified" (Figure 2A; Table 1). Transcripts in the "Toxin" category encoded known and putative snake toxins; these constituted the venomgland transcriptome by a total FPKM of 48.18%. The remaining portion of the transcriptome was shared between "Non-toxin" transcripts, which represent cellular or house- Based on BLASTx search, the transcripts were assigned into three categories: (a) "Toxin"; (b) "Non-toxin"; and (c) "Unidentified" (Figure 2A; Table 1). Transcripts in the "Toxin" category encoded known and putative snake toxins; these constituted the venom-gland transcriptome by a total FPKM of 48.18%. The remaining portion of the transcriptome was shared between "Non-toxin" transcripts, which represent cellular or house-keeping genes (33.86% of the total FPKM) and transcripts with no identifiable hits from the BLASTx search (17.97% of total FPKM). The venom-gland transcriptome of H. curtus was dominated (virtually 50%) by transcripts with toxin annotation, reflective of the specialized toxin-secreting function of the gland tissue. The high expression of "genes for toxins" is comparable to previous findings in other terrestrial elapid snakes, including cobras (Naja spp.) [21,22], king cobra (Ophiophagus hannah) [23] and Micrurus spp. (American coral snakes) [24,25], where toxin transcripts accounted for more than 40% of the venom-gland transcriptomes.

Toxin Gene Expression Profile
A total of 70,564 transcripts were expressed in the H. curtus venom-gland, while only 96 were classified under the "Toxin" category. The transcripts showed distinct, non-redundant sequences and were clustered by sequence similarity into 21 families of toxin genes ( Figure 2B). Despite their extremely small number (96 out of 70,564 or 0.14% by total transcript count), these toxin transcripts were highly expressed, together contributing to an exceptionally high redundancy value of 4847.54 FPKM/transcript, which is in sharp contrast to the remaining genes with low expressions (non-toxin: 13.16 FPKM/transcript; unidentified: 3.89 FPKM/transcript, respectively) ( Table 1). The high redundancy of toxin genes probably reflects multigene duplication in snake venom evolution, in which functional diversity of toxins with increasing prey-specificity is generated. By natural selection, this is essential for the evolving snakes to become more adapted to survive environmental perturbations and to occupy new niches for survival [26,27]. (B) Profiling of toxin transcripts by gene families. Abbreviations: S-3FTx, short three-finger toxin; L-3FTx, long three-finger toxin; NC-3FTx, non-conventional three-finger toxin; PLA 2 , phospholipase A 2 ; CRiSP, cysteine-rich secretory protein; PLA 2 I, phospholipase A 2 inhibitor; Snaclec, snake venom C-type lectin/lectin-like protein; KSPI, Kunitz-type serine protease inhibitor; SVMP, snake venom metalloproteinase; CYS, cystatin; DPP IV, dipeptidylpeptidase IV; SVSP, snake venom serine protease; 5 NUC, 5 nucleotidase; VEGF, vascular endothelial growth factor; HYA, hyaluronidase; PDE, phosphodiesterase; WAP, waprin; NP, natriuretic peptide; CVF, cobra venom factor; NGF, nerve growth factor; AP, aminopeptidase; LAAO, L-amino acid oxidase; and AChE, acetylcholinesterase. The expression of 5 NUC, VEGF, HYA, PDE, WAP, NP, CVF, NGF, AP, neprilysin, LAAO and AChE were each <0.01% of total toxin FPKM.

Toxin Gene Expression Profile
A total of 70,564 transcripts were expressed in the H. curtus venom-gland, while only 96 were classified under the "Toxin" category. The transcripts showed distinct, nonredundant sequences and were clustered by sequence similarity into 21 families of toxin genes ( Figure 2B). Despite their extremely small number (96 out of 70,564 or 0.14% by total transcript count), these toxin transcripts were highly expressed, together contributing to an exceptionally high redundancy value of 4847.54 FPKM/transcript, which is in sharp contrast to the remaining genes with low expressions (non-toxin: 13.16 FPKM/transcript; unidentified: 3.89 FPKM/transcript, respectively) ( Table 1). The high redundancy of toxin genes probably reflects multigene duplication in snake venom evolution, in which functional diversity of toxins with increasing prey-specificity is generated. By natural selection, this is essential for the evolving snakes to become more adapted to survive environmental perturbations and to occupy new niches for survival [26,27].

Profiling of Toxin Transcripts
The de novo transcriptome of H. curtus venom-gland also revealed high sequence similarity of all toxin transcripts to sequences from various elapid species (Table 2). Amongst the 96 toxin genes identified, 45 have full-length sequences with ≥90% amino acid coverage ( Table 3). These proteins with full-length sequences included previously uncharacterized families of CRISP, CTL, KSPI, SVMP, cystatin, SVSP, 5NT, VEGF, hyaluronidase, waprin, CVF and neprilysin from sea snake venoms, in addition to the well-established 3FTx and PLA 2 proteins. The de novo sequences of these toxins, which are unique to H. curtus are now available in the public repository database and provided in Supplementary Table S1 for deeper insights in the diversity of sea snake toxins.    Of the 21 toxin gene families, 3FTxs are the most diversified and abundantly expressed (18 transcripts, 77.26% of total toxin FPKM), supporting that 3FTXs play a major role in the predatory function of H. curtus venom. More specifically, it was the alpha-neurotoxins that formed the bulk of 3FTx in the venom (~77% of total toxin FPKM) ( Table 2). The diversely expressed 3FTx genes were further categorized into long-chain, short-chain and non-conventional groups ( Table 2) and elaborated based on their functional attributes in the context of envenomation. PLA 2 proteins constituted the second most abundantly expressed toxin genes (18.88% of total toxin FPKM), followed by CRISP (3.34%), PLA 2 inhibitor (0.13%), CTL (0.12%), KSPI (0.09%), SVMP (0.087%), cystatin (0.06%), dipeptidyl peptidase (0.02%), SVSP (0.01%) and miscellaneous (5NT, VEGF, hyaluronidase, PDE, waprin, NP, CVF, NGF, aminopeptidase, neprilysin, LAAO and AChE, at <0.01% of total toxin FPKM, respectively). The transcriptomic profile showed a greater toxin diversity compared to the proteomic profile of H. curtus venom from the same geographical source (Penang) as reported previously [13], in which a few minor families (SVSP, NP, CTL, peptidases, neprilysin, AChE, waprin) were undetected at the protein level, implying that these minor proteins were present at a very low abundance in the venom but potentially serve ancillary function. From the evolutionary perspective, the transcriptomic finding indicates that the genes are conserved in the Hydrophis lineage, while their ecological significance awaits further elucidation.

Sequence Analysis and Phylogenetics of Three-Finger Toxins
Three-finger toxins are non-enzymatic polypeptides containing 60-74 amino acid residues orientated in three beta-stranded loops, resembling three protruding fingers [28,29]. Based on the protein structure, we categorized the 19 3FTx transcripts in H. curtus venomgland transcriptome into short-chain 3FTx (S-3FTx, with four disulfide bridges), long-chain 3FTx (L-3FTx, with an additional fifth disulfide bridge on the second loop) and nonconventional 3FTx (NC-3FTx, with an additional fifth disulfide bridge on the first loop) [29]. The majority of the transcripts that contributed to 56.48% of total toxin FPKM, were found to be S-3FTX (10 transcripts). The L-3FTX, comprising 7 transcripts, constituted 20.78% of total toxin FPKM, while there was only one NC-3FX transcript present at a negligible abundance (<0.01%) ( Table 2).
Within S-3FTX and L-3FTX subgroups, the short neurotoxin transcript, Lh_FTX01 and the long neurotoxin transcript, Lh_FTX13 were, respectively, the most abundantly expressed transcripts. Lh_FTX01 was most similar to the short neurotoxin SN160 (UniProt: Q8UW27), previously cloned from Lapemis hardwickii (Guangxi, China) [30]. Both Lh_FTX13 and SN160 (Q8UW27) encoded proteins consisting of 60 amino acid residues but with minor variation at two residual positions: Gly 19 in Lh_FTX01 was substituted by Glu 19 in SN160 and Ser 46 in Lh_FTX_01 was substituted by Arg 46 in SN160, as shown in Figure 3. The variation observed could be due to genetic differences between distant geographical populations, as the present specimen was from the northern waters of Malacca Straits (coastal Penang Island) while the previous specimen was sourced from Beihai, Guangxi, Southern China. In comparison, high homology was observed amongst the short neurotoxin (SNTX) sequences of congeneric sea snakes (Hydrophis spp.), sea kraits (Laticauda spp.) and Asiatic cobras (Naja spp.), all of which are polypeptides of 60 amino acid residues reinforced by 4 disulfide bridges (8 highly conserved cysteine residues) (Figure 3). The sequence of Lh_FTX13, on the other hand, matched identically to the long neurotoxin 2 (UniProt: A3FM53), which was cloned from the same Chinese specimen. Lh_FTX013 showed conserved cysteine residues and disulfide bridges as with the long neurotoxin sequences of other comparing elapid species, although long neurotoxin (LNTX) sequences, in general, were more variable in amino acid composition. Notably, the LNTX of L. colubrina (P0C8R6) has only four disulfide bridges instead of five [31]. Comparing to Lh_FTX013 and other related LNTX sequences, P0C8R6 lacks the additional fifth disulfide bridge with amino acid mutation at residue-26 (C→D) and residue-30 (C→G), although the mutation did not appear to compromise its neurotoxicity and lethality [32]. (C→G), although the mutation did not appear to compromise its neurotoxicity and lethality [32].

Figure 3.
Multiple sequence alignment of the major short neurotoxins (SNTX) and long neurotoxins (LNTX) from Hydrophis curtus (Penang) and related representative species of sea snakes, sea kraits and cobras. All SNTX shared highly conserved cysteine residues and four disulphide bridges that reinforce the three-finger structure of the molecule. LNTX has an additional fifth disulphide bond in the second loop, except Ls3 from Laticauda colubrina. Figure 4 illustrates the phylogenetic tree of the major neurotoxins from H. curtus (Penang) and representative species of sea snakes, sea kraits and cobras. The SNTX and LNTX groups formed two distinct paraphyletic clades. The SNTX of sea snakes (Hydrophis spp.) and cobras (Naja spp.) appeared to be related to a recent ancestral protein that shared a common node with SNTX from the sea kraits (Laticauda spp.). Within the sea snake SNTX, Lh_FTX001 from H. curtus (Penang) is closely related to the more basal SNTX of H. schistosus and the further derived SN160 (H. curtus, Beihai) and Short Neurotoxin 1 (H. cyanocinctus) but the branch lengths were short and this implied little genetic differences. On the other hand, the LNTX sequences of Lh_FTX013 (Penang) and A3FM53 from H. curtus were identical, while there is no LNTX sequence of H. schistosus available for comparison.
The close phylogenetic relationship among the SNTX and LNTX of sea snakes, sea kraits and cobras support the wide cross-reactivity of Sea Snake Antivenom (SSAV) [33,34], which is the only specific antivenom indicated for the treatment of sea snake envenomation. SSAV is raised against the beaked sea snake (H. schistosus, Penang) specifically but studies have extensively demonstrated that it could effectively cross-neutralize the toxicity of most other marine elapids of various Hydrophis spp. (including H. platurus and its most important neurotoxin), sea kraits (Laticauda spp.) and related principal toxicity [13,14,32,33]. The extensive cross-reactivity of SSAV is indicative of substantially conserved antigenicity in the SNTX and LNTX, respectively. The antigenicity of SNTX and LNTX, however, may possibly vary in view of the more variable amino acid compositions and the further relatedness between the two toxin groups. Multiple sequence alignment of the major short neurotoxins (SNTX) and long neurotoxins (LNTX) from Hydrophis curtus (Penang) and related representative species of sea snakes, sea kraits and cobras. All SNTX shared highly conserved cysteine residues and four disulphide bridges that reinforce the three-finger structure of the molecule. LNTX has an additional fifth disulphide bond in the second loop, except Ls3 from Laticauda colubrina. Figure 4 illustrates the phylogenetic tree of the major neurotoxins from H. curtus (Penang) and representative species of sea snakes, sea kraits and cobras. The SNTX and LNTX groups formed two distinct paraphyletic clades. The SNTX of sea snakes (Hydrophis spp.) and cobras (Naja spp.) appeared to be related to a recent ancestral protein that shared a common node with SNTX from the sea kraits (Laticauda spp.). Within the sea snake SNTX, Lh_FTX001 from H. curtus (Penang) is closely related to the more basal SNTX of H. schistosus and the further derived SN160 (H. curtus, Beihai) and Short Neurotoxin 1 (H. cyanocinctus) but the branch lengths were short and this implied little genetic differences. On the other hand, the LNTX sequences of Lh_FTX013 (Penang) and A3FM53 from H. curtus were identical, while there is no LNTX sequence of H. schistosus available for comparison.
The close phylogenetic relationship among the SNTX and LNTX of sea snakes, sea kraits and cobras support the wide cross-reactivity of Sea Snake Antivenom (SSAV) [33,34], which is the only specific antivenom indicated for the treatment of sea snake envenomation. SSAV is raised against the beaked sea snake (H. schistosus, Penang) specifically but studies have extensively demonstrated that it could effectively cross-neutralize the toxicity of most other marine elapids of various Hydrophis spp. (including H. platurus and its most important neurotoxin), sea kraits (Laticauda spp.) and related principal toxicity [13,14,32,33]. The extensive cross-reactivity of SSAV is indicative of substantially conserved antigenicity in the SNTX and LNTX, respectively. The antigenicity of SNTX and LNTX, however, may possibly vary in view of the more variable amino acid compositions and the further relatedness between the two toxin groups. Figure 4. A phylogenetic tree of short and long neurotoxins from representative species of sea snakes, sea kraits and cobras. Stars indicated the major short and long neurotoxins derived from de novo venom-gland transcriptome of Hydrophis curtus, Penang. Tree was constructed with PAM model of Dayhoff and bootstrapping was performed with 1000 replicates on MEGA Version X [35]. Numbers indicate branch support values. Red/blue stars indicate the specimen studied in this work.

Clinical Relevance and Antigencity of Three-Finger Toxins
The dominant 3FTX expressed, Lh-FTX13 and Lh_FTX01, were corresponding to the major SNTX and LNTX reported in the venom proteome of H. curtus [13]. Both the SNTX and LNTX of H. curtus are highly lethal (LD50 = 0.10 µg/g and 0.24 µg/g, respectively) and contributing to the neurotoxicity and lethality of the venom (LD50 = 0.20 µg/g). In the current transcriptomic study, Lh_FTX001 (SNTX) has a higher relative abundance compared to Lh_FTX013 (LNTX), in agreement with the proteome reported in which SNTX was more abundantly present than LNTX in the venom. The SNTX-predominating venom phenotype is common in several other sea snake species besides H. curtus (Penang), including the congeneric H. schistosus [36], H. platura [14], H. cyanocinctus [16] and the paraphyletic Aispyrus laevus [37]. The ecological role of SNTX and LNTX in the venom is associated with predatory function, whereby the venom composition is streamlined to incapacitate the fast-moving teleost-based prey (fishes). In envenomation, these are the toxins that block post-junctional nicotinic receptors, resulting in neuromuscular paralysis, respiratory failure and death [38]. Ergo, the treatment outcome of envenomation is principally determined by the antivenom efficacy in neutralizing the principal toxins of the venom. It has been shown that SNTX are more reversible than LNTX in the binding of nicotinic receptors (nAChR), notwithstanding the fact that they are less effectively neutralized by antivenoms [32,34,39,40]. From the immunological perspective, it is possible that LNTX and SNTX vary in their antigenicity, hence the discrepancy in immunorecognition and efficacy of antivenom. Figure 5 shows the predicted antigenicity of alpha-neurotoxin proteins from H. curtus (SNTX and LNTX), H. schistosus (SNTX) and L. colubrina (LNTX). SNTX of H. curtus and H. schistosus have, respectively, 3 prominent epitopes with antigenicity scores beyond 1.10. All three antigenic peptide segments of the two Hydrophis sea snakes comprise residues across 19-25, 38-47 and 49-56, with each antigenic pair sharing highly conserved amino acid residues. The epitope prediction suggested that SNTX of Figure 4. A phylogenetic tree of short and long neurotoxins from representative species of sea snakes, sea kraits and cobras. Stars indicated the major short and long neurotoxins derived from de novo venom-gland transcriptome of Hydrophis curtus, Penang. Tree was constructed with PAM model of Dayhoff and bootstrapping was performed with 1000 replicates on MEGA Version X [35]. Numbers indicate branch support values. Red/blue stars indicate the specimen studied in this work.

Clinical Relevance and Antigencity of Three-Finger Toxins
The dominant 3FTX expressed, Lh-FTX13 and Lh_FTX01, were corresponding to the major SNTX and LNTX reported in the venom proteome of H. curtus [13]. Both the SNTX and LNTX of H. curtus are highly lethal (LD 50 = 0.10 µg/g and 0.24 µg/g, respectively) and contributing to the neurotoxicity and lethality of the venom (LD 50 = 0.20 µg/g). In the current transcriptomic study, Lh_FTX001 (SNTX) has a higher relative abundance compared to Lh_FTX013 (LNTX), in agreement with the proteome reported in which SNTX was more abundantly present than LNTX in the venom. The SNTX-predominating venom phenotype is common in several other sea snake species besides H. curtus (Penang), including the congeneric H. schistosus [36], H. platura [14], H. cyanocinctus [16] and the paraphyletic Aispyrus laevus [37]. The ecological role of SNTX and LNTX in the venom is associated with predatory function, whereby the venom composition is streamlined to incapacitate the fast-moving teleost-based prey (fishes). In envenomation, these are the toxins that block post-junctional nicotinic receptors, resulting in neuromuscular paralysis, respiratory failure and death [38]. Ergo, the treatment outcome of envenomation is principally determined by the antivenom efficacy in neutralizing the principal toxins of the venom. It has been shown that SNTX are more reversible than LNTX in the binding of nicotinic receptors (nAChR), notwithstanding the fact that they are less effectively neutralized by antivenoms [32,34,39,40]. From the immunological perspective, it is possible that LNTX and SNTX vary in their antigenicity, hence the discrepancy in immunorecognition and efficacy of antivenom. Figure 5 shows the predicted antigenicity of alpha-neurotoxin proteins from H. curtus (SNTX and LNTX), H. schistosus (SNTX) and L. colubrina (LNTX). SNTX of H. curtus and H. schistosus have, respectively, 3 prominent epitopes with antigenicity scores beyond 1.10. All three antigenic peptide segments of the two Hydrophis sea snakes comprise residues across 19-25, 38-47 and 49-56, with each antigenic pair sharing highly conserved amino acid residues. The epitope prediction suggested that SNTX of H. schistosus, the species whose venom is used in raising Sea Snake Antivenom, is antigenic to produce antibodies that should be equally effective in cross-neutralizing the SNTX of H. curtus. This is in line with the reported neutralization potency of Sea Snake Antivenom against the neurotoxins of H. schistosus and H. curtus at 0.35 mg/mL and 0.34 mg/mL, respectively. On the other hand, the LNTX of H. curtus exhibited two epitopes (residues 17-24 and 43-49), while the LNTX sequence of H. schistosus is not available from the database for comparison. We predicted that the LNTX of H. schistosus should share similar epitopes with the LNTX of H. curtus, since the Sea Snake Antivenom could effectively cross-neutralize the H. curtus LNTX (potency = 0.78 mg/mL), albeit less potent than it was against that of H. schistosus (potency = 1.38 mg/mL) [13,34]. Interestingly, the SNTX and LNTX do not seem to share much common epitopes, implying limited synergistic cross-reactivity that can be resulted from one antibody toward both toxins. Furthermore, the neutralization potency of antivenom against SNTX is generally lower than LNTX, despite the presence of prominent epitopes in the SNTX protein. Hence, antivenom manufacturers should ensure that the product contains adequate antibodies that are sufficiently immunoreactive toward both types of neurotoxins, so that the reversal of neurotoxicity caused by either SNTX or LNTX can be effective. The production of antivenom toward specific toxin targets can be improved through recent innovations of recombinant technologies [41,42] and reformulation of specific toxin-targeting antivenom [43,44] to achieve higher potency against the different toxins.

Phospholipases A2
Phospholipase A2 transcripts represented the second most abundantly expressed toxin genes in H. curtus venom-gland transcriptome. The major transcript coding for

Phospholipases A 2
Phospholipase A 2 transcripts represented the second most abundantly expressed toxin genes in H. curtus venom-gland transcriptome. The major transcript coding for PLA 2 , that is, Lh_PLA01 contains a full sequence of 118 amino acid residues and was annotated to the basic PLA 2 73 (Q8UW30) from Hardwick's sea snake (of unknown locale, possibly from southern China), based on 92% sequence similarity. Lh_PLA01 belongs to Group IA PLA 2 and is a D49 subtype of snake venom PLA 2 . It has a conserved Ca 2+ binding loop that lies between residues 25 and 33 (consensus sequence: Y25-G-C-Y/F-C-G-X-G-G33) and His48 as well as Asp49 which are critical for enzymatic activity [46] ( Figure 6). High sequence similarity was also observed when comparing Lh_PLA01 with the basic PLA 2 of H. schistosus (P00610) (Figure 6), a highly lethal myotoxin that causes systemic myotoxicity and renal failure secondary to rhabdomyolysis [47]. Unlike the myotoxic PLA 2 of H. schistosus, the major enzymatic PLA 2 of H. curtus was found to be non-lethal in mice [13,36]. The finding implied that Lh_PLA01 has a variable sequence that probably does not contribute to toxicity, or, it requires the presence of subunit to form PLA 2 complex in order to produce toxic activity. Lind and Eaker [48] pointed out that in toxic elapid PLA 2 s that act in monomeric form, such as the myotoxin from H. schcistosus, notexin and notechis-II5 (both are neurotoxic systemically while myotoxic locally) from Notechis scutatus, have a unique Lys-Lys-Lys sequence at positions 82-84 ( Figure 6) not shared by beta-bungarotoxin PLA 2 chain and most other non-myotoxic PLA 2 variants. This sequence thus could be important for basic PLA 2 to exert myotoxic and/or neurotoxic activity in monomeric form. Lh-PLA01 lacks this feature: the positively charged Lys82 was substituted with the neutral Thr82 ( Figure 6) and the mutation probably has modified the characteristic cationic site crucial for the myotoxic activity of monomeric PLA 2 [46]. More extensive sequence comparison in conjunction with chemical modification studies should clarify the phenomenon.
bungarotoxin PLA2 chain and most other non-myotoxic PLA2 variants. This sequence thus could be important for basic PLA2 to exert myotoxic and/or neurotoxic activity in monomeric form. Lh-PLA01 lacks this feature: the positively charged Lys82 was substituted with the neutral Thr82 ( Figure 6) and the mutation probably has modified the characteristic cationic site crucial for the myotoxic activity of monomeric PLA2 [46]. More extensive sequence comparison in conjunction with chemical modification studies should clarify the phenomenon. Figure 6. Multiple sequence alignment of the major phospholipase A2 (PLA2, Lh_PLA01) of Hydrophis curtus (Penang) and related sequences. The PLA2 shared highly conserved cysteine residues and seven disulfide bridges. Blue lines: conservative disulfide bonds; black lines: additional disulfide bond for Group IA PLA2.
In the present study, the transcript expression level of PLA2 (~20% of total toxin FPKM) was lower than the protein abundance of PLA2 reported in proteomics (50-70% of the total venom proteins) [8,13]. The discrepancy could be due to the fact that the mRNAs of various toxins were synthesized at different rates over days and weeks, while the venom-gland tissue was harvested at a certain time point, typically a few days after venom milking. Moreover, it is reasonable to think that the diverse mRNAs had varying half-lives and were subjected to complex regulation processes like post-transcriptional and post-translational modifications [49] which further modulated the maturation of the proteins in the final venom product. The lack of correlation between venom gene expression levels and protein abundances has also been observed in several previous studies [22][23][24][25]50], presumably due to the reason(s) above.

Conclusions
The venom-gland transcriptome of H. curtus from the Peninsula of Malaysia was de novo assembled, unveiling the diversity of venom genes in this species. Three-finger toxins constituted the major genes expressed in the venom glands, with SNTX and LNTX being the most abundant, consistent with their role as the principal toxins implicated in the pathophysiology of snakebite envenomation. The findings enriched the toxin knowledgebase of sea snakes and shed light on the medical importance of the venom. In the present study, the transcript expression level of PLA 2 (~20% of total toxin FPKM) was lower than the protein abundance of PLA 2 reported in proteomics (50-70% of the total venom proteins) [12,13]. The discrepancy could be due to the fact that the mRNAs of various toxins were synthesized at different rates over days and weeks, while the venomgland tissue was harvested at a certain time point, typically a few days after venom milking. Moreover, it is reasonable to think that the diverse mRNAs had varying half-lives and were subjected to complex regulation processes like post-transcriptional and post-translational modifications [49] which further modulated the maturation of the proteins in the final venom product. The lack of correlation between venom gene expression levels and protein abundances has also been observed in several previous studies [22][23][24][25]50], presumably due to the reason(s) above.

Conclusions
The venom-gland transcriptome of H. curtus from the Peninsula of Malaysia was de novo assembled, unveiling the diversity of venom genes in this species. Three-finger toxins constituted the major genes expressed in the venom glands, with SNTX and LNTX being the most abundant, consistent with their role as the principal toxins implicated in the pathophysiology of snakebite envenomation. The findings enriched the toxin knowledgebase of sea snakes and shed light on the medical importance of the venom.

Preparation of Snake Venom-Gland Tissue
The sea snake, H. curtus was an adult specimen from the northern waters of Penang Island west of Peninsular Malaysia. The venom was milked four days prior to venom gland tissue collection to promote transcription [51]. The venom glands were collected following euthanasia and sectioned into dimensions of 5 × 5 mm. The sectioned tissue was immersed in RNAlater ® solution (Ambion, TX, USA) at 4 • C overnight and stored at −80 • C until further use. The study was carried out in line with protocols approved by the Institutional Animal Use and Care Committee (IACUC) of University of Malaya, Malaysia (Approval code: #2013-11-12/PHAR/R/TCH).

RNA Extraction and Purification
The venom-gland tissue was homogenized in a 1 mL glass homogenizer with TRIzol solution (Invitrogen, Carlsbad, CA, USA). This was followed by the isolation using chloroform and treated with RNA-free DNAase I (Thermo Fisher Scientific, Waltham, MA, USA), to separate cellular debris and residual DNA. The isolated RNA was then purified via isopropyl alcohol ethanol precipitation. Polyadenylated mRNA was subsequently purified with oligo(dT) magnetic beads (Illumina TruSeq Stranded mRNA) (Illumina, San Diego, CA, USA) as per manufacturer's instructions. The quality of the purified total RNA was assessed using Agilent 2100 Bioanalyzer (RNA 6000 NanoKit) (Agilent Technologies, Waldbronn, Germany).
Enriched poly(A) + mRNA isolated from the total venom-gland RNA was used for cDNA construction. The isolated mRNA was fragmented into short fragments, which acted as templates for cDNA synthesis [52]. Random hexamer-primer (N6) was used to synthesis the first-stranded cDNA, followed by second-strand cDNA synthesis with the double-stranded cDNA as input materials, using second strand buffers, dNTPs, RNase H and DNA polymerase I. From these cDNA, a paired-end library was synthesized using the Genomic Sample Prep kit (Illumina, San Diego, CA, USA), according to the manufacturer's instructions. The cDNA fragments generated were purified with QIAquick PCR extraction kit (Qiagen, Valencia, CA, USA) and dissolved in elution buffer for end repair and the addition of poly(A) to aid in the subsequent ligation of Illumina adaptors that contain a single thymine (T) base overhang at the 3 ends. Following the ligation, these cDNA fragments were amplified via polymerase chain reaction (PCR) electrophoresed on a 1.5-2% TAE (Tris base, acetic acid and EDTA) agarose gel. From the gel, suitable fragments (200-700 bp) were selected as templates for subsequent PCR amplification. Sequencing of the amplified samples library was achieved in a single lane on the Illumina HiSeq™ 2000 platform (Illumina, San Diego, CA, USA)) with 100-base-pair, paired-end reads.

Filtration of Raw Sequenced Reads
Sequenced data generated from Illumina HiSeq™ 2000 platform were transformed by base calling into sequence data, called the raw reads and stored in a FASTQ format. Prior to transcriptome assembly, raw reads were filtered to generate clean reads as part of the quality control process in the pre-analysis stage [53]. This involved the removal of (i) adaptors; (ii) reads with >5% of unknown nucleotides or (iii) low-quality reads with >20% of low-quality bases (determined as base quality < 10).

De Novo Transcriptome Assembly
The de novo transcriptome assembly was performed using a short-reads assembly program, Trinity (version 2.0.6) [54,55]. Three independent software modules, that is, Inchworm, Chrysalis and Butterfly, comprised the Trinity program were sequentially applied to process the large volumes of RNA-seq reads. In brief, this was based on the algorithm of de Bruijn graphs construction, which began by aligning k-mers (k = 25) and reads with a certain length of overlap were joined to form linear contigs. The reads were mapped back onto contigs and by referring to paired-end reads, contigs from the same transcript, as well as the distances between them were determined. The contigs were then partitioned into clusters, each of which carried a complete set of de Bruijn graphs (representing the transcriptional complexity at a given gene or locus). The graphs were independently processed to obtained full-length transcripts for alternatively spliced isoforms and to tease apart transcripts that corresponded to paralogous genes. The clean read Q20 percentage, a point of reference for quality control assessment was obtained as a benchmark for successful de novo assembly of the transcriptome.

Clustering and Functional Annotation of Transcripts
The transcript sequences generated through Trinity were called Unigenes. Unigenes from the transcriptome assembly were further processed for sequence splicing and redundancy removal with TGI clustering tools (TGICL, version 2.1) to acquire non-redundant (NR) transcripts at the longest possible length [56]. The transcripts were then subjected to family clustering, which resulted in two classes of transcripts: (a) clusters, with a prefix CL and the cluster ID behind as contig; (b) singletons, whose ID was simply left with a prefix of Unigene. In each cluster, there were several transcripts with sequence similarities among them being >70%; while the singletons 'Unigenes' lack overlapping with other fragments at the given stringency. The value 70% was used to categorize the assembled sequences based on similarity; sequences similar to each other (may or may not be homologous as having >90% similarity) were grouped under a cluster comprising various contigs.
Following this, transcript Unigenes were then aligned with BLASTx to protein database in priority order to NCBI non-redundant database (NR), with a cut-off value of E < 10 −5 . Proteins with the highest ranks in the BLASTx results were referred to determine the coding region sequences of Unigenes, followed by translation into amino acid sequences (using standard codon table). Hence, both nucleotide sequences (5 to 3 ) and amino acid sequences of the Unigene-coding regions was acquired. To remove redundancy from each cluster, the longest sequence in each cluster was chosen as the transcript, meanwhile, the length of scaffold was extended based on overlapping sequences using Phrap assembler (release 23.0) (http://www.phrap.org). The distributions of the length of contigs, scaffolds and Unigenes were calculated and the N50 length (assembly quality indicator) was set at N50 > 500 for assembly success.
Fragments per kilobase of exon model per million reads mapped (FPKM) were used to determine the transcript abundance for the identified genes [59]. FPKM is the summation of normalized read counts based on gene length and the total number of mapped reads. The data was obtained using RSEM tool in conjunction with Trinity based on a computational formula: FPKM o f gene A = 10 6 B NC/1000 FPKM is the expression of gene A; B is the number of fragments/reads which are aligned to gene A; N is the total number of fragments/reads that are aligned to all genes; C is the base number in the coding sequence of gene A.

Categorization of Transcripts
The de novo assembled transcripts were subjected to BLASTx search to obtain the closest resembling sequences from the NR protein database for further classification based on functional annotations. The transcripts (Unigenes) were then sifted to remove those with an FPKM value of less than 1, followed by categorization into three groups: "toxins," "non-toxins" and "unidentified" [21,23]. "Toxin" transcripts were recruited by toxin-related keyword searches against the annotated transcripts. "Non-toxin" and "unidentified" groups contain transcripts of cellular proteins or house-keeping genes and transcripts that could not be identified, respectively. The redundancy of gene expression was determined by dividing the total FPKM of each group by the total number of transcripts in the respective group of transcripts [21]. In the toxin group, the amino acid sequences were used to further validate the toxin identity through BLASTp suite (Basic Local Alignment Search Tool-Protein) in the UniProt (Universal Protein Resource Knowledgebase) database platform. The transcripts were searched against Serpentes database (taxid: 8570) and validated based on the lowest E-score value with the highest percentage of sequence similarity (updated as of 29 June 2020).

Multiple Sequence Alignment
Multiple sequence alignment was conducted using Jalview software (version 2.11.1.0) [60] and MUSCLE (Multiple Sequence Comparison by Log-Expectation) [61] program. Sequences of related species used in multiple sequence alignment were retrieved from UniProtKB depository (accessed date: 14 September 2020) (http://www.uniprot.org/). The selection was based on their relevance to the toxins in comparison to elucidate the similarity and variation as well as conserved regions of the sequences.

Phylogenetic Analysis
Sequences of long and short neurotoxins annotated for H. curtus of Penang (this work) and representative species of sea snakes, sea kraits and cobras (retrieved from Universal Protein Knowledgebase, UniProtKB, http://www.uniprot.org/) (accessed date: 14 September 2020) were used to construct the phylogenetic tree. The tree was constructed with Molecular Evolutionary Genetics Analysis (MEGA) Version X [35] applying the Dayhoff (PAM) substitution model (+G) [62]. Bootstrap test (1000 replicates) was computed for the confidence limits of the constructed phylogenetic tree [63].

Scale-Based B-Cell Epitope Prediction
The antigenic determinants (epitopes) of toxins were predicted using a scale-based Bcell epitope prediction software applying Kolaskar and Tongaonkar antigenicity prediction algorithm (http://tools.immuneepitope.org) (accessed on 27 November 2020) [45]. Default parameters and window size 7 were used in the analysis for predicting potentially antigenic regions of amino acids in the sequences

Supporting Data
Sequencing data from the de novo venom-gland transcriptomics of H. curtus was deposited in National Centre for Biotechnology Information (NCBI) Sequence Read Archive (https://submit.ncbi.nlm.nih.gov/subs/sra/) (submitted on 29 December 2020) under SRA accession: PRJNA688573.  Data Availability Statement: Sequencing data from the de novo venom-gland transcriptomics of H. curtus was deposited in National Centre for Biotechnology Information (NCBI) Sequence Read Archive (https://submit.ncbi.nlm.nih.gov/subs/sra/) under SRA accession: PRJNA688573.