Gene Sequences of Potential Targets of Insecticidal PF2 Lectin Identified from the Larval De Novo Transcriptome of the Mexican Bean Weevil (Zabrotes Subfasciatus; Boheman 1833)

Simple Summary The Mexican bean weevil Zabrotes subfasciatus is a major insect pest of stored beans. We have previously reported that the PF2 lectin, which is a protein found in the desert wild legume Olneya tesota (Palo Fierro), is toxic to Z. subfasciatus by inhibiting its early larval development. The use of proteomic means allowed us to identify PF2 targets in the midgut of Z. subfasciatus larvae. However, efforts to completely elucidate the insecticidal mechanism of PF2, as well as novel potential targets for insecticidal compounds, have been hindered by the lack of available genomic and proteomic information of non-model insects. Therefore, in this work we massively sequenced and analyzed the transcripts expressed in the larval stage of Z. subfasciatus, which is the first transcriptome reported for this insect. A total of 29,029 transcript sequences were identified, of which 30 sequences encode putative targets of PF2. The functional characteristics and biochemical, biological, or molecular roles for 15,124 sequences were established by means of bioinformatics tools. This study significantly increased the available genetic resources for Zabrotes and related insect species and will be helpful for any kind of future study that requires information on genes or protein sequences. Abstract The available genomic and proteomic information of non-model organisms is often underrepresented in public databases hindering their study at molecular, cellular, and physiological levels. Information on Zabrotes subfasciatus (Mexican bean weevil) is poorly represented in databases, yet it is a major pest of common beans. We report the transcriptome of Z. subfasciatus larvae; transcripts were sequenced using an Illumina RNA-Seq technology and assembled de novo identifying 29,029 unigenes with an average size of 1168 bp and an N50 value of 2196 bp. About 15,124 unigenes (52%) were functionally annotated and categorized. Further analysis revealed 30 unigene sequences encoding putative targets of the insecticidal PF2 lectin. The complete deduced amino acid sequences of eight selected proteins potentially related to insecticidal mechanism of Palo Fierro 2 (PF2) were used for predicting probable N-glycosylation sites and analyzing phylogenetic relationships with insect sequences. This work provides a dramatic increase in the genetic resources available for Coleopterans and set the basis for developing future studies on biological aspects and potential control strategies for Z. subfasciatus.


Unigene Annotation and Classification
The functional annotation of assembled unigenes was performed comparing the sequences with those of protein databases including NCBI non-redundant (NR, https://www.ncbi.nlm.nih. gov/), RefSeq (https://www.ncbi.nlm.nih.gov/refseq/) and SwissProt (https://www.uniprot.org/) using blastx (E-value < 1 −10 for NCBI nr and an E-value < 1 × 10 −5 for the other databases). Additionally, we compared the assembled unigenes of Z. subfasciatus with the predicted proteins of phylogenetically-related organisms including, Anoplophora glabripennis (Asian long-horned beetle; [20]), Dendroctonus ponderosae (Mountain pine beetle; [21]), Leptinotarsa decemlineata (Colorado potato beetle; [22]) and Tribolium castaneum (Red flour beetle; [23]) using blastx with an E-value < 1 × 10 −5 . In all blast searches, the best hits were chosen to annotate the unigene sequences, and in the case of inconsistencies, annotation was prioritized in the following order: NR (non-redundant), SwissProt, RefSeq. We also used InterPro databases to identify the protein domains. Functional categorization of unigenes based on Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) were carried out using the BLAST2GO suite [24] using the GO-Slim mode to eliminate redundancy of GO terms and the KEGG Automatic Annotation Server (KAAS, https://www.genome.jp/kegg/kaas/) using the single-directional best hit mode.

Identification and Phylogenetic Analysis of Potential Targets of PF2 Lectin
According to the functional annotation, we identified the putative sequences coding for amylase, imaginal disc growth factor 4 precursor, mitochondrial-processing peptidase subunit beta, NADH-ubiquinone oxidoreductase subunit mitochondrial, prohibitin, and V-type proton ATPase. The unigene sequences selected were translated using the Open Reading Frame Finder tool of NCBI (https://www.ncbi.nlm.nih.gov/orffinder/). We searched the characteristic functional domains of these proteins and made pairwise alignments with proteins of known annotation from related species. The resulting annotations were manually verified and sequences with inconsistencies were discarded.
Unigene sequences identified for each protein were aligned using the Clustal Omega tool; redundant sequences or those with high homology (>95% identity) were eliminated from the analysis to avoid identical transcripts.
Using the deduced amino acid sequences from each identified unigene, we predicted putative N-glycosylation sites (NetNGlyc 1.0 Server at http://www.cbs.dtu.dk/services/NetNGlyc/) and analyzed their phylogenetic relationships to similar proteins of related organisms. A blastp analysis (E-value < 1 −10 , identity >30% and Query coverage >50%) was performed to find homologous sequences from other insects using as query the amino acid sequences deduced from assembled unigenes. Phylogenetic trees for each identified protein were constructed using the MEGAX software (Sudhir Kumar, AZ, USA) and the neighbor-joining method with bootstrap of 1000 replications.

Sequence Analysis and De Novo Assembly
The sequences from Z. subfasciatus larval total RNA yielded a total of 92 million paired-end reads with an average length of 100 bp. Digital normalization [16], resulted in about 11 million paired-end reads and 1.7 million single reads. These reads were used to perform the de novo assembly of the Z. subfasciatus transcriptome [18].
Assembly generated a total of 29,029 transcripts (unigenes) with an average size of 1168 bp and an N50 value of 2196 bp that represented an assembled transcriptome of 33.9 Mbp with a length range of sequences from 200 to 15,454 bp ( Table 1). The length distribution of assembled sequences showed that most of the unigenes were 200 and 800 bp; however, about 10,300 unigenes were more than 1000 bp in length ( Figure 1A) suggesting they could correspond to full length transcripts. Further, transcriptome analysis using the BUSCO v3 tool with the Insecta database (n = 1658 BUSCO orthologs genes) indicated that 87.7% (1454) were complete; of these 80.3% (1332) were complete single-copy and 7.4% (122) were duplicates, 8.5% (141) were fragmented and only 3.8% (63) were missing ( Figure 1B). A high BUSCO score is associated with a high-quality transcriptome assembly.

Unigene Annotation
To identify the functions of the assembled unigenes, we compared the sequences with publicly available protein databases including, non-redundant (NR) NCBI, RefSeq and SwissProt. About 52% (15,124) of the translated sequences of 29,029 unigenes matched at least one predicted protein. The greatest number of database hits were found in RefSeq, 51.9% (15,079 unigenes), followed by NR NCBI 48.2% (14,003 unigenes) and SwissProt 36.7% (10,643 unigenes) ( Table S1). The BLAST Top-hits species analysis (Figure 2A) shows the greatest number of matches is with the predicted proteins of A. glabripennis (63.5%), followed by T. castaneum (8.3%), Aethina tumida (small hive beetle; 6.6%) and D. ponderosae (4.3%). Further comparative analyses using blastx tool (E-value < 1e-5) with proteins of the A. glabripennis (Ag), D. ponderosae (Dp), L. decemlineata (Ld) and T. castaneum (Tc) genomes shows that the greatest number of putative orthologs of Z. subfasciatus were obtained from the comparison with Ag (14,394 unigenes) and Ld (13,980 unigenes). A total of 12,236 sequences (42.1%) from Zabrotes shared putative proteins with the four insect species analyzed, while some unigenes matched only with Ag (286), Ld (234), Tc (90) or Dp, (82) ( Figure 2B). Overall, 52% of the unigenes (15,079) found a match in these protein databases, however 13,950 unigenes (48%) had no hits in the blastx searches . Analysis of transcriptome completeness using Benchmarking Universal Single-Copy Orthologs (BUSCO) software. Pie charts show the percentage of the 1658 single-copy Insecta ortholog genes database (OrthoDB v9) covered by the assembly: C, complete genes found; SC, complete genes found in single-copy; CD, complete genes found in duplicate; F, fragmented genes found; M, missing ortholog genes.

Unigene Annotation
To identify the functions of the assembled unigenes, we compared the sequences with publicly available protein databases including, non-redundant (NR) NCBI, RefSeq and SwissProt. About 52% (15,124) of the translated sequences of 29,029 unigenes matched at least one predicted protein.
The greatest number of database hits were found in RefSeq, 51.9% (15,079 unigenes), followed by NR NCBI 48.2% (14,003 unigenes) and SwissProt 36.7% (10,643 unigenes) ( Table S1). The BLAST Top-hits species analysis (Figure 2A) shows the greatest number of matches is with the predicted proteins of A. glabripennis (63.5%), followed by T. castaneum (8.3%), Aethina tumida (small hive beetle; 6.6%) and D. ponderosae (4.3%). Further comparative analyses using blastx tool (E-value < 1 × 10 −5 ) with proteins of the A. glabripennis (Ag), D. ponderosae (Dp), L. decemlineata (Ld) and T. castaneum (Tc) genomes shows that the greatest number of putative orthologs of Z. subfasciatus were obtained from the comparison with Ag (14,394 unigenes) and Ld (13,980 unigenes). A total of 12,236 sequences (42.1%) from Zabrotes shared putative proteins with the four insect species analyzed, while some unigenes matched only with Ag (286), Ld (234), Tc (90) or Dp, (82) ( Figure 2B). Overall, 52% of the unigenes (15,079) found a match in these protein databases, however 13,950 unigenes (48%) had no hits in the blastx searches with the four species analyzed. This suggests that these sequences could represent non-conserved regions or unique proteins of Z. subfasciatus. with the four species analyzed. This suggests that these sequences could represent non-conserved regions or unique proteins of Z. subfasciatus.

Protein Domains Annotation
To complete the functional annotation, we identified conserved domains using the InterPro database. Our analysis showed that 15,830 unigenes (54.5%, Table S1) were annotated and classified in 4640 InterPro families and 2748 InterPro domains; a list of the 30 InterPro domains with the greatest representation is shown in

Protein Domains Annotation
To complete the functional annotation, we identified conserved domains using the InterPro database. Our analysis showed that 15,830 unigenes (54.5%, Table S1) were annotated and classified in 4640 InterPro families and 2748 InterPro domains; a list of the 30 InterPro domains with the greatest representation is shown in

Functional Categorization and Metabolic Pathway Annotation
In order to obtain the functional categorization of the transcriptome, we compared the unigenes with the GO database using Blast2GO; the GO terms identified were mapped to obtain the GO-slim terms. A total of 11,741 unigenes (40.4%, Table S1) were annotated to at least one GO-slim term, and these were classified by the GO functional categories (Biological Process, Molecular Functions, and Cellular Component). 8743 unigenes were assigned to 39 molecular functions, 5649 unigenes to 66 biological process and 5163 unigenes to 29 cellular components; GO terms with less than 5 sequences assigned were discarded (File S1). Subsequently, the levels for GO terms based on the annotation coverage and specificity were analyzed; level 3 provided us the best specificity for greatest coverage. Level 3 GO terms distribution analysis revealed that most represented biological processes were: cellular metabolic process, nitrogen compound metabolic process, organic substance metabolic process, primary metabolic process, and biosynthetic process. The most represented molecular functions were ion binding, hydrolase activity, transferase activity, heterocyclic compound binding and oxidoreductase activity. While the top five cellular components identified were: intracellular, intracellular part, intracellular organelle, membrane-bounded organelle, and non-membrane-bounded organelle part ( Figure 3).
The identification of biological pathways represented in the Z. subfasciatus transcriptome was performed using the KEGG database. All the unigene sequences were compared with those of the KEGG database using the KEGG KAAS. This analysis allowed us to annotate a total of 8195 unigenes (28.2%) into five KEGG categories including: metabolism, genetic information processing, environmental information processing, cellular process and organismal systems. Distribution analysis showed that the most represented category was metabolism, where 268 unigenes were assigned to the carbohydrate metabolism pathway, 227 to amino acid metabolism, 169 to lipid metabolism, 140 to glycan biosynthesis and metabolism, and 92 to energy metabolism among others ( Figure 4). In addition, as shown in Figure 4, a large number of unigenes were assigned to signal transduction (1195 unigenes), immune system (402), translation (361), transport and catabolism (360), cell growth and death (356), endocrine system (317), folding, sorting and degradation (310), environmental adaptation (176) and digestive system (151).
process, primary metabolic process, and biosynthetic process. The most represented molecular functions were ion binding, hydrolase activity, transferase activity, heterocyclic compound binding and oxidoreductase activity. While the top five cellular components identified were: intracellular, intracellular part, intracellular organelle, membrane-bounded organelle, and non-membranebounded organelle part ( Figure 3).  The identification of biological pathways represented in the Z. subfasciatus transcriptome was performed using the KEGG database. All the unigene sequences were compared with those of the KEGG database using the KEGG KAAS. This analysis allowed us to annotate a total of 8195 unigenes (28.2%) into five KEGG categories including: metabolism, genetic information processing, environmental information processing, cellular process and organismal systems. Distribution analysis showed that the most represented category was metabolism, where 268 unigenes were assigned to the carbohydrate metabolism pathway, 227 to amino acid metabolism, 169 to lipid metabolism, 140 to glycan biosynthesis and metabolism, and 92 to energy metabolism among others ( Figure 4). In addition, as shown in Figure 4, a large number of unigenes were assigned to signal transduction (1195 unigenes), immune system (402), translation (361), transport and catabolism (360), cell growth and death (356), endocrine system (317), folding, sorting and degradation (310), environmental adaptation (176) and digestive system (151).

Identification and Phylogenetic Analysis of Potential PF2 Lectin Target Sequences
We previously reported that PF2 lectin is toxic to Z. subfasciatus larvae, and we identified several proteins from the midgut of Z. subfasciatus larvae that were specifically recognized by PF2 and thus could serve as potential targets for PF2 lectin insecticidal activity [11,12]. All the possible unigenes associated to these previously suggested PF2 targets were manually curated from the Z. subfasciatus transcriptome, which allowed us to identify a total of 30 different unigenes encoding putative PF2 lectin targets; of these, we were able to identify the complete open reading frame (ORF) for 8 sequences. Imaginal Disc Growth Factor 4 (IDGF4-chitinase), ATP synthase, and prohibitin were among the most represented protein families. Furthermore, generally, lectins interact with other proteins by binding to carbohydrates found on their target proteins. We analyzed the sequences of the most promising targets for PF2 for the identification of putative glycosylation sites. Our results showed that several proteins from those we analyzed had at least one putative glycosylation site (Table 3).

Identification and Phylogenetic Analysis of Potential PF2 Lectin Target Sequences
We previously reported that PF2 lectin is toxic to Z. subfasciatus larvae, and we identified several proteins from the midgut of Z. subfasciatus larvae that were specifically recognized by PF2 and thus could serve as potential targets for PF2 lectin insecticidal activity [11,12]. All the possible unigenes associated to these previously suggested PF2 targets were manually curated from the Z. subfasciatus transcriptome, which allowed us to identify a total of 30 different unigenes encoding putative PF2 lectin targets; of these, we were able to identify the complete open reading frame (ORF) for 8 sequences. Imaginal Disc Growth Factor 4 (IDGF4-chitinase), ATP synthase, and prohibitin were among the most represented protein families. Furthermore, generally, lectins interact with other proteins by binding to carbohydrates found on their target proteins. We analyzed the sequences of the most promising targets for PF2 for the identification of putative glycosylation sites. Our results showed that several proteins from those we analyzed had at least one putative glycosylation site (Table 3).  Based on this analysis, we found 10 candidates for IDGF4-chitinase proteins with a length range of 179 to 935 amino acids. According to the phylogenetic analysis, the putative IDGF4-chitinases are clustered with their homologs into six groups: chitinase 2 (1 unigene), chitinase 3 (1 unigene), chitinase 8 (1 unigene), endochitinase (1 unigene), chintinase 7 (3 unigenes) and Imaginal Disc Growth Factor 4 (3 unigenes). The last group of the unigenes ZS15447, ZS07488, and ZS04622 were closely related to chitinase-like proteins Idgf4 isoform X2 of Apis mellifera (western honeybee; XP_016769016), D. ponderosae (XP_019767108) and A. glabripennis (XP_018571342), respectively (supported by bootstrap values of 89%, 38% and 94%, respectively). These results support the presence of multiple IDGF4 genes in Z. subfasciatus (Table 3, Figure 5).
Nine unigene sequences with an average length of 2900 bp encoding for putative ATP synthase subunits or V-type H+-ATPase subunits with an average length of 417 amino acids were identified in the transcriptome. Of these, we obtained the complete ORF for five (Table 3). According to the phylogenetic analysis of the unigenes (with 43-100% of bootstrap support) and their homologs obtained by blast, six annotated as mitochondrial ATP synthase subunits alpha (3 unigenes) and beta (3 unigenes). In addition, of those classified as alpha, two sequences (ZS00399 and ZS00098) could be isoforms of the same gene. Another 3 unigenes were annotated and classified as V-type H+-ATPase subunit A (ZS02386), B (ZS01938), and E (ZS09298, Figure 6). Nine unigene sequences with an average length of 2900 bp encoding for putative ATP synthase subunits or V-type H+-ATPase subunits with an average length of 417 amino acids were identified in the transcriptome. Of these, we obtained the complete ORF for five (Table 3). According to the phylogenetic analysis of the unigenes (with 43-100% of bootstrap support) and their homologs obtained by blast, six annotated as mitochondrial ATP synthase subunits alpha (3 unigenes) and beta (3 unigenes). In addition, of those classified as alpha, two sequences (ZS00399 and ZS00098) could be isoforms of the same gene. Another 3 unigenes were annotated and classified as V-type H+-ATPase subunit A (ZS02386), B (ZS01938), and E (ZS09298, Figure 6).  We also identified five sequences as candidates encoding prohibitin proteins with an average length of 227 amino acids, for two of these, ZS00843 and ZS09582, we obtained the complete ORF with a length of 276 and 309 amino acids, respectively (Table 3). With a strong support the phylogenetic analysis showed that amino acid sequences encoded by these unigenes classified in the prohibitin 1 (2 unigenes), prohibitin 2 (2 unigenes), and band7/SPFH-like (1 unigene) families suggesting the presence of at least two different genes with the potential function of prohibitin 1 and two genes with the function of prohibitin 2 in Z. subfasciatus (Figure 7). We also identified five sequences as candidates encoding prohibitin proteins with an average length of 227 amino acids, for two of these, ZS00843 and ZS09582, we obtained the complete ORF with a length of 276 and 309 amino acids, respectively (Table 3). With a strong support the phylogenetic analysis showed that amino acid sequences encoded by these unigenes classified in the prohibitin 1 (2 unigenes), prohibitin 2 (2 unigenes), and band7/SPFH-like (1 unigene) families suggesting the presence of at least two different genes with the potential function of prohibitin 1 and two genes with the function of prohibitin 2 in Z. subfasciatus (Figure 7).
The search for potential PF2 lectin target sequences also allowed us to identify 3 unigenes encoding for mitochondrial-processing peptidase proteins (MPP) that were classified into subunit alpha (1 unigene) and beta (2 unigenes, ZS03356 and ZS04132), the latter could be isoforms of the same gene (Table 3, Figure S1). Finally, we found 2 unigenes encoding α-amylase proteins (ZS13222 and ZS18820) and 1 unigene encoding a mitochondrial NADH-ubiquinone oxidoreductase protein (ZS03487) ( Table 3). The phylogenetic analysis of these sequences is shown in Figures S2 and S3, respectively. The sequences encoded by the unigenes ZS13222 and ZS18820 are related to A. mellifera α-amylase (AAM20738) and α-amylase 2-like A. dorsata (south and southeast Asian honeybee; XP_006614569), respectively. While that for mitochondrial NADH-ubiquinone oxidoreductase protein, ZS03487, is closely related to the mitochondrial NADH-ubiquinone oxidoreductase 75 kDa subunit of A. glabripennis (XP_018567499). The search for potential PF2 lectin target sequences also allowed us to identify 3 unigenes encoding for mitochondrial-processing peptidase proteins (MPP) that were classified into subunit alpha (1 unigene) and beta (2 unigenes, ZS03356 and ZS04132), the latter could be isoforms of the same gene (Table 3, Figure S1). Finally, we found 2 unigenes encoding α-amylase proteins (ZS13222 and ZS18820) and 1 unigene encoding a mitochondrial NADH-ubiquinone oxidoreductase protein (ZS03487) ( Table 3). The phylogenetic analysis of these sequences is shown in Figures S2 and S3, respectively. The sequences encoded by the unigenes ZS13222 and ZS18820 are related to A. mellifera α-amylase (AAM20738) and α-amylase 2-like A. dorsata (south and southeast Asian honeybee; XP_006614569), respectively. While that for mitochondrial NADH-ubiquinone oxidoreductase protein, ZS03487, is closely related to the mitochondrial NADH-ubiquinone oxidoreductase 75 kDa subunit of A. glabripennis (XP_018567499).

Discussion
The infestation of the stored common bean by Z. subfasciatus alters seed viability and nutritional value, and causes significant productivity and financial losses. Although important efforts have been made to search for strategies to control this pest, few studies have been conducted at the genomic level and little information on Z. subfasciatus is available from genomic and protein public databases. To our knowledge, there are only 40 nucleotide sequences and 26 amino acid sequences stored in the NCBI database. In the present study, we report the first transcriptomic analysis of Z. subfasciatus

Discussion
The infestation of the stored common bean by Z. subfasciatus alters seed viability and nutritional value, and causes significant productivity and financial losses. Although important efforts have been made to search for strategies to control this pest, few studies have been conducted at the genomic level and little information on Z. subfasciatus is available from genomic and protein public databases. To our knowledge, there are only 40 nucleotide sequences and 26 amino acid sequences stored in the NCBI database. In the present study, we report the first transcriptomic analysis of Z. subfasciatus larvae that contains 29,029 unigenes with an average length of 1168 bp; we were able to assign a functional annotation for 18,032 (62.1%) assembled unigenes using public databases. Our data show that Z. subfasciatus shares greater similarity with A. glabripennis (63.5%) than with T. castaneum (8.3%) indicating that it might be phylogenetically closer to A. glabripennis. In agreement with this, a comparison with four species from the order Coleoptera showed that Z. subfasciatus has the highest number of putative orthologs with A. glabripennis, followed by L. decemlineata, and then T. castaneum and D. ponderosae. On the other hand, it was not possible to identify a probable function for 10,997 (37.9%) sequences, some of which were greater than 2000 bp, suggesting that several could be new genes, unique proteins or non-coding RNA sequences of Z. subfasciatus. Furthermore, the functional annotation of the Z. subfasciatus larvae transcriptome allowed us to categorize and to determine the classification of the most represented proteins in Biological Processes, Molecular Functions, and Metabolic Pathways. About 4640 InterPro families, 66 biological processes, 39 molecular functions, and 12 metabolic pathways are represented in the Zabrotes larval transcriptome. When the most represented GO terms between the transcriptomes of Z. subfasciatus and those Coleopterous insects with the highest putative orthologs were compared marked similarities were found in metabolic, catabolic, and binding processes [21,25]. The most represented GO categories tend to be very similar in closely related organisms (within the same order). However, for the Cellular Component classification, the most represented category in L. decemlineata was protein complex [26], while intracellular stood out in the Z. subfasciatus transcriptome. Such variations may occur between transcriptomes from different developmental stages or may be a sign of transcriptome incompleteness. So far, this provides a dramatic increase in the genetic resources for Z. subfasciatus and will help to support studies on genetic expression of this insect, as well as studies focusing on the biology and physiology of closely related organisms. Furthermore, this information could serve to accelerate the development of pest management tools through the analysis of potential protein targets of insecticidal molecules.
The toxicity of synthetic insecticides has encouraged research into alternative pest control strategies. Several plant proteins such as, lectins, chitinases, arcelins, and glycohydrolases exert entomotoxic activity and offer the promise of environmentally safe options for pest control. The insecticidal mechanism of lectins relies on the specific recognition of glycans which can be part of proteins, forming glycoproteins. Therefore, although a protein is ubiquitous, its glycosylation pattern can vary between species. In insects, protein glycosylation profiles have been observed to change depending on the species and reproductive and developmental stages [27]. Our research group has shown that PF2 is highly toxic for the first larval stages of Z. subfasciatus [10]. Although we have identified several proteins and glycoproteins that could be the targets of PF2 lectin in the larval midgut, the mechanism(s) of action remain unclear. Thus, we are directing our current efforts towards clarifying the insecticidal processes of PF2. We have used information provided from our transcriptome database to identify 30 unigenes encoding for potential PF2 lectin targets. The translated sequences of several of these targets have putative glycosylation sites and could provide us with the building blocks to design strategies for elucidating the molecular mechanisms of the entomotoxic effect of PF2 on the bean weevil larvae.
IDGFs are proposed PF2 targets in Z. subfasciatus; they belong to a family of glycoproteins classified as class V chitinases. Unlike conventional chitinases class V chitinases have 24 additional amino acid residues and lack chitinase enzymatic activity. In insects, these proteins are important for regulating growth and proliferation [28,29]. IDGF4 genes are involved in the molting process and are essential for organizing the exoskeletal barrier in these organisms [30]. IDGFs studies generally focus on functional characterization in model Diptera and Lepidoptera insects, few report findings for non-model species such as the bruchid, Z. subfasciatus. In this work, we report the presence of multiple IDGF genes expressed in Z. subfasciatus larvae. These data agree with observations in other insects such as T. castaneum, with a total of 22 genes encoding chitinases and chitinase-like proteins (IDGFs) and Drosophila with 16 genes encoding 10 chitinases and six IDGF [29]. All deduced amino acid sequences identified in this work for Z. subfasciatus IDGFs exhibited one potential N-glycosylation site; this supports the idea that IDGFs could play a role as a target receptor for PF2 lectin.
Other putative receptors for PF2 in Z. subfasciatus are mitochondrial ATP synthase and Vacuolar-type ATPase. While ATP synthase is mainly associated with cell energy production, recent reports indicate that the beta subunit of mitochondrial ATP synthase is also located on the plasma membrane of insect cells where it acts as receptor of a circulating lipoprotein for the midgut and fat body cells of Panstrongylus megistus, probably mediating lipid transfer to the insect's fat body [31]. Vacuolar-type ATPases belong to a family of ATP-dependent proton pumps relevant in various membrane trafficking pathways that also function to acidify the lumen of some organelles and cellular compartments. In insects, the V-ATPases are present on the midgut brush border, and by acidification of the intestinal lumen, they provide proton-driving force for the secondary active transport of nutrients across this membrane [32]. In the present work, nine unigene sequences of ATP synthase and Vacuolar H+-ATPase subunits were identified. The analysis of potential N-glycosylation of these proteins predicted two sites for ATP synthase subunit beta, and three for V-ATPase subunits alpha and beta. Studies carried out in T. castaneum glycoproteins revealed that V-type ATPases are N-glycosylated [33]. Plasma membrane V-ATPase isolated from the midgut and Malpighian tubules of Manduca sexta (tobacco hornworm) is a N-glycosylated protein [34], while V-ATPase subunits from Acyrthosiphon pisum (pea aphid), D. melanogaster (common fruit fly), A. mellifera, Bombyx mori (silkworm), and T. castaneum show interaction with Galanthus nivalis (snowdrop) lectin indicating the presence of carbohydrates. The predicted glycosylation of Z. subfasciatus ATP synthase and V-ATPase subunits spur our continued study of them as potential target proteins for PF2 lectin. PF2 interaction with the β-chain of ATP synthase could interfere with both the production of ATP and the accumulation of lipids, while we might expect the interaction of PF2 with V-ATPase alpha and beta subunits affect lumen acidification and active transport of nutrients in Z. subfasciatus larvae. Thus, these two proteins represent key candidates for explaining PF2 insecticidal activity.
We have previously reported that an α-amylase isoform from Z. subfasciatus larvae midgut interacts with PF2. Amylases are an important family of enzymes required for the survival of insect larvae due to their involvement in carbohydrate metabolism [35]. We found two unigenes encoding α-amylase proteins (ZS13222 and ZS18820). The deduced amino acid sequences for these unigenes each showed one potential N-glycosylation site. Interestingly, the sequences obtained are more closely related to A. mellifera than to other coleopterans and failed to get the best blastx hit to a Z. subfasciatus α-amylase sequence already available in the NCBI database. This suggests the existence of variants that differ from constitutive amylase. Future experiments will be necessary to explore the specific expression of amylases obtained in this work at different developmental stages of the insect.
The mitochondrial-processing peptidase and mitochondrial NADH-ubiquinone oxidoreductase could be other putative PF2 targets. Mitochondrial-processing peptidase catalyzes the cleavage of N-terminal signal sequences of precursor proteins targeted for transport from cytosol to the mitochondria including those internalized into the matrix [36,37]. We found three unigenes of mitochondrial-processing peptidase in Z. subfasciatus larvae. Mitochondrial NADH-ubiquinone oxidoreductase is an enzyme of the mitochondrial electron transport chain and plays an important role during insect development as tissues are restructured and obsolete cells are destroyed by programmed cell death during metamorphosis [38]. In lepidopterans, such as Chilo partellus (spotted stalk borer) and M. sexta, changes were observed in mitochondrial oxidative activities during the larval-pupal-adult transitions [38][39][40]. However, the deduced amino acid sequences of mitochondrial-processing peptidase and mitochondrial NADH-ubiquinone oxidoreductase we identified showed no potential N-glycosylation sites and no reports were found of glycosylation of both proteins in other organisms. This suggests that PF2 recognition of these proteins is not likely to occur through a protein-carbohydrate interaction.
Another interesting target for PF2 in Z. subfasciatus is prohibitin. Prohibitins (PHBs) are multifunctional, but their main roles are as receptors of various cellular compartments such as the nucleus, nucleolus, mitochondria, endoplasmic reticulum, and plasma membrane [41,42]. PHBs have been associated with multiple functions depending on their intercellular location. In insects, PHBs are thought vital for normal development, since PHB1 and PHB2 have been implicated in the regulation of cell proliferation and apoptosis [12]. PHB1 is a receptor for the Cry3Aa protein from Bacillus thuringiensis (Gram-positive, soil-dwelling bacterium), making it a target protein for pest insect control. The putative prohibitin proteins of Z. subfasciatus were identified with five unigenes, two of which were for PHB1 and three for PHB2. PHBs have been reported to undergo post-translational modifications including glycosylation [41,43]; we report putative glycosylation sites for both PHBs.
In general, the sequences grouping of the putative PF2 lectin targets show strong support (>70% bootstrap), but the phylogenetic relationships among some of them remain unresolved. This may be because in some cases it was not possible to assemble the complete transcript which resulted in incomplete protein sequences. Therefore, nodes with low bootstrap values could be the result of a deficient number of informative sites from the alignment [44]. To solve this in related future investigations it will be necessary to obtain the complete sequence of these genes, in the meantime the specific sequences of Z. subfasciatus obtained in the present work will allow designing probes for gene expression and gene silencing studies, which may help unveil the role of putative PF2 targets in weevil development and physiology.

Conclusions
In conclusion, we report the first transcriptome of Z. subfasciatus that includes 29,029 assembled unigenes from 92 million paired-end reads. The data generated in this work will provide an important source of genomic data and will serve as a reference in studies on the biology of Z. subfasciatus. Our analysis of the transcriptome allowed us to identify the most active processes at the transcriptional level in Z. subfasciatus larvae, as well as gene families that code for possible targets of the PF2 lectin. This will provide information that will increase our understanding on the Z. subfasciatus larval development and relevant processes, as well as aid us in developing studies to evaluate the mechanisms of PF2 toxicity in this insect.

Supplementary Materials:
The following are available online at http://www.mdpi.com/2075-4450/11/11/736/s1: Figure S1: Phylogenetic analysis of putative mitochondrial-processing peptidase (MPP) subunit proteins of Z. subfasciatus and other species, Figure S2: Phylogenetic analysis of putative α-amylase proteins of Z. subfasciatus and other species, Figure S3: Phylogenetic analysis of putative Mitochondrial NADH-ubiquinone oxidoreductase proteins of Z. subfasciatus and other species, Table S1: Summary of annotation statistics of assembled unigenes, File S1: List of GO-slim terms represented for the assembled Z. Subfasciatus transcriptome. Funding: This work was supported by funds from the Agricultural Experiment Station, and the College of Agriculture and Life Sciences at the University of Arizona. This research did not receive any specific grant funding from specific agencies in the public, commercial, or not-for-profit sectors.