Genome-Wide Identification and Characterization of Carboxypeptidase Genes in Silkworm (Bombyx mori)

The silkworm (Bombyx mori) is an economically important insect that secretes silk. Carboxypeptidases have been found in various metazoan species and play important roles in physiological and biochemical reactions. Here, we analyzed the silkworm genome database and characterized 48 carboxypeptidases, including 34 metal carboxypeptidases (BmMCP1–BmMCP34) and 14 serine carboxypeptidases (BmSCP1–BmSCP14), to better understand their diverse functions. Our results indicated that the silkworm carboxypeptidase family has more members than that of other insects. These silkworm carboxypeptidases could be divided into four families: Peptidase_M2 carboxypeptidases, Peptidase_M14 carboxypeptidases, Peptidase_S10 carboxypeptidases and Peptidase_S28 carboxypeptidases. Microarray analysis showed that the carboxypeptidases had distinct expression patterns, whereas quantitative real-time PCR demonstrated that the expression level of 13 carboxypeptidases significantly decreased after starvation and was restored after re-feeding. Overall, our study provides new insights into the functional and evolutionary features of silkworm carboxypeptidases.


Introduction
In insects, food proteins are first digested by midgut endopeptidases and then by exopeptidases into single free amino acids that are absorbed by intestinal cells [1]. Endopeptidases, such as trypsin, chymotrypsin, elastase, thermolysin, pepsin, glutamyl endopeptidase, cathepsin B, cathepsin L and neprilysin, are proteolytic peptidases that break the peptide bonds of nonterminal amino acids. Exopeptidases, such as aminopeptidases and carboxypeptidases, act on the N-terminal and C-terminal peptide bonds of the protein, respectively, to release single free amino acids [2]. Carboxypeptidases are classified into two sub-categories according to their catalytic mechanism: serine carboxypeptidases, with an active serine residue in the active site, and metal carboxypeptidases, with a metal ion in the active site [3,4]. In general, carboxypeptidases perform important physiological functions, such as food digestion, blood clotting, growth factor production and regulation of biological processes in tissues and organs [5–7].
The silkworm is commonly known as an economically important insect that secretes silk protein threads to build a cocoon. The study of protein digestion and nutrient absorption in silkworm may reveal the underlying mechanism of silk protein synthesis. Most of the silkworm endopeptidases are serine proteases. In 2010, 51 serine proteases and 92 serine protease homologs were identified in the silkworm [17]. However, information on carboxypeptidases is still limited. The carboxypeptidase gene MF-CPA from silkworm molting fluid has been previously cloned and characterized [18], and it has been identified in embryos at the end of organogenesis [19]. Since the silkworm genome has been sequenced, it is now possible to identify the carboxypeptidase family members across the whole genome of B. mori [20–22]. In the present work, we identified the silkworm carboxypeptidase family and analyzed its genomic organization, expression and molecular evolution in order to reveal additional unknown carboxypeptidases. Moreover, we investigated the characteristics of the silkworm carboxypeptidase family, including phylogenetic relationships, as well as spatial and temporal expression profiles. Our results provide preliminary evidence to support the functional roles of carboxypeptidases in food digestion.

Identification and Characterization of the Carboxypeptidase Family
By searching the silkworm genome database (SilkDB), 48 members of the carboxypeptidase family were identified. These included 34 metal carboxypeptidases (nine containing the Peptidase_M2 domain and twenty-five containing the Peptidase_M14 domain) and 14 serine carboxypeptidases (five containing the Peptidase_S10 domain and nine containing the Peptidase_S28 domain; Table 1).

Expression Profiles
Microarray data from SilkDB were used to analyze the expression profiles of carboxypeptidases in different tissues. As shown in Figure 2A, 44 carboxypeptidases had corresponding oligonucleotide probes, and a heat map was created based on signal intensity values. The majority of Peptidase_M14 carboxypeptidases were highly expressed in the midgut, whereas others were highly expressed in the testis. The rest of the carboxypeptidases were expressed in various tissues (Figure 2A). The expression profiles of four carboxypeptidase genes (BmMCP10, BmMCP16, BmMCP34 and BmSCP9) that lack specific oligonucleotide probes in the database were determined by quantitative real-time PCR (qRT-PCR). BmSCP9 was mainly expressed in the fat body, epidermis and sex gland. BmMCP10 was highly expressed in the midgut. BmMCP16 was expressed in the sex gland. However, BmMCP34 was not expressed in any tissue at Day 3 of the fifth instar larval stage (Figure 2B).
The expression profile of the midgut-specific carboxypeptidases was further examined by qRT-PCR in nine different tissues, including the head, epidermis, testis, ovary, midgut, silk gland, Malpighian tubules, hemocytes and fat body, at Day 3 of the fifth instar larval stage (Figure 3). The results were consistent with those of the heat map. Moreover, we created two temporal expression pattern heat maps, one for each gender (Figure S1). Midgut-specific carboxypeptidases were expressed throughout the larval stage, whereas others were only expressed in the pupal stage.

Carboxypeptidase Expression Profile after Starvation
We tested the influence of feeding, starvation and starvation-refeeding on the expression level of midgut-specific carboxypeptidases (Figure 4). These carboxypeptidases were significantly downregulated after starvation, and their expression was restored after re-feeding.
The expression of carboxypeptidases in the feeding group was stable throughout the experimental period. The expression level of BmSCP12, BmSCP14, BmMCP13, BmMCP20, BmMCP22, BmMCP30 and BmMCP31 reached a peak between 48 and 72 h after feeding, whereas that of other carboxypeptidases, including BmSCP1, BmMCP23 and BmMCP27, reached a peak much earlier. BmMCP14 was highly expressed throughout the starvation and re-feeding period, whereas the expression level of BmSCP12, BmMCP13, BmMCP27 and BmMCP31 was significantly increased at the beginning of starvation. This may reflect a stress response of the silkworm midgut. Eleven carboxypeptidases were significantly downregulated after starvation; most of them (BmSCP12, BmSCP14, BmMCP13, BmMCP14, BmMCP20, BmMCP22 and BmMCP30) restored their expression level after re-feeding, whereas the rest (BmSCP1, BmMCP23, BmMCP27 and BmMCP31) failed to reach the initial expression level.

Discussion
Carboxypeptidases are widely found in members of the taxon Metazoa [23,24]. These enzymes are exopeptidases that generally catalyze different reactions based on their active sites. In the present study, 48 carboxypeptidases were identified in the silkworm genome. The silkworm possesses a higher number of carboxypeptidases than do other insects [7,25]; therefore, further study is needed to investigate their unknown functions.
The size of silkworm carboxypeptidases ranges from 73 to 1051 amino acids (AA). It is generally considered that 40–50 residues are the lower limit for a functional domain and that protein sizes range from 40–50 residues to thousands of residues. We predicted the domains of BmMCP33 (996 AA) and BmSCP11 (1051 AA) and found that the former has two Peptidase_M14 domains, whereas the latter is an endomembrane protein 70. Differences in the size of silkworm carboxypeptidases and the combination of the carboxypeptidase domain with other domains suggested that different carboxypeptidases might have different functions.
According to the domain features, 34 metal carboxypeptidases were classified into two groups: Peptidase_M2 carboxypeptidases and Peptidase_M14 carboxypeptidases [3]. Carboxypeptidases that contain the Peptidase_M2 domain are known as angiotensin-converting enzymes (ACEs) [26,27]. ACEs are highly important for the regulation of blood pressure [28]. A. gambiae has nine ACE-like genes, but their functions remain unclear [27]. The M14 family is one of the most widely studied metal carboxypeptidase families. The functions of Peptidase_M14 carboxypeptidases are diverse, including the digestion of food [29], the processing of bioactive peptides [30] and the metabolism of bacterial cell walls [31].
Fourteen serine carboxypeptidases were classified into two groups: Peptidase_S10 carboxypeptidases and Peptidase_S28 carboxypeptidases [3]. The Peptidase_S10 carboxypeptidase family is active only at acidic pH, which distinguishes it from most of the other serine peptidase families [32]. There are two types of Peptidase_S10 carboxypeptidases: one (e.g., carboxypeptidase C) shows a preference for hydrophobic residues [33–35], and the other (e.g., carboxypeptidase D) shows a preference for basic amino acids on either side of the scissile bond, but is also able to cleave peptides with hydrophobic residues [33,36,37]. Carboxypeptidases in the family S28 suppress angiotensin II by cleaving the C-terminal Pro-Phe bond [38]. Additionally, recombinant carboxypeptidases in the family S28 associated with H-kininogen are able to activate plasma prekallikrein [39]. In general, serine carboxypeptidases are considered to function as lysosomal enzymes and to participate in the turnover of proteins. In addition, some of them release amino acids from extracellular proteins and peptides [3].
Phylogenetic analysis of silkworm carboxypeptidases is presented in Figure 1. The topology of the metal carboxypeptidase tree indicated that the divergence and duplication of the Peptidase_M14 carboxypeptidase genes occurred before the separation of B. mori and A. aegypti. Eleven A. aegypti carboxypeptidase genes were induced in the midgut by blood-meal feeding [7], suggesting that B. mori carboxypeptidase genes might also be induced by food intake. The silkworm Peptidase_M14 carboxypeptidase is highly conserved in the taxon Metazoa. Human digestive carboxypeptidases (NP_001859) hydrolyze the C-terminal peptide of dietary polypeptide chains [40]. Therefore, the silkworm Peptidase_M14 carboxypeptidase might also have a digestive function. Similarly, the B. mori Peptidase_S28 carboxypeptidase genes and the T. brasiliensis serine carboxypeptidase genes share orthologs in the main branch of the tree. T. brasiliensis uses serine carboxypeptidase as a digestive enzyme, suggesting that the B. mori Peptidase_S28 carboxypeptidase might also be a digestive enzyme.
Carboxypeptidases play key roles in various physiological and biochemical processes in many insects. In the present study, some Peptidase_M14 carboxypeptidases and the serine carboxypeptidases BmSCP1, BmSCP3, BmSCP12 and BmSCP14 were specifically expressed in the midgut of silkworm. In Anopheles culicifacies, the carboxypeptidase AcCP is specifically expressed in the midgut, whereas in T. brasiliensis, the serine carboxypeptidases tbscp-1 and tbscp-2 are highly expressed in the posterior midgut (small intestine) and weakly expressed in the salivary glands, fat body and anterior midgut (stomach) [15]. These results suggested that silkworm carboxypeptidases might participate in digestion in the midgut. Several carboxypeptidases were specifically expressed in the testis and might play important roles in male reproductive development, whereas others were widely expressed in various tissues and might perform different functions. For example, the molting fluid carboxypeptidase A (MF-CPA) is found in the molting fluid of insects at pupal ecdysis and at the pre-pupal molting stage. MF-CPA has been proposed to degrade proteins in old epidermal cells and to participate in the recycling of amino acids [18].
The insect digestive tract is divided into three parts: the foregut, midgut and hindgut. The midgut is the most developed part of the digestive tract and the most important site of digestion and absorption in insects. Digestion in silkworm larvae includes mechanical digestion and chemical digestion. Under the action of the midgut digestive juice, macromolecules from mulberry, such as carbohydrates, proteins and lipids, are digested into small-molecule compounds and absorbed by midgut epithelial cells. These compounds are then transported to other organs through the hemolymph to provide energy for silkworm growth, development and other life activities.
In the present study, starvation altered the expression levels of carboxypeptidases in the larval midgut, and re-feeding could restore them to the initial levels. These results suggested that the expression of midgut carboxypeptidases was induced by food intake. Similar results have also been reported in A. aegypti; 11 A. aegypti carboxypeptidases are upregulated in response to blood-meal feeding [7]. The expression profile induced by starvation and re-feeding in our study was the same as that of a chymotrypsin-like serine protease in Spodoptera litura [41]. Here, the expression levels of BmSCP1 and BmMCP30 were higher after re-feeding than during normal feeding. Harmonia axyridis can fully compensate in body size through accelerated growth [42]. In some animals, compensatory growth is sometimes faster than normal growth [43,44], and starvation is applied in animal rearing to obtain economic benefits [45].
In summary, 48 members of the silkworm carboxypeptidase family were identified and characterized in the present study. The expression patterns and two phylogenetic trees of carboxypeptidases were analyzed. We further explored the function of carboxypeptidases, especially of those that were specifically expressed in the midgut. Our findings provided a reference for future studies on Lepidoptera carboxypeptidases.

Biological Materials
The silkworm strain Dazao (p50) was used in this study. The silkworms were reared on fresh mulberry leaves at 25 °C with 70%–80% relative humidity and a 16-h light/8-h dark cycle in a growth chamber of the State Key Laboratory of Silkworm Genome Biology. Samples from embryonic stages and larval tissues were isolated and stored in liquid nitrogen.

Identification of the Carboxypeptidase Family in Silkworm
SilkDB [46] was used to predict the silkworm carboxypeptidase family. Carboxypeptidase genes from other species were downloaded from GenBank [47]. The BLAST alignment tool was downloaded from the FTP site of the National Center for Biotechnology Information [48]. Carboxypeptidase sequences from other species were used as queries to BLAST against SilkDB with an E-value threshold of 10⁻⁶ [49]. Subsequently, SMART (Simple Modular Architecture Research Tool) [50] and Pfam [51] were used to validate each putative protein.
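The E-value filtering step can be illustrated with a minimal Python sketch. It assumes the BLAST hits were exported in the standard 12-column tabular layout (-outfmt 6), which the text does not state; it also treats the threshold as inclusive and keeps the lowest E-value per subject sequence, both of which are illustrative choices. The function name filter_hits and all identifiers in the example rows are hypothetical.

```python
import csv
import io

# Standard 12-column layout of BLAST tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(handle, max_evalue=1e-6):
    """Keep, for each subject sequence, the best hit with E-value <= max_evalue.

    Returns a dict mapping subject ID -> (query ID, E-value).
    """
    best = {}
    reader = csv.DictReader(handle, fieldnames=BLAST_COLUMNS, delimiter="\t")
    for row in reader:
        evalue = float(row["evalue"])
        if evalue > max_evalue:
            continue  # hit is weaker than the threshold
        subject = row["sseqid"]
        if subject not in best or evalue < best[subject][1]:
            best[subject] = (row["qseqid"], evalue)
    return best


# Hypothetical example rows; all identifiers and scores are invented.
example = io.StringIO(
    "queryA\tsubject_1\t45.2\t300\t150\t5\t1\t300\t10\t309\t2e-40\t160\n"
    "queryB\tsubject_1\t38.0\t280\t160\t6\t5\t284\t12\t291\t1e-20\t95\n"
    "queryA\tsubject_2\t25.0\t60\t40\t2\t1\t60\t3\t62\t0.003\t30\n"
)
candidates = filter_hits(example)
print(candidates)
```

Candidates that pass such a filter would still need domain validation (SMART and Pfam in this study) before being counted as family members.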

Bioinformatics and Phylogeny Analysis of the Silkworm Carboxypeptidase Family
The open reading frame (ORF) of carboxypeptidases in B. mori was identified using ORF Finder [52]. The signal peptide was predicted by SignalP 4.1 [53]. The molecular weight and isoelectric point were predicted using ProtParam [54]. The amino acid sequences of putative carboxypeptidases were aligned using ClustalX [55]. One phylogenetic tree of metal carboxypeptidases and another of serine carboxypeptidases were constructed by the neighbor-joining method with 1000 bootstrap replicates using MEGA 6.0 [56,57].

Expression Profiles of Silkworm Carboxypeptidase Genes via Whole-Genome Microarrays
A genome-wide oligonucleotide microarray with more than 22,000 probes, including 44 carboxypeptidase-specific oligonucleotide probes, was established as previously described [58]. We identified four carboxypeptidase genes without specific oligonucleotide probes in the database. Microarray data revealed that carboxypeptidase genes had different expression patterns in the tissues of the fifth instar larva at Day 3. Next, diverse tissues, including testis, ovary, head, integument, fat body, midgut, hemocytes, Malpighian tubules, anterior/middle silk gland and posterior silk gland, were collected. To identify the developmental expression patterns, silkworms of both genders were collected at 20 different time points (from Day 3 of the fifth instar larval stage to the moth stage). Gene expression levels were visualized using GeneCluster 3.0 (University of Tokyo, Tokyo, Japan) [59].

Silkworm Starvation Experiment
Newly molted fifth instar larvae were divided into three groups, to test whether carboxypeptidases were induced by starvation. Larvae in the feeding group were fed on mulberry leaves throughout the experimental period. Larvae in the starvation groups were starved for 6, 12, 24, 48 and 72 h. Larvae in the starvation-refeeding groups were starved for 12, 24 and 36 h and then fed for 12, 24 and 36 h, respectively [41,60]. The larval midguts in each group were collected for analysis.
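The treatment design can be written out explicitly as a short Python sketch. It reads "respectively" as pairing each starvation period with a re-feeding period of the same length (12 h with 12 h, 24 h with 24 h, 36 h with 36 h), which is an interpretation of the text. The sampling time points of the feeding group are not given, so that group is left out; the names build_conditions, starved_h and refed_h are hypothetical.

```python
# Durations (hours) taken from the experimental description.
STARVATION_HOURS = [6, 12, 24, 48, 72]
REFEED_STARVE_HOURS = [12, 24, 36]
REFEED_FEED_HOURS = [12, 24, 36]


def build_conditions():
    """List the starvation and starvation-refeeding conditions as dicts."""
    starved = [
        {"group": "starvation", "starved_h": hours, "refed_h": 0}
        for hours in STARVATION_HOURS
    ]
    # Assumption: the i-th starvation period is paired with the i-th
    # re-feeding period ("respectively").
    refed = [
        {"group": "starvation-refeeding", "starved_h": s, "refed_h": f}
        for s, f in zip(REFEED_STARVE_HOURS, REFEED_FEED_HOURS)
    ]
    return starved + refed


conditions = build_conditions()
for condition in conditions:
    print(condition)
```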

RNA Extraction
Total RNA was extracted from all tissues (testis, ovary, head, integument, fat body, midgut, hemocytes, Malpighian tubules, anterior/middle silk gland and posterior silk gland) at Day 3 of the fifth instar larval stage and from the starvation experiment samples using the Total RNA Kit II (Omega, Norcross, GA, USA), according to the manufacturer's protocol. Total RNA (2 µg) was reverse transcribed into cDNA using M-MLV reverse transcriptase (Promega, Madison, WI, USA). To synthesize the first-strand cDNA, 2 µg of total RNA was mixed with 2 µL of 50 µM oligo (dT) in a total volume of 15 µL. The mixture was briefly spun, heated at 70 °C for 5 min and incubated on ice for 5 min. The mixture was then spun briefly and returned to ice. After the other components (5 µL 5× first-strand synthesis buffer, 1 µL dNTP mix, 1 µL RNase inhibitor, 100 U M-MLV reverse transcriptase) were added, the reaction mixture was spun briefly and incubated at 42 °C for 1.5 h. The cDNA was then incubated at 92 °C for 10 min and stored at −20 °C.

qRT-PCR
qRT-PCR was performed using the StepOnePlus™ Real-Time PCR system (Thermo Fisher Scientific, Waltham, MA, USA) with SYBR® Premix Ex Taq™ II (TaKaRa, Shiga, Japan). PCR conditions were 94 °C for 30 s, followed by 40 cycles at 95 °C for 5 s and 60 °C for 30 s. All cDNA samples were normalized using the B. mori eukaryotic translation initiation factor 4A (BmeIF-4a; silkworm microarray probe ID, sw22934; sense primer, 5′-TTCGTACTGGCTCTTCTCGT-3′; antisense primer, 5′-CAAAGTTGATAGCAATTCCCT-3′) as the internal control. Each expression assay was repeated at least three times. The primer sequences of all genes are listed in Table S1. The relative gene expression level was determined by the 2^−ΔΔCt method [61]. Statistical significance at p < 0.05 was determined by Student's t-test using GraphPad [62].
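As a worked illustration of the 2^−ΔΔCt calculation, the following Python sketch computes the relative expression of a target gene against a reference gene (BmeIF-4a in this study). The method assumes near-100% amplification efficiency for both primer pairs. The function name relative_expression and the Ct values in the example are hypothetical and are not data from this study.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Relative expression of a target gene by the 2^-ddCt method.

    ct_*_sample  : Ct values in the treated sample (e.g. starved midgut)
    ct_*_control : Ct values in the calibrator sample (e.g. fed midgut)
    """
    # Normalize the target gene to the reference gene in each condition.
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_control = ct_target_control - ct_ref_control
    # Compare the treated sample with the calibrator.
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target amplifies two cycles later (relative to
# the reference gene) in the treated sample than in the calibrator.
fold_change = relative_expression(
    ct_target_sample=26.0, ct_ref_sample=20.0,
    ct_target_control=24.0, ct_ref_control=20.0,
)
print(fold_change)  # 0.25, i.e. a four-fold decrease
```

In practice the calculation would be applied to each biological replicate before any statistical test is run on the resulting values.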