Comparative Transcriptome Analysis of Stage-Specific Changes in Gene Expression during Larval Development in Monochamus alternatus Hope

Monochamus alternatus Hope (Coleoptera: Cerambycidae) is an important trunk borer of pine trees and a major vector of pine wilt disease. Although chemicals are widely used in forest pest control, new strategies based on insect biology are offering promising approaches to manage the disease. Although there have been important research advances in this respect, there has not yet been a deep sequence analysis of M. alternatus describing the transcriptome, and no information is available about the gene function of this insect vector. We used next generation sequencing technology to provide a full transcriptome from the four larval instars of M. alternatus and successfully built an M. alternatus transcriptome database. In total, 67,456 unigenes were obtained with trinity software, information for 11,858 classified unigenes was obtained with the Clusters of Orthologous Groups (COGs) database, and 13,007 unigenes matched predicted pathways using the Kyoto Encyclopedia of Genes and Genomes (KEGG). In addition, genes related to lignocellulose, and putative Bt receptors and genes related to digestion are described. Additionally, the differential gene expression of these genes in different larval stages was analyzed. This study provides valuable information to underpin the development of new molecular tools for M. alternatus control strategies.


Introduction
Monochamus alternatus Hope (Coleoptera: Cerambycidae) is a significant vector of the pinewood nematode and an important forest pest that causes pine forest devastation in Korea, Japan, and China [1]. Pine wilt disease, which is caused by pinewood nematode and is native to North America, spreads rapidly into China, Korea and Europe (Portugal and Spain), and has become a major economic problem since it was introduced into China in 1982 [2]. Since the 1945, pine wilt disease has caused the loss of 26 million m 3 of timber in Japan [3]. Pine wilt disease is considered a "cancer" of pine trees because of its rapid spread and lack of effective control methods against it [4]. Pinewood nematode disease has become endemic in 17 provinces in China, causing economic losses of up to one trillion yuan every year [5]. Therefore, it is very important to control the transmission of the pinewood nematode disease. It mainly feeds on the phloem and xylem of the trunk and branches, and destroys and cuts off the transporting tissues affecting water and nutrients. The larvae have four instars-the first and second instars feed on phloem while the second instar specifically feeds between the cortex and sapwood. When the third and fourth instars are developed, they feed on the wood xylem [6]. The degradation of cellulose plays a crucial role in meeting the nutrient requirements of M. alternatus during their development. The gene family of the cellulolytic enzyme in insects has been reported to play a crucial role during cellulose digestion. For example, plant cell wall degrading enzymes (PCWDEs) have been identified in the intestinal tract of Anoplophora glabripennis and Agrilus planipennis larvae, and a single GH45 gene has been identified in the midgut of Western Corn Rootworm [7][8][9]. However, information about the lignocellulose gene family, which is necessary for the survival of different larval instars in M. alternatus, is insufficient. Therefore, study of the differential expression of lignocellulose genes among the different larval instars of M. alternatus provides a theoretical basis for designing and developing the corresponding cellulose-degrading enzyme inhibitors, and at the same time promotes the development of an environmentally friendly and effective control of M. alternatus.
To effectively control the population of M. alternatus, it is necessary to adopt various methods of prevention and cure. Chemicals are widely used in forest pest control, but some are ineffective and may cause environmental pollution and insect resistance. Therefore, it is necessary to develop more effective and environmentally friendly control methods.
In recent year, Bacillus thuringiensis (Bt) has become the most studied, developed, and widely used microbial insecticide [10]. The parasitoid crystals produced by Bt have a good insecticidal effect and high specificity against target pests such as Lepidoptera and Coleoptera [11,12]. When the target insect feeds on the Bt crystallin, the crystallin will be degraded into a toxic-active peptide in the alkaline environment of the insect midgut, which will bind to the specific receptor in the midgut causing cell membrane perforation, disrupting the cell osmotic balance and finally causing the insect to stop feeding and die [13]. With the use of Bt insecticides, some target insects developed resistance to the Cry toxin [14]. Therefore, identifying and analyzing intestinal digestive enzymes and the Bt receptor gene family of M. alternatus larvae will provide a basis to engineering more effective Bt toxins for M. alternatus.
In this study, the transcriptome data of the first, second, third, and fourth instar larvae of M. alternatus were assembled by Illumina sequencing technology. The lignocellulose, Bt receptor, and midgut protease gene families related to the digestive process were systematically analyzed and their differential expressions determined. This study provides new information towards developing a novel control strategy for M. alternatus.

Insects
M. alternatus larvae were collected from newly dead trees in LianJiang county and FuJian province (N 26.15046 • ; E 119.59261 • ). Insects were feed in the insectary until they developed into adults and were used to rear the next generation. First, second, third, and fourth instar larvae were collected for RNA extraction in triplicate. In this study, the first, second, third, and fourth-instar larvae were collected from three nematode-infected pine trees, at three replicates per instar (N = 12).

RNA Extraction
Total RNA was extracted with TRIzol Reagent (Invitrogen, Carlsbad, CA, USA) and purified with an HP Total RNA Kit (OMEGA, Invitrogen). RNA quality was checked on 1% agarose gels. RNA concentration and purity were determined using the RNA Nano 6000 Assay Kit of the Agilent Bioanalyzer 2100 system (Agilent Technologies, Santa Clara, CA, USA).

cDNA Library Construction and Illumina Sequencing
Total RNA (3 µg per sample) was used to construct the libraries with the RNA Library Prep Kit for Illumina ® (NEB, Beverly, MA, USA) following the manufacturer's instructions. mRNA was enriched by Oligo (dT) and fragmented randomly using divalent cations under elevated temperatures in NEBNext First Strand Synthesis Reaction Buffer (5X). The first cDNA strand was synthesized using mRNA as a template with random hexamers. Then, a double cDNA strand was synthesized using DNA Polymerase I and Rnase H. Double stranded cDNA was purified with the AMPure XP system (Beckman Coulter, Beverly, MA, USA) to preferentially select fragments with a length of 150-200 bp. PCR was performed with Phusion High-Fidelity DNA polymerase, Universal PCR primers, and Index (X) Primer. Finally, PCR products were purified with the AMPure XP system, and cDNA library quality was assessed with the Agilent Bioanalyzer 2100 system. After cluster generation was used on a cBotcluster Generation System, the Illumina HiSeq 2000 platform performed high-throughput sequencing.

Sequence Assembly
Clean reads were obtained by removing subsequent ligation, primers, and low-quality reads. Transcriptome assembly was performed to generate transcript data on high-quality clean reads in Trinity [15].

Differentially Expressed Gene (DEG) Analysis Based on FPKM Values
To determine gene expression changes during the larval stages, we used comparative transcriptomes analysis according to the FPKM (Fragments Per Kilobase of exon model per Million mapped fragments) values [26]. In addition, corrected p-values were assessed to authenticate DEGs using the Benjamini-Hochberg approach [27]. The FDR (False Discovery Rate) is considered a key indicator in multiple hypothesis testing for the screening of different genes. To normalize expression levels of genes based on the FDR, fold change was expressed as a log2 transform. The parameters for interpreting significant differential expression were set to FDR ≤ 0.01 and FC (Fold Change) ≥ 2.

Validation of Gene Expression by qPCR
To verify the expression patterns observed from RNA-SEQ analysis, 12 genes were selected for qPCR analysis. RNA was extracted and purified by the above methods. The first-strand cDNA was synthesized from a random hexamer with approximately 3 µg of RNA per sample. Quantitative RT-PCR was performed with SYBR Premix Ex Taq (Takara) on the AN ABI 7500 system. The primers used are shown in Additional File 2. The amplification program lasted for 2 min at 95 • C followed by 95 • C for 15 s and 60 • C for 31 s (40 cycles). To ensure the robustness of the genes, beta-actin was used as an internal reference gene, and each gene was repeated in three biological replicates. The relative expression level was calculated by the 2 −∆∆CT method [28].

Construction of a Phylogenetic Tree and Heat Map
The available amino acid sequences of genes identified in different species were downloaded from the Pfam database to construct the phylogenetic tree. A phylogenetic tree was constructed with MEGA 5.0 [29] using the neighbor-joining method with the Poisson model after sequence alignment performed by ClustalX [30]. Bootstrap analysis of 1000 replication trees was performed to evaluate the support value of each tree.

Statistical Analysis
The normalized signal strength values were further analyzed by one-way analysis of variance (ANOVA) according to the gene family expression levels at different larval stages. In all cases, statistical significance was set at the p value threshold of 0.001 (p < 0.001).

Illumina Sequencing and de Novo Assembly
High throughput sequencing was performed on the larvae of M. alternatus at different developmental stages using Hiseq2000. The raw reads were submitted to the Sequence Read Archive (SRA) database at the National Center for Biotechnology Information (NCBI) with the accession number (SRP 070969). After quality control, we obtained valid data totaling 66.76 Gb, with a Q30 between 89.42 and 92.66% for each sample. Read quantity was 19,938,940-26,833,274, and GC content was not less than 43.35% (Table S1). Trinity software processed the data, and 67,456 unigenes with an average length of 784.97 bp and an N50 of 1527 bp were acquired. The length distribution of the unigenes is shown in Figure S1 and Table 1. The results showed that the number of unigenes decreased with increasing sequence length, and a large number of unigenes were present in the range of 200-300 bp. These indicate that the quality of the transcriptome sequencing and assembly is acceptable.
To better understand the functions of candidate genes in different instars, we compared all genes expressed in the transcriptomes of M. alternatus at each instar ( Table 2). There were 4314 differentially expressed genes in the second and third instars, among which 2248 genes were up-regulated and 2066 genes were down-regulated, while there were fewer differentially expressed genes in the first and second, as well as the third and fourth, instars. This may be due to the transition of the feeding environment from phloem to xylem at different stages.

Analysis of Differentially Expressed Genes Based on FPKM Values
Based on the annotation of the results described above, we identified a total of 67,456 unigene gene sequences, many of which were differentially expressed. To verify the accuracy of the bioinformatic analyses, 12 differentially expressed candidate genes were selected for qPCR verification in this study. The results showed that the qPCR validation results were consistent with the FPKM values obtained by sequencing, which verified the accuracy of the bioinformatic prediction. The expression trend of differentially expressed genes was similar between the first and second instar larvae, and the expression of two differentially expressed genes (c71864.graph_c0, c77343.graph_co) was significantly up-regulated at the fourth instar stage. Thus, reliable data are provided for further identification of candidate genes ( Figure 1).

Analysis of Differentially Expressed Genes Based on FPKM Values
Based on the annotation of the results described above, we identified a total of 67,456 unigene gene sequences, many of which were differentially expressed. To verify the accuracy of the bioinformatic analyses, 12 differentially expressed candidate genes were selected for qPCR verification in this study. The results showed that the qPCR validation results were consistent with the FPKM values obtained by sequencing, which verified the accuracy of the bioinformatic prediction. The expression trend of differentially expressed genes was similar between the first and second instar larvae, and the expression of two differentially expressed genes (c71864.graph_c0, c77343.graph_co) was significantly upregulated at the fourth instar stage. Thus, reliable data are provided for further identification of candidate genes ( Figure 1).
indicating a genetic diversity (Figure 2). Three ADHs (c71289.graph_c0, c72119.graph_c0, c72893.graph_c0) had different expression levels in each instar, suggesting that genes encoding ADHs may be involved in the metabolism of carbohydrates, lipids, and other nutrients. Two ADHs (c73009.graph_c0, c73582.graph_c0) were expressed at the third and fourth instar and may play a role in the metabolism of fat synthesis ( Figure S2).

Candidate Genes Related to Bt Receptors
A total of 308 genes associated with Bt receptors were detected. Among them, four extensively studied Bt receptors were identified and are described below. Transcriptome data demonstrated that 35% of APNs and 43% of CADs represent approximately 11.36 and 13.96% of all Bt receptors. Thirteen ALPs were identified in the transcriptomic data; it is known that ALPs play an important role in the insecticidal process of Bt toxin. In addition to APN, ALP, and CAD, 217 ABCs were identified, contributing to 70.45% of the total Bt receptors (Table S3). We performed a phylogenetic analysis using our candidate APNs, ALPs, ABC transporters, and cadherins and those from other insect species, including A. glabripennis, T. castaneum, A. gambiae, S. litura, and D. melanogaster (Figure 3). APN genes in M. alternatus were very similar to A. glabripennis. M. alternatus Bt receptor orthologous pairs had levels of similarity ranging from 12 to 100% among species. The heat map was generated for all four instars. Two ABCs (c74925.graph_c0, c71552.graph_c0) were highly expressed in the third and fourth instar larvae ( Figure S3a). One ALP (c73032.graph_c0) was highly expressed in the first and second instar, and one ALP (c72382.graph_c0) was highly expressed in the third and fourth instar ( Figure S3b). c70335.graph_c0 annotated as CAD maintains high expression in all the instars ( Figure  S3c). In APNs, c80143.graph_c0 was highly expressed in the first and second instar, and c77905.graph_c0 was highly expressed in the third instar ( Figure S3d).

Candidate Genes Related to Bt Receptors
A total of 308 genes associated with Bt receptors were detected. Among them, four extensively studied Bt receptors were identified and are described below. Transcriptome data demonstrated that 35% of APNs and 43% of CADs represent approximately 11.36 and 13.96% of all Bt receptors. Thirteen ALPs were identified in the transcriptomic data; it is known that ALPs play an important role in the insecticidal process of Bt toxin. In addition to APN, ALP, and CAD, 217 ABCs were identified, contributing to 70.45% of the total Bt receptors (Table S3). We performed a phylogenetic analysis using our candidate APNs, ALPs, ABC transporters, and cadherins and those from other insect species, including A. glabripennis, T. castaneum, A. gambiae, S. litura, and D. melanogaster (Figure 3). APN genes in M. alternatus were very similar to A. glabripennis. M. alternatus Bt receptor orthologous pairs had levels of similarity ranging from 12 to 100% among species. The heat map was generated for all four instars. Two ABCs (c74925.graph_c0, c71552.graph_c0) were highly expressed in the third and fourth instar larvae ( Figure S3a). One ALP (c73032.graph_c0) was highly expressed in the first and second instar, and one ALP (c72382.graph_c0) was highly expressed in the third and fourth instar ( Figure S3b). c70335.graph_c0 annotated as CAD maintains high expression in all the instars ( Figure S3c). In APNs, c80143.graph_c0 was highly expressed in the first and second instar, and c77905.graph_c0 was highly expressed in the third instar ( Figure S3d).

Candidate Genes Related to the Digestion Process
A total of 502 genes associated with digestive proteases have been identified in the midgut of M. alternatus. Among these unigenes, serine proteases (SPs) were the most represented group (392, 78.09%), followed by cysteine proteases (70, 13.94%), metalloproteases (29, 5.78%), and aspartic proteases (11, 2.19%) (Table S4). SPs are the main digestive enzymes in the midgut of lepidopteran insects, including trypsin and chymotrypsin. Interestingly, two hundred and twenty-three trypsin and seventy-three chymotrypsin were found in the serine protease family. Phylogenetic trees were constructed to understand further the relationship of trypsin and chymotrypsin of M. alternatus to those present in other insect species (Figures 4 and 5). The expression levels of genes of trypsin and chymotrypsin at different stages are presented in Figures S4 and S5. In particular, three genes of trypsin (c72215.graph_c0, c75639.graph_c0, c69988.graph_c0) and six genes of chymotrypsin (c63619.graph_c0, c72323.graph_c0, c78183.graph_c0, c68958.graph_c0, c69988.graph_c0, c77789.graph_c0) were highly expressed in the first and second instar. Two genes of trypsin (c74031.graph_c0, c77708.graph_c0) and one chymotrypsin gene (c75737.graph_c0) were highly expressed in the third instar larvae.

Candidate Genes Related to the Digestion Process
A total of 502 genes associated with digestive proteases have been identified in the midgut of M. alternatus. Among these unigenes, serine proteases (SPs) were the most represented group (392, 78.09%), followed by cysteine proteases (70, 13.94%), metalloproteases (29, 5.78%), and aspartic proteases (11, 2.19%) (Table S4). SPs are the main digestive enzymes in the midgut of lepidopteran insects, including trypsin and chymotrypsin. Interestingly, two hundred and twenty-three trypsin and seventy-three chymotrypsin were found in the serine protease family. Phylogenetic trees were constructed to understand further the relationship of trypsin and chymotrypsin of M. alternatus to those present in other insect species (Figures 4 and 5). The expression levels of genes of trypsin and chymotrypsin at different stages are presented in Figure S4 and Figure S5. In particular, three genes of trypsin (c72215.graph_c0, c75639.graph_c0, c69988.graph_c0) and six genes of chymotrypsin (c63619.graph_c0, c72323.graph_c0, c78183.graph_c0, c68958.graph_c0, c69988.graph_c0, c77789.graph_c0) were highly expressed in the first and second instar. Two genes of trypsin (c74031.graph_c0, c77708.graph_c0) and one chymotrypsin gene (c75737.graph_c0) were highly expressed in the third instar larvae.

Discussion
Coleoptera is the most significant order of insects with the largest number of species, some of which are important pests in agriculture and forestry. A lot of major transcriptome data about Coleoptera have been published, such as the genome of Dendroctonus ponderosae, the transcriptome characteristics of Dastarcus helophoroides, and the transcriptome data of the 4th instar larva of M. alternatus [31][32][33]. To further analyze the differences among different larval instars, we have described the transcriptomic data of the first, second, third, and fourth instar larvae, and moreover we have identified lignin-degrading enzymes, Bt receptors, and midgut protease gene families.
Because the second instar larvae feed on the phloem of pinewood and the third instar larvae transfer to the xylem, they need to degrade lignocellulose to obtain nutrients and other living substances. It is therefore of importance to determine the differential expression of lignocellulose in larvae. In this study, transcriptome data of different stages of M. alternatus larvae were analyzed, and lignocellulose candidate genes were obtained by comparison, which including many ADH genes and were annotated into the process of lipid anabolic metabolism. The results suggest that the ADH gene may play an essential role in carbohydrate metabolism, and lipid anabolic processes in the larva. According to the tissue-specific expression pattern of alcohol dehydrogenase in Helicoverpa armigera, the expression level of the ADH5 gene is the highest in the midgut because the midgut is the main digestive part of H. armigera, where there is considerable material and energy metabolism [34]. Further study of the lignocellulose gene will be helpful for developing a new tool for the biological control of M. alternatus.
The Bacillus thuringiensis Cry toxin has been widely used as a microbial insecticide for pest control. At present, four Bt receptors, including ALP, APN, CAD, and ABC transporters, have been identified in the fourth instar larvae of M. alternatus [33]. In the current study, 308 genes related to Bt receptors were placed in the first, second, third, and fourth instar larvae, including the above four widely studied Bt receptors. ALP plays an important role in the insecticidal process of Bt toxin. One ALP (c73032.graph_c0) was Forests 2021, 12, 1312 9 of 12 highly expressed in the first and second instar larvae, which may affect the toxicity of Bt toxin and lead to low insecticidal activity against M. alternatus. Previous studies have shown that ALP-specific peptides block the binding of Cry11Aa to ALP and reduce toxicity, suggesting that ALP plays a functional role in mediating susceptibility to the Cry toxin [35].
APN was the first toxin receptor protein ever discovered. An exogenous protease binds to the midgut membrane through glycosylphosphatidylinositol (GPI) [36]. Among the APNs, the expression of c80143.graph_c0 was highest in the first and second instar larvae, and c77905.graph_c0 was highest in the third instar larvae. This may suggest that different APNs have specific functions at different stages of development, such as in the midgut of the larva, where they can digest proteins from the insect diet [37]; they also act as a Cry1A binding protein in the binding process of Bt toxin [38]. CAD belongs to the transmembrane glycoprotein family, which mainly promotes the oligomerization of Cry proteins into the cell membrane by binding multiple binding sites of the extracellular near membrane region or extracellular repeat sequences to domain II of the Cry protein.
One CAD (c70733.graph_c0) was highly expressed in all stages of the larvae. This may indicate that the gene is indispensable in larval development and can be used as a potential target for further pest control. The exact physiological function of cadherin is unknown, but strict control of cadherin levels during larval development suggests their importance in maintaining the midgut epithelial tissue [39]. We identified 217 ABCs but only two genes (c71552.graph_c0, c74925.graph_c0) were highly expressed in the third and fourth instar larvae. It has been reported that this reduced toxin-binding ability is due to an HaABCC2 inactivation mutation, suggesting that ABCC2 may play an essential role in the toxin perforation process [40]. The expression levels of Bt receptors were different at different developmental stages. By comparing the transcriptomic analysis of gene expression differences at different development stages, these results provided a basis for the analysis of the weak insecticidal activity of Bt toxin and the development of transgenic biocontrol bacteria-important new information for the control of M. alternatus.
Midgut proteases contribute to the action mode of Bt Cry toxin by participating in the dissolution of toxin crystals and activation of the protoxin. They can also further degrade the activated toxin protein, thereby affecting the specificity of the activated toxin [41]. It is important to study the type and abundance of insect midgut proteases to understand the resistance of insects to Bt Cry toxin [33]. The four main proteases in insects are serine protease (SPs), cysteine protease (CPs), aspartic protease (APs), and metalloprotease. We identified 502 midgut proteinase-related genes in the larval transcriptomes of M. alternatus. The most representative group is the serine protease, which is involved in various physiological and biochemical processes such as food digestion, immune response, and signal transduction [42]. Furthermore, 223 trypsin and 73 chymotrypsin genes were found in serine proteases. Trypsin specifically hydrolyzes the peptide bonds formed by lysine and arginine, while chymotrypsin remains intact in a highly alkaline environment. Therefore, both contribute to the dissolution of crystal proteins produced by Bt toxin in the midgut of longhorns. In addition, metalloproteinases are resistant to Beauveria bassiana infection and play an important role in insect development [43]. The insect midgut has become a significant target for Bt-derived pesticides and alternatives [44]. Research on the midgut proteinase of larvae offers the possibility of finding more suitable targets for Bt toxin.

Conclusions
In this study, the transcriptome from the four instars of the M. alternatus larva was determined and assembled to a high standard. Functional annotation and differential expression analysis were performed on certain genes. The results show that there were 4314 differentially expressed genes between the second and third-instars. Furthermore, qPCR results also verified this differential expression. The identification of lignocelluloserelated candidate genes revealed genetic diversity between the ADH gene family and other insects. The differential expression of ADH in different larval instars may suggest that ADH plays an important role in nutrient metabolism and anabolic lipid metabolism. The high expression of Bt-related receptor genes indicated that the larval stage of M. alternatus is the key period for the use of Bt as a control approach. The identification of trypsin and chymotrypsin enriched in intestinal proteases related to the digestive process provides the possibility for Bt toxin to find suitable targets in the intestinal tract of M. alternatus. High-quality transcriptome data and gene differential expression data are valuable in supporting further research on the physiological development of Coleoptera pests and the search for new control strategies.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/f12101312/s1, Table S1: Summary of transcriptome data from M. alternatus, Table S2: Lignocellulose genes identified from the M. alternatus transcriptome, Table S3: Bt receptor genes identified from the M. alternatus transcriptome, Table S4: Intestinal digestive enzyme genes identified from the M. alternatus transcriptome, Additional File 1: All gene expression levels are based on the M. alternatus FPKM value, Additional File 2: PCR primers for 12 differentially expressed genes, Figure S1: Length distribution of M. alternatus unigenes, Figure S2: Heat map of alcohol dehydrogenase gene expression, Figure