Molecular Cloning and Expression Analysis of the Endogenous Cellulase Gene MaCel1 in Monochamus alternatus

Abstract: The purpose of this study was to characterize the endogenous cellulase gene MaCel1 of Monochamus alternatus, an important vector of the pine wood nematode Bursaphelenchus xylophilus, which causes pine wilt disease (PWD). MaCel1 was cloned by rapid amplification of cDNA ends (RACE), and its expression was analyzed by real-time quantitative PCR (RT-qPCR). A total of 1778 bp of cDNA was obtained. The coding region of the gene was 1509 bp in length, encoding a protein of 502 amino acids with a molecular weight of 58.66 kDa and an isoelectric point of 5.46. Sequence similarity analysis showed that the amino acid sequence of MaCel1 had high similarity with the β-Glucosinolate of Anoplophora glabripennis and slightly lower similarity with other insect cellulase genes (GH1). The β-D-Glucosidase activity of MaCel1 was 256.02 ± 43.14 U/L, with no β-Glucosinolate activity. The MaCel1 gene was widely expressed in the intestine of M. alternatus. The expression level of MaCel1 in male (3.46) and female (3.51) adults was significantly higher than that in other developmental stages, and the lowest was in the pupal stage (0.15). The results will help reveal the digestive mechanism of M. alternatus and lay the foundation for controlling PWD by controlling M. alternatus.


Introduction
Cellulose is a natural polymer widely present in plant tissues. It is the most abundant biomass resource on earth, and it is also an important industrial raw material and renewable energy resource [1]. Cellulose is the main nutrient in the food of wood borers and plays an important role in their normal growth and development [2]. The digestive system of selected phytophagous insects has been examined as a potential resource for identifying novel cellulolytic enzymes with potential industrial applications [3]. Plants have rigid cell walls composed of abundant cellulose polymers strengthened by hydrogen bonds and van der Waals forces. Phytophagous insects need cellulase, pectinase, lignan, xylanase, and xyloglucanase to digest plant materials. Insects have a complete and

Test Materials
Infected Pinus massoniana trees in Guantou Town, Lianjiang County, Fuzhou City, Fujian Province (26.150° N, 119.593° E) were used as experimental material. The infected trees were felled and sawn into 1 m lengths. The early larvae, late larvae, and pupae of M. alternatus were collected from the phloem and xylem. The remaining larvae were left to grow to the adult stage in the infected trees. Adults, both male and female, were collected and fed with fresh pine branches for later use.

RNA Isolation and cDNA Synthesis
We took 10 individuals each of early larvae, late larvae, pupae, and female and male adults of M. alternatus and cut a small hole at the posterior end of each. The intestines were removed with an insect needle under a stereo microscope. Total RNA was extracted according to the Trizol reagent instructions and stored at −80 °C. First-strand cDNA was synthesized according to the instructions of the Bestar™ qPCR RT Kit and stored at −20 °C; this cDNA was used as the template for quantitative expression analysis. In addition, first-strand cDNA was synthesized (Fermentas RevertAid First Strand cDNA Synthesis Kit) from fresh intestinal tissue of late larvae of M. alternatus and used to amplify the middle fragment of the gene.

Amplification of the Intermediate Fragment of MaCel1
The M. alternatus cellulase gene was identified from the transcriptome data and annotation information. The primers were designed and synthesized by Shenggong Biological Engineering Co., Ltd. (Shanghai, China) (Table 1). The synthesized first-strand cDNA was used as the template for PCR amplification. The reaction mixture was as follows: PCR-grade water 15 µL, 2× Ex Taq Buffer (TaKaRa) 25 µL, dNTP Mix (10 mM) 1 µL, Ex Taq (TaKaRa) 1 µL, first-strand cDNA 5 µL, and forward and reverse primers 1.5 µL each. PCR amplification of the MaCel1 cDNA was performed at 94 °C for 2 min, followed by 35 cycles of 94 °C for 30 s, 55 °C for 30 s, and 72 °C for 30 s. The PCR products were electrophoresed on a 1.0% agarose gel. The target band was then quickly excised and recovered according to the instructions of the OMEGA kit. The purified PCR product was ligated into pMD18-T, a high-efficiency cloning vector. After transformation, the positive clones were sequenced, and sequence comparison was performed to verify the transcriptome sequence.

Full-Length RACE of MaCel1
Three specific 5′ RACE primers and two specific 3′ RACE primers were designed with Premier 5.0 software. The primers were synthesized by Shenggong Biological Engineering Co., Ltd. (Shanghai, China) and used to clone the 5′-terminal and 3′-terminal sequences of the MaCel1 gene.
For 5′ RACE, total RNA from the M. alternatus intestine was used as the template and the gene-specific primer GSP1 (Table 1) as the primer. First-strand cDNA was synthesized with SuperScript II reverse transcriptase, and the RNA was removed from the cDNA using RNase Mix. The dC-tailed cDNA was amplified by nested PCR, with GSP-2 (Table 1) and the abridged anchor primer (AAP) as the first-round primers, followed by GSP-3 (Table 1) and the abridged universal amplification primer (AUAP) from the kit as the second-round primers. The PCR product was ligated into pMD18-T and the positive clones were sequenced.
For 3′ RACE, total RNA from the M. alternatus intestine was used as the template to synthesize cDNA with SMARTScribe™ reverse transcriptase and 3′-CDS primer A. This cDNA was used as the template for the first round of PCR with primers 3′977-1 (Table 1) and the universal primer (UPM). The second round of PCR used cDNA as the template with primers 3′977-2 (Table 1) and UPM. The PCR product was ligated into pMD18-T and the positive clones were sequenced by Shenggong Biological Engineering Co., Ltd. (Shanghai, China). Finally, the full-length cDNA sequence of the MaCel1 gene was obtained using Vector NTI 10.3.0. The clone sequence was compared with the cDNA sequence to confirm that the recombinant vector had been successfully obtained.

Prokaryotic Expression and Purification of MaCel1
The full-length CDS (Met1-Leu502) was amplified using the pMD8-T-Cel plasmid as the template. After recovery, the CDS was ligated between the EcoRI (5′) and XhoI (3′) sites of the pET-32a vector. The ligation product was transformed into DH5α competent cells, and the recombinant plasmid pET32a-Cel was screened and identified. Then, pET32a-Cel was transformed into Escherichia coli Rosetta (DE3) and identified. Rosetta (pET32a-MaCel1) was inoculated into Luria-Bertani (LB) medium and cultured overnight at 37 °C with shaking at 220 r/min. When the OD600 was between 0.4 and 0.6, isopropyl-β-D-thiogalactopyranoside (IPTG) was added to a final concentration of 0.5 mmol/L, and expression was induced at 18 °C for 10 h. The recombinant cells were centrifuged for 10 min at 12,000 r/min. The supernatants and precipitates were then collected separately, and the solubility of the fusion protein was examined by 12% sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE). Inclusion bodies were obtained from the induced Rosetta-pET32a-MaCel1 cells and denatured by adding urea. Soluble recombinant MaCel1 protein was obtained by adding the denatured inclusion bodies to the refolding buffer. Contaminating proteins were removed on a Ni-NTA (nickel-nitrilotriacetic acid) column. The MaCel1 protein was concentrated with a Millipore 15 mL 10K (UFC910096) ultrafiltration centrifugal column.

Determination of Enzyme Activity of MaCel1
The β-D-Glucosidase and β-Glucosinolate activities of the MaCel1 protein were determined by the 3,5-dinitrosalicylic acid (DNS) colorimetric method, with allyl glucoside and salicylic acid as the substrates. One unit of enzyme activity was defined as the number of micromoles of reducing sugar produced per minute per milligram of protein. The glucose standard curve was measured at a wavelength of 520 nm with a microplate reader [36,37]. The DNS method was used to quantify the glucose in the sample [36,37]. The β-D-Glucosidase activity was determined by the same method as the glucosidase activity, with allyl glucoside as the substrate [38,39].
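To illustrate how a glucose standard curve converts absorbance readings into an activity value under the unit definition above, the following minimal Python sketch fits a straight line by least squares and inverts it. The standard amounts, absorbances, reaction time, and protein amount are hypothetical placeholders, not data from this study, and the function names are chosen only for this example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def glucose_from_absorbance(absorbance, slope, intercept):
    """Invert the standard curve: absorbance -> glucose (micromoles)."""
    return (absorbance - intercept) / slope


def specific_activity(glucose_umol, minutes, protein_mg):
    """Micromoles of reducing sugar produced per minute per mg of protein."""
    return glucose_umol / (minutes * protein_mg)


# Hypothetical standards: glucose (micromoles) vs. absorbance at 520 nm.
std_glucose = [0.0, 0.5, 1.0, 1.5, 2.0]
std_abs = [0.02, 0.27, 0.52, 0.77, 1.02]

slope, intercept = fit_line(std_glucose, std_abs)

# Hypothetical sample reading, reaction time, and protein amount.
sample_glucose = glucose_from_absorbance(0.52, slope, intercept)
activity = specific_activity(sample_glucose, minutes=30, protein_mg=0.1)
print(f"glucose = {sample_glucose:.3f} umol, activity = {activity:.3f} U/mg")
```

In practice the sample absorbance should fall within the range covered by the standards, since the linear fit is not reliable outside it.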

Bioinformatics Analysis
DNAMAN software was used to predict the coding sequence of the target gene. The open reading frame (ORF) finder (www.ncbi.nlm.nih.gov/gorf/gorf.html) was used to predict the ORF of MaCel1. The physical and chemical properties of the protein encoded by MaCel1 were predicted with ProtParam (https://web.expasy.org/protparam/), while SignalP 4.1 (http://www.cbs.dtu.dk/services/SignalP/) was used to predict the signal peptide of the encoded protein.
The transmembrane structure and hydrophilicity of the protein encoded by MaCel1 gene were predicted by TMpredServer (https://embnet.vital-it.ch/software/TMPRED_form.html). The prediction of the secondary and tertiary structure of MaCel1 protein was achieved by using SOPMA (http://scop.mrc-lmb.cam.ac.uk/scop/). The amino acid sequences were compared by ClustalW2 software (http://www.ebi.ac.uk/Tools/clustalw2/index.html). The sequences were downloaded from the NCBI and the phylogenetic tree was constructed with the LG+G evolutionary model and the maximum likelihood phylogenetic method by MEGA X software.

Quantitative Expression of MaCel1
RT-qPCR was used to analyze the expression of the MaCel1 gene in the intestinal tract of M. alternatus. The RT-qPCR was performed and analyzed using Bestar® SybrGreen qPCRmasterMix and a StepOnePlus™ real-time fluorescence quantitative PCR instrument (Corbett Research, Australia). The glyceraldehyde-3-phosphate dehydrogenase (GAPDH) gene (Table 1) was used as a reference gene to normalize expression data [37]. The RT-qPCR was performed in a final volume of 20 µL containing 10 µL Bestar® SybrGreen qPCRmasterMix, 0.5 µL each of specific forward and reverse primers, 7 µL ddH2O, and 2 µL cDNA. The amplification program was 95 °C for 2 min, followed by 40 cycles of 94 °C for 20 s, 58 °C for 20 s, and 72 °C for 30 s. The specificity of the amplification was checked by melting curve analysis (from 65 to 95 °C). The relative expression of the MaCel1 gene was calculated by the 2^−ΔΔCt method, which is applicable when the amplification efficiencies of the target and reference genes are approximately equal [40]. The data were analyzed by one-way ANOVA using IBM SPSS software. Normality of the data was evaluated with the Kolmogorov-Smirnov test.
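The 2^−ΔΔCt calculation can be sketched in a few lines of Python. This is only an illustration of the general method: the Ct values are hypothetical, the choice of calibrator sample is an assumption (the text does not state which stage served as calibrator), and the function name is invented for this example.

```python
def relative_expression(ct_target, ct_reference,
                        ct_target_cal, ct_reference_cal):
    """Fold change by the 2^(-ddCt) method.

    dCt  = Ct(target gene) - Ct(reference gene) within one sample
    ddCt = dCt(sample of interest) - dCt(calibrator sample)
    Assumes roughly equal amplification efficiency for both genes.
    """
    d_ct_sample = ct_target - ct_reference
    d_ct_calibrator = ct_target_cal - ct_reference_cal
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: target (MaCel1) and reference (GAPDH)
# in a sample of interest and in a calibrator sample.
fold = relative_expression(ct_target=22.0, ct_reference=18.0,
                           ct_target_cal=24.0, ct_reference_cal=18.0)
print(fold)  # dCt = 4 vs. 6, ddCt = -2, so the fold change is 4.0
```

A sample compared with itself gives a fold change of 1, which is a quick sanity check on the sign convention.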

Prediction of Endogenous Cellulase in M. alternatus
Gene annotation was mainly based on the obtained sequencing information [28]. The gene sequences were compared against the Kyoto Encyclopedia of Genes and Genomes (KEGG), Clusters of Orthologous Groups (COG), Gene Ontology (GO), and RefSeq non-redundant protein (Nr) databases to obtain the corresponding functional annotation, and the possible functions of the genes were inferred from the results for each database [41]. BLAST comparisons were conducted between the genes and each database, and the results were then filtered. Each sequence was annotated with its highest-scoring hit (default identity ≥40%, coverage ≥40%). The expression of the lignocellulase-related genes was calculated by the reads per kilobase per million mapped reads (RPKM) method [42]. The RPKM method eliminates the influence of gene length and sequencing depth on the calculated expression, so RPKM values can be used directly to compare gene expression levels among samples.
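As a brief illustration of why RPKM removes the effects of gene length and sequencing depth, the sketch below implements the standard RPKM formula in Python. The read counts, gene lengths, and library sizes are hypothetical values chosen for easy arithmetic and are not taken from the transcriptome data of this study.

```python
def rpkm(gene_reads, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = reads * 1e9 / (gene length in bp * total mapped reads)
    """
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical gene: 2000 reads on a 2000 bp transcript
# in a library of 20 million mapped reads.
value = rpkm(gene_reads=2000, gene_length_bp=2000,
             total_mapped_reads=20_000_000)

# A gene twice as long with twice the reads, or the same gene in a
# library sequenced twice as deeply, yields the same RPKM.
same_by_length = rpkm(4000, 4000, 20_000_000)
same_by_depth = rpkm(4000, 2000, 40_000_000)
print(value, same_by_length, same_by_depth)
```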

Prediction of Endogenous Cellulase in M. alternatus
M. alternatus had 14 cellulases, including 3 exo-glucanases (GH48), 7 endoglucanases (GH5, GH9, GH45), and 4 β-glucosidases (GH1, GH3), belonging to 6 glycoside hydrolase families (Table A1). There were two up-regulated genes, c79679 and c81097 (Figure 1), and six down-regulated genes, namely c89763, c39850, c77902, c94844, c71020, and c86570, in the late larvae compared with the early larvae. The absolute expression of the cellulase genes at different developmental stages was compared. In the early larvae, the c39850 gene had the highest expression (4445 RPKM), and the c77902, c71020, and c86570 genes each exceeded 200 RPKM. In the late larvae, c81097 had the highest expression (83 RPKM), followed by c79679 (55 RPKM), and the expression level of the other genes did not exceed 20 RPKM.

Cloning and Sequence Analysis of MaCel1
Based on the 5′ RACE and 3′ RACE results, which verified the transcriptome sequence, a cDNA with a total length of 1778 bp was obtained. The 5′-terminal non-coding region was 183 bp, and the 3′-terminal non-coding region was 86 bp. We named the gene MaCel1 (KY073339). Sequence analysis revealed an ORF of 1509 bp, which encoded a predicted protein of 502 amino acids with a calculated molecular weight of 58.66 kDa and a theoretical pI of 5.46. The molecular formula of the protein was C2685H4005N687O771S13, and the instability coefficient was 41.66, which suggested that the protein was unstable. The average hydropathicity of the MaCel1 protein was −0.449. Although the encoded protein contained hydrophobic regions, this negative value indicated that it was hydrophilic overall. Prediction of the transmembrane structure of the encoded protein with the TMpred server showed one transmembrane domain and three suspected transmembrane domains. The predicted secondary structure of the protein comprised 37.85% α-helix, 7.17% β-sheet, and 15.14% extended strand connecting α-helices and random coil. The main skeleton of the protein in the tertiary structure was α-helix (Figure 2).
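The reported lengths can be cross-checked with simple arithmetic. The short Python sketch below is only an illustration; it assumes (this is not stated explicitly in the text) that the 1509 bp ORF includes the stop codon, in which case the residue count matches the Met1-Leu502 coding sequence used for prokaryotic expression.

```python
# Lengths reported for the MaCel1 cDNA (bp).
utr5_bp = 183   # 5'-terminal non-coding region
orf_bp = 1509   # open reading frame
utr3_bp = 86    # 3'-terminal non-coding region

# The three regions should add up to the full-length cDNA.
total_bp = utr5_bp + orf_bp + utr3_bp

# An ORF must be a whole number of codons; the last codon is
# assumed here to be the stop codon, which encodes no residue.
codons, remainder = divmod(orf_bp, 3)
residues = codons - 1

print(total_bp, codons, remainder, residues)
```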

Similarity Analysis of MaCel1
The amino acid sequences of the MaCel1 gene and other insect Cel genes obtained from Blast were compared using DNAMAN V6 software (Figure 3). The phylogenetic analysis of the Cel genes from 9 insect species, including the MaCel1 gene, was performed using MEGA X (Figure 4). The results showed that the MaCel1 gene was most closely related to the β-Glucosinolate of A. glabripennis while Harpegnathos saltator was the outgroup species.

Prokaryotic Expression of MaCel1 Gene and Determination of Enzyme Activity of MaCel1
SDS-PAGE analysis showed that a specific protein band of about 76 kDa was expressed in E. coli containing MaCel1 after IPTG induction (Figure 5A), consistent with the estimated size. The renatured protein was obtained by purification (Figure 5B). The enzyme activity assay showed a β-D-Glucosidase activity of 256.02 ± 43.14 U/L, but no β-Glucosinolate activity was observed.

Expression Analysis of MaCel1
To study the relative expression of cellulase in M. alternatus at different developmental stages, RT-qPCR analysis was carried out on early larvae, late larvae, pupae, and female and male adults. There was a significant difference in the relative expression level of the MaCel1 gene among developmental stages (ANOVA, F = 20.822, df = 4, p = 0.002). The relative expression of MaCel1 was highest in male and female adults, followed by the early larvae, and lowest in the late larval and pupal stages (Figure 6).
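For readers unfamiliar with the test, the F statistic of a one-way ANOVA across five developmental stages can be sketched in plain Python as follows. The study itself used IBM SPSS; the replicate values below are hypothetical placeholders and will not reproduce the reported F or p values.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical relative expression values, three replicates per stage.
stages = {
    "early larvae": [1.0, 1.2, 0.8],
    "late larvae": [0.4, 0.5, 0.3],
    "pupae": [0.10, 0.20, 0.15],
    "female adults": [3.4, 3.6, 3.5],
    "male adults": [3.3, 3.5, 3.6],
}
f_stat, df_between, df_within = one_way_anova_f(list(stages.values()))
print(f"F = {f_stat:.2f}, df = {df_between}, {df_within}")
```

With five groups the between-group degrees of freedom are 4, which matches the df reported above; the p value would then be read from the F distribution with the corresponding degrees of freedom.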

Discussion
Insects that feed on plant materials, such as termites and longhorn beetles, can use cellulose as a carbon source, so cellulase is essential for digesting the cellulose they ingest. We obtained the sequence of the endogenous cellulase gene c39850 from transcriptome data of M. alternatus at different developmental stages and named it MaCel1. The amino acid sequence of MaCel1 was most similar to that of the β-Glucosidase gene of A. glabripennis, suggesting that the two insects are closely related evolutionarily. The β-D-Glucosidase activity of MaCel1 was 256.02 ± 43.14 U/L, indicating that the MaCel1 protein has β-D-Glucosidase activity. MaCel1 may therefore be a new member of the insect cellulase gene family, and its catalytic properties should be further investigated. Different types of insects have different types of cellulase [43]. The β-D-Glucosidase activity of MaCel1 was higher than that previously reported for the M. alternatus digestive tract, which might be due to the different test materials [25]: a crude intestinal enzyme solution was used in the previous study, whereas purified recombinant protein was used in this experiment. Degradative enzymes have been identified in the digestive tract of Thermobia domestica, including lytic polysaccharide monooxygenases, which weaken cellulose fibers and make them easier to degrade [44]. When two endogenous lignocellulases of Macrotermes barneyi were expressed from the same expression vector, the synergistic factor increased and the synergistic effect was better than when they were expressed separately [45]. The next step is to try to express lignocellulases with high activity and good properties in the same host.
Using real-time fluorescent quantitative PCR, we showed that the relative expression of MaCel1 in the early larvae of M. alternatus was higher than that in the late larvae. This result is consistent with a previous study showing that cellulase activity in Ceratoides glabra decreased with increasing larval age [20]. The cellulase activity of A. glabripennis larvae measured under natural conditions followed a similar pattern [21]. This phenomenon may be related to the feeding pattern of M. alternatus larvae. The early larval stage is the peak feeding period, with a large amount of food digested and a greater demand for cellulase. In the late larval stage, feeding declines and the demand for cellulase declines with it. During the pupal stage, the expression of MaCel1 was the lowest. At this stage, M. alternatus has stopped feeding and obtains energy mainly by breaking down its own fat, so the demand for cellulase is very low. After emergence, M. alternatus adults feed on fresh pine branches to supplement their nutrition, and the main component of pine branches is still cellulose, indicating that MaCel1 should play an important role in digestion in M. alternatus adults. The endoglucanase activity of Cyrtotrachelus buqueti Guer adults was also higher than that of the larvae [46]. Multiple factors may underlie the significantly higher relative expression of MaCel1 in adults than in other stages. On the one hand, given the complexity of the cellulase system of M. alternatus and the high expression level of other kinds of cellulase, MaCel1 plays only an auxiliary role [22]. On the other hand, it is also possible that the cellulase secreted by exogenous microorganisms plays a major role in the degradation of cellulose, as the cellulase activity of M. alternatus adults differs under different feeding conditions [47].
Research on the effects of RNAi on termites, targeting the conserved regions of the five endoglucanase genes of T. domestica, showed that injection or oral administration of dsRNA may cause significant silencing of the Coptotermes formosanus endoglucanase genes (CfEGs), leading to death, decreased enzyme activity, and weight loss [48]. The cellulase activity in termites can also be reduced by using cellobiose imidazole and fluoromethyl cellobiose as cellulase inhibitors [49]. Thus, it may be possible to control the population of M. alternatus by restricting its digestive function.

Conclusions
The results demonstrate the existence of the endogenous cellulase gene MaCel1 in the digestive tract of M. alternatus, in which 14 cellulases were identified. The expression level of MaCel1 differed among developmental stages and was highest in the adult stage. MaCel1 showed β-D-Glucosidase activity but no detectable β-Glucosinolate activity. The results of this study provide a theoretical basis for elucidating the digestive physiology of M. alternatus and for developing new strategies to control M. alternatus and thereby pine wilt disease.

Conflicts of Interest:
The authors declare no conflict of interest.