Copy Number Variations of Glycoside Hydrolase 45 Genes in Bursaphelenchus xylophilus and Their Impact on the Pathogenesis of Pine Wilt Disease

: The pine wood nematode Bursaphelenchus xylophilus parasitizes millions of pine trees worldwide each year, causing severe wilt and the death of host trees. Glycoside hydrolase 45 genes of B. xylophilus are reported to have been acquired by horizontal gene transfer from fungi and are responsible for cell wall degradation during nematode infection. Previous studies ignored the possibility of copy number variations of such genes. In this study, we determined that two of the glycoside hydrolase 45 genes evolved to maintain multiple copies with distinct expression levels, enabling the nematode to infect a variety of pine hosts. Additionally, tandem repeat variations within coding regions were also detected between different copies of glycoside hydrolase 45 genes that could result in changes in protein sequences and serve as an effective biological marker to detect copy number variations among different B. xylophilus populations. Consequently, we were able to further identify the copy number variations of glycoside hydrolase 45 genes among B. xylophilus strains with different virulence. Our results provide new insights into the pathogenicity of B. xylophilus , provide a practical marker to genotype copy number variations and may aid in population classiﬁcation.


Introduction
The pine wood nematode (PWN) Bursaphelenchus xylophilus (Steiner & Buhrer) Nickle is an invasive species and the causative agent of pine wilt disease, which originated in North America [1]. B. xylophilus has not caused disease in the local pine species in North America but it poses a major threat to all pine forests in Asia and Europe [2]. In China, the infected area has reached over 660 counties in 18 provinces according to the State Forestry and Grassland Administration of China. Researchers have been struggling to elucidate the pathogenic mechanism of B. xylophilus in order to efficiently manage this parasite [3][4][5][6].
To better understand this nematode, the whole genome of B. xylophilus was published in 2011 [7]. Since the announcement of the first short-read sequencing-based genome assembly, high-throughput sequencing has frequently been used to identify pathogenic traits of B. xylophilus. This effort focused on genes, noncoding RNAs and genomic variations, which expanded our knowledge of its molecular pathogenesis [6,[8][9][10][11][12]. However, most of these studies usually focused on analyzing genes with potential regulatory roles. In contrast, only a few reports have shed light on B. xylophilus multi-copy genes [9,13,14], which have been suggested to be associated with diseases in humans and other model organisms [15]. Glycoside hydrolase 45 (GH45) gene families are known to be B. xylophilusspecific and have been shown to have been acquired by horizontal gene transfer (HGT) and to be associated with cell-wall-degradation processes during infection [7,13]. However, there is little information available about the genome architectures and copy number variations of certain GH45 genes. Consequently, research is needed to investigate the copy numbers of these important genes and to gain novel insights into the pathogenesis of B. xylophilus.
Multi-copy genes are usually found within repeated regions, such as segment duplications (SDs) and tandem repeats (TRs) [16]. Many genomes of simple organisms contain similar sequence repeats ranging in size from TRs to SDs [17]. SDs are usually considered to be highly similar DNA fragments (>90% identity) between 1 and 400 kb, which occur more than once in the genome. They are known to be gene regulators and are responsible for gene functional diversification, gene family expansion and the adaption of the species to certain environments [18]. TRs are unstable genomic elements that are repeated head to tail and are usually prone to variation [19]. They have been used as genetic markers due to their unique polymorphic nature. Extensive studies have also revealed their potential role in the regulation of gene products associated with pathogenicity variation in microbes and in polygenic disorders in humans [15,20,21]. Despite their crucial biological importance, the accurate identification and profiling of multi-copy genes is difficult due to their highly similar sequence content [22].
To fill this knowledge gap, we set out to document the existence and copy number variations of GH45 in B. xylophilus based on our recently generated long-read genome assembly (AH1, PRJNA524063, manuscript under review). More importantly, the expression patterns of different GH45 copies were quantified during the pathogenesis process of pine wilt disease. Our results provide a new perspective for these pathogenesis-related genes that will inform current research on this devastating disease.

Sampling and Inoculation Test
The B. xylophilus strains AMA3 (strong virulent), USA (weak virulent) and GZ1 (weak virulent) were maintained in the laboratory of Prof. Jianren Ye for all downstream analyses. OKD-1 (weak virulent) was kindly provided by the University of Miyazaki [9]. All nematodes were cultured on a fungus (Botrytis cinerea Pers.) at 25 • C for 7-10 days to obtain sufficient sample sizes. The nematodes were isolated by the Baermann funnel method and were stored in a water suspension using a 1.5 mL centrifuge tube for downstream sequencing. Approximately 5000 AMA3 were inoculated into a small wound of Pinus thunbergii stems for 6, 12 and 24 h and cultivated at temperatures between 28 and 32 • C. Subsequently, the nematodes were extracted from the chips of the entire seedlings for 3 h using the Baermann funnel method. After they were washed three times in double-distilled (dd) H 2 O, the nematodes were collected by centrifugation at 3500 rpm for 3 min and frozen in liquid nitrogen for subsequent experimental validation.

Identification and Quantification of Multi-Copy GH45 Genes
The genomic coordinates and annotation of the four GH45 genes were obtained by the long-read sequencing assembly (PRJNA524063). The identification of SDs and TRs was performed by BLAST based on the annotation result. Multiple gene comparison was conducted by DNAMAN (Lynnon Corp, Quebec, Canada). On the transcriptome level, four specific primer pairs were designed on the first and last exons of each GH45 gene based on previous alignment results. Each of the primer pairs contained > two specific nucleotide sites. PCR was conducted by the following procedure: 94 • C for 2 min, followed by 37 cycles (94 • C for 30 s, 58 • C for 30 s and 72 • C for 30 s) and finally 72 • C for 10 min. The PCR products were examined using 1% agarose gel electrophoresis. At the genome level, five primers (Supplementary Table S2) were designed in the intergenic regions (P1, P2, P4 and P5) of the four GH45 genes and different combinations were utilized to validate their existence (P2F+P2R, P2F+P4R, P1F+P2R, P1F+P4R and P1F+P5R). Highly sensitive Takara PrimeStar Max enzyme was employed to perform long-range PCR with the three-step procedure described in the manufacture manual. The extension time was adjusted to 25s/kb as needed. The signal peptide cleavage site was predicted by (http://www.cbs.dtu.dk/services/SignalP/ (accessed on 20 January 2021)).
The RT-PCR test was conducted using an ABI StepOne plus real-time system (Applied Biosystems, Carlsbad, CA, USA) with SYBR Premix Ex Taq (Takara, Dalian, China). The results were normalized to the expression level of the B. xylophilus Actin gene (GenBank: EU100952). All RT-PCR experiments were conducted with four replicates (Supplementary Table S3).

Detection of Copy Number Variations
Suitable primers were designed from the flanking sequences of TRs. Exact copy numbers can be assessed appertaining to amplified product lengths (sense primer: CGTGGTG-GAAGGTAAGAC, antisense primer: GTTGGAGTGGTGCTTGAT). PCR was conducted by the following procedure: 94 • C for 2 min, followed by 35 cycles (94 • C for 30 s, 58 • C for 30 s and 72 • C for 30 s) and finally 72 • C for 10 min. The PCR products were visualized using 3% agarose gel electrophoresis.

Identification of Novel HGT Gene Families, Glycoside Hydrolase 45
In this study, four GH45 genes were determined (GH45_1: 1177 bp, GH45_2: 1091 bp, GH45_3: 912 bp and GH45_4: 1101 bp), located on the same contig adjacent to each other and separated by~1.5 kb intergenic intervals ( Figure 1a, Table 1). Based on the long-read assembly (AH1), GH45_1 and GH45_2 were considered to be different copies of the same gene with only 42 bp insertions (3 region) and 19 bp mutations (5 region), which were possibly caused by segmental duplications during evolution. GH45_3 and GH45_4 shared more abundant mutations at both 5 and 3 ends with no insertions and deletions (INDELs) and were considered two other copies of the GH45 gene. The existence of the four genes and their sequence variations was confirmed by PCR amplification followed by clone sequencing ( Table S1). In addition, specific primers were designed in the up and downstream intergenic regions of the four genes (P1, P2, P4 and P5) to indicate their genomic locations and architectures. The two gel bands from the same primer pairs (P2F+P2R, P2F+P4R and P1F+P2R) proved that the segmental duplication event occurred in P2 and P3. Additionally, the~8000 bp (P1F+P4R) and~9000 bp (P1F+P5R) fragments further indicated that the aforementioned four genes were sequentially located ( Figure 1c). The genome comparison results suggested that GH45_2 and GH45_4 were not found in both the short-read assembly using Japanese strain (JP) and the newly reported long-read assembly [23] (PRJEB40022). Furthermore, only GH45_1 and GH45_3 were reported ( Figure 1d). GH45_1 was determined to have six copies of 21mer TRs while GH45_2 had four copies. The extra two copies of TRs were in accordance with the aforementioned 42 bp insertions (Figures 1a and 2a-c).

Sequence Characterizations of Different GH45 Copies
The ORFs among different copies were altered by nucleotide variations and TRs. The longest ORF of GH45_1 was 1056 bp while that of GH45_2 was 1014 bp. The TR variations found in GH45_1 and GH45_2 altered the ORF regions and included two extra copies of repeated peptides (Figure 3). In addition, the ORFs between GH45_3 (162 bp) and GH45_4 (288 bp) diverged significantly. They shared the same start codons but the stop codons were modified by nucleotide variations. Motif locations and types were almost the same between GH45_1 and GH45_2, whilst GH45_2 missed one repeat motif, which was mostly due to the lack of two copies of TRs. Motifs between GH45_3 and GH45_4 varied on the 3 region. Among the 5 regions, different copies were more conserved (Supplementary Figure S2). GH45_1 and GH45_2 were predicted to possess the same signal peptide cleavage site with 21-22 amino acids, whilst GH45_3 and GH45_4 had no such site ( Figure 3).

Expression Profiling of Novel Multi-Copy GH45 Genes
Due to their highly similar sequences, the GH45 gene copies cannot be well quantified by RNAseq data. This is because sequence aligners are incapable of handling sequencing reads within highly repetitive regions. To confirm the real expressions among different GH45 copies, four multi-copy genes were quantified by a conventional qPCR method using specific primer pairs. The sensitivity of the four qPCR primer pairs was validated by clone sequencing in advance (Figure 4a,b, Supplementary Table S3, Supplementary  Figure 3). Subsequent expression profiling of the aforementioned GH45 genes indicated that they shared similar expression patterns with distinct expression levels in early infection stages. All GH45 genes reached their climax at 12 h after inoculation and decreased significantly at 24 h. The expression level of GH45_1 was significantly higher than the other copy, GH45_2. GH45_3 and GH45_4 were expressed at a very low scale when compared with GH45_1 and GH45_2, whilst the expression of GH45_3 was still higher than that of GH45_4 (Figure 4c). These results suggest that expressions of different GH45 copies vary significantly.

Expression Profiling of Novel Multi-Copy GH45 Genes
Due to their highly similar sequences, the GH45 gene copies cannot be well quantified by RNAseq data. This is because sequence aligners are incapable of handling sequencing reads within highly repetitive regions. To confirm the real expressions among different GH45 copies, four multi-copy genes were quantified by a conventional qPCR method using specific primer pairs. The sensitivity of the four qPCR primer pairs was validated by clone sequencing in advance (Figure 4a,b, Supplementary Table S3, Supplementary Figure S3). Subsequent expression profiling of the aforementioned GH45 genes indicated that they shared similar expression patterns with distinct expression levels in early infection stages. All GH45 genes reached their climax at 12 h after inoculation and decreased significantly at 24 h. The expression level of GH45_1 was significantly higher than the other copy, GH45_2. GH45_3 and GH45_4 were expressed at a very low scale when compared with GH45_1 and GH45_2, whilst the expression of GH45_3 was still higher than that of GH45_4 (Figure 4c). These results suggest that expressions of different GH45 copies vary significantly.

Copy Number Variations of GH45 Genes Among Different B. xylophilus Strains
Accurately genotyping the copy numbers of certain genes was difficult due to their highly similar sequence contents. However, the copy numbers of GH45_1 and GH45_2 can be assessed based on the aforementioned TR variations (Figure 1). The conventional PCR method would be adequate in identifying different copies with different TR combinations by designing suitable primers located in the flanking sequences of TRs. The PCR results combined with Sanger sequencing data indicate that there were two copies in AMA3 (six TRs; four TRs), three copies in USA (six TRs; five TRs; four TRs), one copy with five TRs in OKD-1 and one copy with seven TRs in GZ1 (Figure 5a-c). In addition, AMA3 was verified as a strong virulent strain while USA, OKD-1 and GZ1 were verified as weak virulent strains [9,12]. Obviously, the copy number variations and the TR variations of the GH45 gene existed among different nematode strains. A potential method would be to distinguish B. xylophilus strains that have a different virulence, as more strains with distinguished virulence variations should be included to provide more concrete conclusions in future works.

Copy Number Variations of GH45 Genes Among Different B. xylophilus Strains
Accurately genotyping the copy numbers of certain genes was difficult due to their highly similar sequence contents. However, the copy numbers of GH45_1 and GH45_2 can be assessed based on the aforementioned TR variations (Figure 1). The conventional PCR method would be adequate in identifying different copies with different TR combinations by designing suitable primers located in the flanking sequences of TRs. The PCR results combined with Sanger sequencing data indicate that there were two copies in AMA3 (six TRs; four TRs), three copies in USA (six TRs; five TRs; four TRs), one copy with five TRs in OKD-1 and one copy with seven TRs in GZ1 (Figure 5a-c). In addition, AMA3 was verified as a strong virulent strain while USA, OKD-1 and GZ1 were verified as weak virulent strains [9,12]. Obviously, the copy number variations and the TR variations of the GH45 gene existed among different nematode strains. A potential method would be to distinguish B. xylophilus strains that have a different virulence, as more strains with distinguished virulence variations should be included to provide more concrete conclusions in future works.

Discussion
The short-read-based assembly generally did not show an extraordinary capacity in characterizing long repetitive sequences. Consequently, few genes annotated as multiple copies with highly similar sequence contents and copy number variations have been previously reported in B. xylophilus [7,14]. In this study, we found that some B. xylophilusspecific GH45 genes had multiple copies with the help of long-read-based (PRJNA524063) assembly (the misidentification of GH45 copies in another long-read assembly may be caused by copy number variations-the sequencing strains may not contain any different copies). Hitherto, GH45_1 and GH45_3 were reported as Bx-eng-3 and Bx-eng-2, respectively, whilst the other copies (GH45_2 and GH45_4) were ignored in other studies [14] ( Figure 1d). This indicates that some GH45 genes were not only acquired from horizontal gene transfer but also evolved to maintain multiple copies within SD regions.
Additionally, GH45_1 and GH45_2 were proved to locate within the SD region and experimentally verified to possess TR variations within their ORFs (Figures 1a and 2a-c). Relevant research has suggested that TRs could be formed when SDs occurred [24][25][26] and TRs found within ORFs were usually associated with gene expression, protein structure and function [27][28][29]. At the same time, TRs were also reported as mediators for phenotypic change, which facilitated rapid adaptation of the organisms to the environment [30]. It was clear that GH45 cellulase was one of the major plant cell-wall-degradation enzymes of B. xylophilus [14]. Thus, the TR variations (unit size: 21 mer) of GH45_1 and GH45_2, along with their distinct expressions (Figures 3 and 4), would provide potential evidence for the diverse virulence and wide distribution range of B. xylophilus. Additionally, the expression of GH45 genes was boosted 12 h after inoculation and decreased at 24 h. This expression pattern was negatively correlated with the host response activities that occurred at the corresponding time points, which possibly inhibited the pathogenesis-related gene expressions [31]. GH45_3 and GH45_4 did not have TRs but their expression levels were also significantly variable. It is possible that the B. xylophilus-specific GH45 cell-wall-degradation genes evolved to maintain their parasitism by introducing various copies and TRs to achieve various gene functions and protein structures.
Traditional population studies on the genetic variations of B. xylophilus are usually based on the different enzyme sites within DNA fragments, such as ISSR, AFLP and RFLP [32][33][34] and other biological markers using mitochondrial DNA, tandem repeats and microsatellites [35][36][37][38]. In the current study, a new approach was proposed to classify B. xylophilus with different virulence based on copy number variations. Copy number variations of B. xylophilus can be easily detected by simple PCR amplifications. This approach

Discussion
The short-read-based assembly generally did not show an extraordinary capacity in characterizing long repetitive sequences. Consequently, few genes annotated as multiple copies with highly similar sequence contents and copy number variations have been previously reported in B. xylophilus [7,14]. In this study, we found that some B. xylophilus-specific GH45 genes had multiple copies with the help of long-read-based (PRJNA524063) assembly (the misidentification of GH45 copies in another long-read assembly may be caused by copy number variations-the sequencing strains may not contain any different copies). Hitherto, GH45_1 and GH45_3 were reported as Bx-eng-3 and Bx-eng-2, respectively, whilst the other copies (GH45_2 and GH45_4) were ignored in other studies [14] (Figure 1d). This indicates that some GH45 genes were not only acquired from horizontal gene transfer but also evolved to maintain multiple copies within SD regions.
Additionally, GH45_1 and GH45_2 were proved to locate within the SD region and experimentally verified to possess TR variations within their ORFs (Figures 1a and 2a-c). Relevant research has suggested that TRs could be formed when SDs occurred [24][25][26] and TRs found within ORFs were usually associated with gene expression, protein structure and function [27][28][29]. At the same time, TRs were also reported as mediators for phenotypic change, which facilitated rapid adaptation of the organisms to the environment [30]. It was clear that GH45 cellulase was one of the major plant cell-wall-degradation enzymes of B. xylophilus [14]. Thus, the TR variations (unit size: 21 mer) of GH45_1 and GH45_2, along with their distinct expressions (Figures 3 and 4), would provide potential evidence for the diverse virulence and wide distribution range of B. xylophilus. Additionally, the expression of GH45 genes was boosted 12 h after inoculation and decreased at 24 h. This expression pattern was negatively correlated with the host response activities that occurred at the corresponding time points, which possibly inhibited the pathogenesis-related gene expressions [31]. GH45_3 and GH45_4 did not have TRs but their expression levels were also significantly variable. It is possible that the B. xylophilus-specific GH45 cell-walldegradation genes evolved to maintain their parasitism by introducing various copies and TRs to achieve various gene functions and protein structures.
Traditional population studies on the genetic variations of B. xylophilus are usually based on the different enzyme sites within DNA fragments, such as ISSR, AFLP and RFLP [32][33][34] and other biological markers using mitochondrial DNA, tandem repeats and microsatellites [35][36][37][38]. In the current study, a new approach was proposed to classify B. xylophilus with different virulence based on copy number variations. Copy number variations of B. xylophilus can be easily detected by simple PCR amplifications. This approach could be utilized as a novel biological marker to characterize the virulence of B. xylophilus as well as assist in population classification.
Other studies have also revealed possible relationships between gene copies and complex traits [39]. Global detection of copy numbers in human genomes proved that copy number variations occur within disease loci and functional elements [40]. Evolutionary study of immune-associated genes in Drosophila showed that copy number variations tend to vary on signaling proteins, suggesting that molecular interactions between hosts and pathogens may drive adaptive evolution in the immune system [41]. Apart from model organisms, researchers have also found that juvenile hormone esterase-related genes in pests occur concomitantly with the loss of functional motifs during gene expansion [42]. Therefore, it is possible that the copy number variations found in GH45 genes in B. xylophilus may correlate with certain pathological traits and mediate the infection process.

Conclusions
The accurate identification and quantification of GH45 genes with different copies elucidates a new perspective upon the pathogenicity of B. xylophilus. In future studies, the similarities among different copies of GH45 genes should be recognized when their expressions are quantified or relevant studies are conducted. Traditional short-read RNAseq data with a 100-200 bp read-length were incapable of quantifying different copies of GH45 genes with accuracy. Therefore, long-read-based transcriptome sequencing would be a better method to obtain more solid results. However, the interaction between copy number variations of GH45 genes and the pathogenesis of PWN still requires additional experiments with sufficient strains.