Roles of Species-Specific Legumains in Pathogenicity of the Pinewood Nematode Bursaphelenchus xylophilus

Peptidases are very important to parasites, which have central roles in parasite biology and pathogenesis. In this study, by comparative genome analysis, genome-wide peptidase diversities among plant-parasitic nematodes are estimated. We find that genes encoding cysteine peptidases in family C13 (legumain) are significantly abundant in pine wood nematodes Bursaphelenchus genomes, compared to those in other plant-parasitic nematodes. By phylogenetic analysis, a clade of B. xylophilus-specific legumain is identified. RT-qPCR detection shows that these genes are highly expressed at early stage during the nematode infection process. Utilizing transgene technology, cDNAs of three species-specific legumain were introduced into the Arabidopsis γvpe mutant. Functional complementation assay shows that these B. xylophilus legumains can fully complement the activity of Arabidopsis γVPE to mediate plant cell death triggered by the fungal toxin FB1. Secretory activities of these legumains are experimentally validated. By comparative transcriptome analysis, genes involved in plant cell death mediated by legumains are identified, which enrich in GO terms related to ubiquitin protein transferase activity in category molecular function, and response to stimuli in category biological process. Our results suggest that B. xylophilu-specific legumains have potential as effectors to be involved in nematode-plant interaction and can be related to host cell death.


Introduction
Peptidases have central roles in parasite biology and pathogenesis, and are very important and essential components of these processes. They degrade host proteins for nutrition, and manipulate the host immune system to elude the immune response [1]. Among them, cysteine peptidases in family C13 (clan CD), called legumains, have high specificity for the hydrolysis of acyl bonds of asparaginyl asparagine and play important roles in many physiological processes [2,3]. Legumains are widely distributed in animals and plants, with intriguing mechanistic peculiarities. In plants, they are called vacuolar processing enzymes (VPEs), they can help to process proteins in storage vacuoles, and are usually involved in plant cell death [4][5][6]. In the Arabidopsis genome, there are four VPEs (α, β, γ and δ VPEs), and the γVPE is specific to vegetative organs and associated with cell death processes [6,7]. In animals, legumain are also known as asparaginyl endopeptidases (AEPs) and have various functions including digestion, immunity, antimicrobial activity, immune signalling, apoptosis, transcription factor and osteoclast remodelling [8]. Legumains are also very important to parasitic protozoa and helminths [1]. It was reported that a surface-localized legumain appeared to display a pro-survival role in the intestinal protozoan Blastocystis hominis. Inhibition of legumain activity could induce programmed cell death (PCD) in Blastocystis [9]). In blood-feeding helminths, legumains were reported to be involved in host hemoglobin digestion [10,11]. Under some conditions, legumains are likely to be secreted to the pericellular environment. It was reported that a legumain was secreted to the blood of a host patient by the pathogenic parasite Clonorchis sinensis as an antigen to play an important role in host-parasite interaction [12]. However, little information is currently known about legumain in animal-and plant-parasitic nematodes.
The pinewood nematode (PWN), Bursaphelenchus xylophilus, is a migratory endoparasitic nematode, which lives in coniferous trees (mainly Pinus spp.) and causes a serious epidemic forest disease-the pine wilt disease (PWD). This invasive plant-parasitic nematode has killed millions of pine trees in its introduced regions in Asia and Europe, and has become a worldwide threat to forest ecosystem [13]. The nematode is transmitted to healthy trees by vector beetles (mainly pine sawyers, Monochamus spp.) during maturation feeding of the beetles [14,15]. Then, the nematode lives in the resin canals of host trees, and feeds epithelial cells and parenchyma cells [16,17]. An obvious characteristic of PWD is quick death of the living pine trees after infection with the nematode. Histopathological observation with electron microscopes (SEM and TEM) showed that, in inoculated pine seedlings, severe degradation in the resin canals, rays and the cambial zone had occurred before external symptoms appeared [17,18]. Cytological observations showed that, after inoculation of pathogenic PWN on Japanese black pine (Pinus thunbergii) seedings, the first conspicuous symptom was vacuolization of parenchyma cells, developed in a wide area and finally tonoplast burst. After the breakdown of vacuoles and the spread of vacuolar inclusion in the cells, parenchyma cells do not perform normal physiological function [19,20]. Moreover, the formation of vacuoles was observed before the nematode population increase in affected seedlings, indicating that the formation of vacuoles occurred without direct contact of nematodes with parenchyma cells. Therefore, it was suggested that some factors originating from the nematode B. xylophilus seemed to cause the formation and bursting of vacuoles in host [17,19]. For exploring the mechanisms of PWN pathogenesis, much work has been done and several hypotheses were proposed [15,21]. The genome of a Japanese strain of the nematode B. xylophilus has been published [22], and some parasitism-related genes and proteins involved in nematode-plant interactions have been reported [23][24][25][26]. However, the molecular mechanism by which such tiny worms of B. xylophilus kill such massive pine trees so rapidly are still unclear.
In plant-pathogen interaction, plants have evolved immune systems that protect them from invading pathogens. Cell death is a defense mechanism of the host against biotrophic pathogens that occurs by inducing PCD, termed the hypersensitive response (HR), which may function to limit the spread of pathogens. Vacuole-mediated cell death is a plant defense strategy. VPE is a key molecule in the immune system that is associated with vacuolar membrane collapse [27]. However, pathogens can manipulate plant PCD pathways to their own advantage by inhibiting or inducing host PCD, which contributes to immunity, targeting defense compounds, or enhancing nutrient acquisition [28,29]. Cell death may instead be an infection strategy for some pathogens [30]. Plant pathogens have developed sophisticated attack strategies mediated by numerous effector proteins to stimulate cell death by hijacking the host cell's defense machinery to promote the recycling of host cellular resources to absorb nutrients [31]. Plant VPE is also necessary for cell death mediation from a wide range of plant pathogens. Some necrotrophic pathogens can take advantage of the increased VPE activity in host cells to enhance pathogenicity [6,32]. Increased VPE activity may benefit the pathogen by mediating protein turnover and nutrient release [33]. The obligate biotrophic pathogen Hyaloperonospora arabidopsidis can also take advantage of the increased VPE activity in host cells to increase sporulation and enhance pathogenicity [6,32]. The hemibiotrophic pathogen Fusarium moniliforme produces mycotoxin FB1 to kill host cells through the host HR by the VPE-mediated vacuolar mechanism [5,34]. So far, the infection strategies by utilizing host cell death to their own advantage have been reported in fungal and bacterial pathogens. As the way of host cell death caused by B. xylophilus is similar to VPE-mediated PCD, it is an interesting question whether the nematode could use VPE activity to cause plant cell death.
With the increase in published genomes of plant-parasitic nematodes, it is possible to explore molecular host-parasite interactions by comparative genome analysis [35]. In this study, to explore the molecular pathogenic mechanism of PWN, we first try to explore diversities of peptidases in different plant-parasitic nematodes by comparing peptidases in plant-parasitic nematode genomes. Then, we try to identify B. xylophilus-specific legumain and estimate their potential for involvement in nematode-plant interaction by experimental verification. Our results may provide useful information regarding legumains in plant-parasitic nematodes.

Diversity of Peptidases in Plant-Parasitic Nematodes by Comparative Genome Analysis
To explore the pathogenic mechanism of PWN, we firstly sequenced and assembled three  Table S1). The genome size and the gene number in BxCN are slightly different from the reported genome of the B. xylophilus 'R' form from the Japanese strain Ka4C1 (hereafter named BxJP), which is 74.56 MB in size with 18,074 protein-coding genes [22].

Expansion of Cysteine Peptidase in Family C13 in B. xylophilus and Their Evolutionary Relationship
We paid more attention to cysteine peptidases in family C13 (legumain/VPE/AEP), which has been documented to be involved in cell death in plants [6]. In the four Bursaphelenchus genomes, a total of 15, 16, 10, and 11, legumain-coding genes are identified from BxCN, BxJP, BxCA, and BmCN (Supplementary Table S3), and 13, 11, 10, and 11, of them are predicted to have signal peptides, respectively (Supplementary Table S4). However, only a total of 9 legumains are found in all other PPN genomes, which including one or two members in each of the genomes of Ditylenchus spp., Globodera spp., R. similis, and R. reniformis, but no homologue in the genomes of Heterodera spp. and Meloidogyne spp. (Supplementary Table S3). Additionally, three of them are predicted to have signal peptides (Supplementary Table S4). Obviously, legumain genes in the four Bursaphelenchus genomes are much more abundant than those in other PPN genomes, which are especially rich in the genomes of pathogenic B. xylophilus 'R' form (BxCN and BxJP).
In addition to two homologues from the Caenorhabditis elegans genome, a phylogenetic tree of above PPN legumains (excluding five very short sequences) is constructed. The topology shows that these PPN peptidases are clustered to 12 orthologous groups ( Figure 2a). Among them, two groups are conserved in PPNs. One group (BxCN04518 and orthologues) is homologous with C. elegans T05E11.6 (C13.005), belonging to glycosylphosphatidylinositol (GPI)-anchor transamidases. Another group (BxCN02210 and orthologues) is homologous with C. elegans T28H10.3 (C13.A02), belonging to hemoglobinase-type cysteine proteases. The other groups are composed of Bursaphelenchus members, with no homologue in other PPN genomes. A clade of B. xylophilus-specific legumains is identified (including BxCN10248 and BxCN10334 orthologous groups), which have no orthologue in B. mucronatus (Figure 2a). For further exploring the origin and evolutionary relationship of these legumains, we retrieved legumain sequences from all nematode genomes deposited in the NCBI database (including plant-and animal-parasitic nematodes, and free-living nematodes), in addition to plant VPEs from A. thaliana, Pinus, and Picea genomes. A phylogenetic tree of 89 members (with length of the C13 domain >200 aa) is constructed (Supplementary Figure S2). The tree topology shows that the above two conserved groups (BxCN04518/C13.005 and BxCN02210/C13.A02 homologous groups) are also conserved in other nematodes. A large clade of Bursaphelenchus-specific legumains is found, with no homologue in other nematode genomes, and the B. xylophilus-specific legumains are involved in this clade (Supplementary Figure S2). Based on the locations of legumain-coding genes in each of Bursaphelenchus genome, we find that some genes are tandemly arrayed (Supplementary Table S5). Sequence similarities and gene positions of legumains indicate gene duplication occurrence in the B. xylophilus genome ( Figure 2b).
Comparison of primary structural organizations shows that B. xylophilus-specific legumain has a typic legumain structure, i.e., with a signal peptide, a peptidase_C13 domain (PF01650) and a C-terminal prodomain of legumain (legumain_C, cd2115), the same as that of human AEP and Arabidopsis γVPE (Figure 3a, Supplementary Figure S3). Within the peptidase_C13 domain, four essential amino acid residues (R, H, C and S) exist, which form the substrate pocket, and the middle two residues (H and C) form the catalytic dyad [6]. The legumain_C domain is denoted as legumain stabilization and activity modulation domain (LSAM domain). The spatial structure of B. xylophilus-specific legumain (BxCN10334) predicted by RoseTTAFold software [41] ( https://robetta.bakerlab.org/) is also similar to that of human AEP and Arabidopsis γVPE (Figure 3b). Protein structure comparison by TM-align [42] (https://zhanggroup.org/TM-align/) shows that the TM-score value between B. xylophilus-specific legumain BxCN10334 and human AEP is larger than 0.87, and that between B. xylophilus-specific legumain and Arabidopsis γVPE is larger than 0.77 ( Figure 3c). More structures and comparisons are shown in Supplementary Figure S4. The results indicate high structural similarities among B. xylophilus-specific legumain, human AEP and Arabidopsis γVPE.   They have a signal peptide (gray box) at the N-terminus, a peptidase_C13 domain (PF01650) (blue box) and a C-terminal prodomain of legumain (legumain_C, cd2115) (orange box). The C13 domain includes four essential amino acid residues (R, H, C, S), which form the substrate pocket, and the middle two (H, C) are catalytic dyad; (b) prediction of spatial structures of B. xylophilus-specific legumain, Arabidopsis γVPE, and human AEP, with RoseTTAFold online (job IDs 207107, 207108 and 207109, respectively). Pink: signal peptide; blue: peptidase C13 family domain; yellow: C-terminal of prodomain of legumain; red: the essential amino acid residues; (c) protein structure comparison by TM-align, between B. xylophilus legumain and Arabidopsis γVPE, B. xylophilus legumain and human AEP. Pink represent B. xylophilus legumain, blue represent Arabidopsis γVPE or human AEP.

Expression Patterns of B. xylophilus-Specific Legumain Genes during the Nematode Infection in Pine Host
Through quantitative real-time PCR (RT-qPCR) technology, we detected the mRNA expression dynamics of the four species-specific legumains (BxCN10334, BxCN10337, BxCN10348, and BxCN10284) in B. xylophilus after the nematode inoculation on pine seedlings (P. thunbergii) for 0 (1-2 h) to 15 days. The result shows that these genes are xylophilus-specific legumain (BxCN10334), Arabidopsis γVPE (AT4G32940), and human AEP (NP_001008530). They have a signal peptide (gray box) at the N-terminus, a peptidase_C13 domain (PF01650) (blue box) and a C-terminal prodomain of legumain (legumain_C, cd2115) (orange box). The C13 domain includes four essential amino acid residues (R, H, C, S), which form the substrate pocket, and the middle two (H, C) are catalytic dyad; (b) prediction of spatial structures of B. xylophilus-specific legumain, Arabidopsis γVPE, and human AEP, with RoseTTAFold online (job IDs 207107, 207108 and 207109, respectively). Pink: signal peptide; blue: peptidase C13 family domain; yellow: C-terminal of prodomain of legumain; red: the essential amino acid residues; (c) protein structure comparison by TM-align, between B. xylophilus legumain and Arabidopsis γVPE, B. xylophilus legumain and human AEP. Pink represent B. xylophilus legumain, blue represent Arabidopsis γVPE or human AEP.

Expression Patterns of B. xylophilus-Specific Legumain Genes during the Nematode Infection in Pine Host
Through quantitative real-time PCR (RT-qPCR) technology, we detected the mRNA expression dynamics of the four species-specific legumains (BxCN10334, BxCN10337, BxCN10348, and BxCN10284) in B. xylophilus after the nematode inoculation on pine seedlings (P. thunbergii) for 0 (1-2 h) to 15 days. The result shows that these genes are highly expressed and reach a maximum after inoculation on the host for three to five days ( Figure 4a). After inoculation for 10 days, their expressions decrease to very low levels.
highly expressed and reach a maximum after inoculation on the host for three to five days (Figure 4a). After inoculation for 10 days, their expressions decrease to very low levels.

Functional Complementation of γVPE-Deficient Arabidopsis by Expression of B. xylophilus-Specific Legumains in Mediating Plant Cell Death
It has been documented that Arabidopsis γVPE can mediate mycotoxin FB1-induced cell death [5]. To explore the roles of B. xylophilus-specific legumains in host cell death, we first introduced two specific legumain genes (BxCN10334 and BxCN10337) into the A. thaliana ∆γvpe mutant (Salk_024036C) independently by genetic manipulation, with

Functional Complementation of γVPE-Deficient Arabidopsis by Expression of B. xylophilus-Specific Legumains in Mediating Plant Cell Death
It has been documented that Arabidopsis γVPE can mediate mycotoxin FB1-induced cell death [5]. To explore the roles of B. xylophilus-specific legumains in host cell death, we first introduced two specific legumain genes (BxCN10334 and BxCN10337) into the A. thaliana ∆γvpe mutant (Salk_024036C) independently by genetic manipulation, with introduction of the empty vector pCAMBIA1302 as control. Then, leaves from five week old T3 plants were infiltrated with FB1 (10 mM, in 0.1% methanol) as an elicitor to trigger cell death. As expected, after FB1 infiltration for five days, necrotic lesions are clearly observed on treated leaves of the transgenic plants introduced with pCAMBIA1302::BxCN10334 and pCAMBIA1302::BxCN10337. These lesions are identical with the typical lesions showing cell death on leaves in the wild type Arabidopsis plants, while no obvious lesion is observed on leaves of the control plants introduced with the empty vector (Figure 4b).
Then, we quantitatively analyzed the effects of the four B. xylophilus-specific legumains (BxCN10334, BxCN10337, BxCN10284 and BxCA10290), in addition to a pine VPE (Pinus taeda PITA_000069534), on complement of Arabidopsis γVPE to mediate FB1-induced cell death, with the ∆γvpe mutant and the wild type plants as negative and positive controls. A 10-µL of FB1 (10 mM, in 0.1% methanol) was infiltrated into each leaf of different Arabidopsis plants. After infiltration for five days, the diameters of necrotic spots on treated leaves of each transgenic line were measured (Figure 4c, Supplementary Figure S5 Table S6). The above results indicate that B. xylophilus-specific legumains in the pathogenic 'R' form (BxCN) can fully recover the activity of the Arabidopsis ∆γvpe mutant in mediating cell death, but the role of the orthologue in the 'M' form nematode (BxCA) is relatively weak.

Validation of Secretory Activities of B. xylophilus-Specific Legumains
To determine whether these B. xylophilus-specific peptidases can be secreted into host cells by stylet to play roles, different experiments were performed to validate their secretory activities, which are necessary for effectors to interact with host. We first analyzed the tissue localization of the mRNA expression of three B. xylophilus-specific legumains (BxCN10334, BxCN10337, and BxCN10284) in the nematode, by in situ hybridization. With digoxigenin-labelled antisense cDNA probes, strong signals are observed within the oesophageal glands in juvenile and adult nematodes (Figure 5a). Similarly, using fluorescence in situ hybridization, strong fluorescent signals are detected in the subventral and dorsal oesophageal glands in juveniles and adults, respectively (Figure 5b). No signal is detected using the sense cDNA probes.
Then, secretory activities of their signal peptides are validated by using a genetic assay based on invertase secretion for yeast growth on plates with sucrose or raffinose as the sole carbon source. Sequences of predicted signal peptides of the above three genes (BxCN10334, BxCN10337, and BxCN10284) were inserted into the plasmid pSUC2T7M13ORI, and then transferred into the yeast strain YTK12, respectively. The result shows that all transformants can grow on YPRAA media and display red color with the chemical 2,3,5-triphenyltetrazolium chloride (TTC) (Figure 6). Then, secretory activities of their signal peptides are validated by using a genetic assay based on invertase secretion for yeast growth on plates with sucrose or raffinose as the sole carbon source. Sequences of predicted signal peptides of the above three genes (BxCN10334, BxCN10337, and BxCN10284) were inserted into the plasmid pSUC2T7M13ORI, and then transferred into the yeast strain YTK12, respectively. The result shows that all transformants can grow on YPRAA media and display red color with the chemical 2,3,5-triphenyltetrazolium chloride (TTC) (Figure 6). . Yeast invertase secretion assay of the signal peptides of the above three BxCN legumains. Yeast YTK12 strain carrying signal peptides of BxCN10334, BxCN10337, and BxCN10284, are able to grow in raffinose-containing YPRAA medium, and react with TTC to display red color. YTK12 carrying the Avrblb2 and the pSUC2 vector are used as positive and negative controls, respectively. CMD-W, YPDA, and YPRAA, are culture media. TTC, 2,3,5-triphenyltetrazolium chloride.

Identification of Genes Involved in Nematode Legumain-Mediated FB1-Induced Plant Cell Death
To determine which genes are involved in nematode legumain-mediated FB1induced plant cell death, we sequenced and compared transcriptomes of four types of Arabidopsis plants, i.e., ∆γvpe mutant, the wild type, transgenic lines introduced with Figure 6. Yeast invertase secretion assay of the signal peptides of the above three BxCN legumains. Yeast YTK12 strain carrying signal peptides of BxCN10334, BxCN10337, and BxCN10284, are able to grow in raffinose-containing YPRAA medium, and react with TTC to display red color. YTK12 carrying the Avrblb2 and the pSUC2 vector are used as positive and negative controls, respectively. CMD-W, YPDA, and YPRAA, are culture media. TTC, 2,3,5-triphenyltetrazolium chloride.
Moreover, we also identify an additional 1333 DEGs displaying similar expression patterns (up-or down regulated) in transcriptomes of the two transgenic lines introduced with BxCN10334 and BxCN10337, including 261 significantly upregulated and 1072 downregulated genes (Figure 7a, Supplementary Table S9). GO enrichment analysis shows that these DEGs are enriched in GO terms response to stimulus (GO:0050896), such as to chemical (GO:0042221), stress (GO:0006950), and hormone (GO:0009725), and terms biological regulation (GO:0065007) and signal transduction (0007165) in category BP (Figure 7c, Supplementary Table S10).
We then randomly selected 39 DEGs (including 16 upregulated and 23 downregulated) from the 59 common DEGs to verify their expression patterns in the four types of Arabidopsis plants (i.e., ∆γvpe mutant, wild-type, and transgenic plants introduced with BxCN10334 and BxCN10337) by RT-qPCR. The results show similar expression patterns as those observed in the transcriptome profiles (Supplementary Figure S6).

Discussion
Although legumains are documented to play important roles in animals and plants, little is known in nematodes, currently. In this study, by comparative genome analysis, we find that genes encoding legumains are significantly abundant in the pathogenic B. xylophilus 'R' form genome, in contrast to those in other plant-parasitic nematode genomes (Figures 1 and 2, Supplementary Table S3). Based on sequence similarities and the positions of legumain genes in the B. xylophilus genome, we suggest that the expansion of legumain-encoding genes in the B. xylophilus genome is mainly contributed by gene duplication (Figure 2b). We identified four B. xylophilus-specific legumains, which have similar structures to Arabidopsis γVPE and human AEP (Figure 3, Supplementary Figures S3 and S4). In general, proteins with high sequence identity and high structural similarity tend to possess functional conserved [43,44]. As both human AEP and Arabidopsis γVPE are related to cell death [6], B. xylophilus-specific legumains with high sequence identities and high structural similarities to them perhaps also have similar function in cell death. Then, utilizing genetic manipulation, genes encoding B. xylophilus-specific legumains were transferred to A. thaliana ∆γvpe mutant, and a functional complementation assay was performed. After infiltration of fungal toxin FB1, necrotic lesions are clearly observed on treated leaves of the transgenic plants introduced with B. xylophilus-specific legumain genes, which are similar to the lesions on leaves in the wild type Arabidopsis plants (Figure 4b,c, Supplementary Figure S5). The result indicates that these B. xylophilus-specific legumains can fully recover the activity of Arabidopsis γvpe mutant to mediate plant cell death triggered by the fungal toxin FB1. Of cause, further verification is necessary to obtain supports in the pine host tree. However, owing to the limitation of current technology, it is difficult to perform experimental verification of nematode legumains in a pine tree by host-mediated RNAi or by gene knockout in the nematode. An interesting question in this study is that the mean size in the transgenic lines of BxCA10290 (the orthologue in B. xylophilus 'M' form) is obviously smaller than that of BxCN10334 (the orthologue in B. xylophilus 'R' form). We compared the protein structures of BxCN10334 and BxCA10290, and their identity of amino-acids is 93% and TM-score > 0.92 (Supplementary Figure S7). Limited by our current knowledge, we cannot interpret the virulence difference from protein structures; more study is needed in the future. Using recombinant DNA technology, we may explore the biochemical characteristics of these legumains through heterologous expression of these legumain genes in a bacterium or yeast.
Moreover, hybridization in situ verify that these B. xylophilus-specific legumains are expressed in esophageal glands of the nematode ( Figure 5). Yeast invertase secretion assay validates the function of the signal peptides harbored in the legumains ( Figure 6). Gene expression by RT-qPCR detection shows that these B. xylophilus-specific legumain genes are expressed at the early stage of the nematode infection (Figure 4a). The results indicate that these nematode legumains have effector potentials. Based on the above results, we suggest that these species-specific legumains are secretory proteins, which are likely to be secreted out through the nematode stylet into the host cells, and may play roles as effectors to participate in pathogen-plant interactions in the early stage during nematode infection. In consideration of previously histopathological and cytological results [17][18][19], a molecular mechanism of B. xylophilus killing pine trees is proposed. It is that, when the nematode enters a healthy pine tree and feeds on the host's parenchyma and epithelial cells, the host immune system may be activated, and a defensive HR mediated by VPE be produced. Meanwhile, the worms may secrete species-specific legumains through their stylets when they are feeding, which have plant VPE activities and may strengthen plant cell death, resulting in a pathogenic PCD development in susceptible pine trees. However, no pathogenic PCD occurs in resistant pine trees. Perhaps it is owing to that the hosts can slow damage expansion in tissues early and have time to complete the biosynthesis of lignin in the cell walls, as reported previously [45]. Lignin surrounds the damaged regions and induce cell wall protein-based defense. Based on our current results, we suggest that the pathogenic nematode B. xylophilus may uses a strategy like some fungal phytopathogens, by inducing plant PCD to cause inappropriate cell death in hosts for improving their parasitism [46]. With biotechnological developments, nematode genes may be introduced into a pine host by genetic manipulation, and host-mediated RNAi may be performed. Alternatively, B. xylophilus mutants may be obtained by gene knock-out technology. Then, the biological functions and mechanisms of different legumains in the nematode will be determined.

Nematode Strains
Three Bursaphelenchus nematode strains were used for whole genome sequencing in this project, i.e., B. xylophilus 'R' form strain ZJSS and B. mucronatus strain BmDH from P. massoniana in Zhejiang Province, China, and B. xylophilus 'M' form strain CAN from Canada (a gift from Dr. HM. Li). The three nematodes are similar in morphology, biology, and ecology. Therefore, they were once taken as a "super species" or the pinewood nematode species complex (PWNSC) [47,48]. However, the pathogenicity to pine hosts is obviously different [49,50]. Nematodes were cultured on fungal mats of Botrytis cinerea grown on potato-dextrose agar (PDA) plates at 25 • C. After sister-brother mating over 10 generations for each strain, established inbred lines were prepared for genome sequencing.

Plant Materials
Seeds of Arabidopsis thaliana Col-0 ecotype and the ∆γvpe mutant Salk_024036C (from the Nottingham Arabidopsis Stock Centre, NASC) were surface-sterilized with 75% ethanol, and then sown onto 0.4% agar containing 1/2 Murashige-Skoog (MS) medium and 1% sucrose for three days in the dark at 4 • C for vernalization as described [5]. Seeds were germinated and grown in a growth chamber at 22 • C under a 16 h light/8 h dark period for two weeks. Then, seedlings were transplanted in cultivated matrix and grown under the same conditions. These Arabidopsis plants were used for genetic transformation and functional complementation assays. Nicotiana benthamiana were cultured in a greenhouse and used for transient expression. Five year old pine seedlings (P. thunbergii) were prepared for nematode inoculation.

Genome Sequencing and Assembly
Genomic DNA was extracted from each of the inbred lines of above three nematodes at mixed stages using a regular phenol chloroform method. For effective genome assembly, multiple libraries with different insert sizes were constructed for each nematode strain (Supplementary Table S11). Genome sequencing was performed using an Illumina GA II sequencing system with standard procedures at the Beijing Genomics Institute (Shenzhen, China). In total, 9.7 Gb, 14.4 Gb, and 12.1 Gb, were generated for B. xylophilus 'R' form (named BxCN), B. xylophilus 'M' form (named BxCA), and B. mucronatus (named BmCN), respectively (Supplementary Table S11).
After filtered to remove low-quality sequences, base-calling duplicates and adaptor contamination, the sequence data from each library were assembled using SOAPdenovo with default parameters [51]. We first assembled the reads from the short insert-size libraries (≤500 bp) into contigs using k-mer overlap information (de Bruijn graph k-mer). A hierarchical assembly method was used to construct scaffolds step by step by adding data from each library separately in the order of insert size from smallest to largest. Gaps were filled in by following the methods described previously [52]. Then, taking assembly of the published genome of B. xyolphilus strain Ka4C1 as the reference [22], genome assemblies of the two B. xylophilus genomes (BxCN and BxCA) were improved by using mugsy [53]. We assessed the completeness of the draft genomes using the Core Eukaryotic Genes Mapping Approach (CEGMA) [54], and annotated repetitive elements in each genome by RepeatScout [55] and RECON [56].

Protein-Coding Gene Annotation
The protein-coding genes were inferred using de novo-, homology-and evidence-based approaches. De novo gene prediction was performed on repeat-masked genomes using four gene prediction programs (Augustus, GlimerHMM, SNAP, and GeneMark.hmm-ES) [52]. Training gene model sets were generated from subsets of the EST/mRNA and RNA-Seq datasets representing 822 distinct genes of BxCN and 808 distinct genes of BmCN. The same dataset used for BxCN was also used for BxCA. Taking advantage of the published protein-coding gene annotation of Ka4C1 [22], we used the homology-based gene annotation program genBlastG [57] to find potentially missed gene models in BxCN, BxCA, and BmCN. RNA-seq reads obtained by sequencing the reverse transcribed mRNA libraries of BxCN, BxCA, and BmCN, were aligned using Bowtie [58]. Introns were defined using TopHat [59].
Following the prediction of the protein-coding gene set, each inferred amino acid sequence was assessed for conserved protein domains in Pfam v31.0 [60] and NCBI CDD database [61]. SignalP v4.0 [40], TargetP v1.1 [62] were applied to predict signal peptides, and SignalP, Phobious, and TMHMM v2.0c [63] were applied to predict transmembrane domains in the proteins. We predicted peptidases by aligning amino acids against the reported peptidases in the MEROPS database [3] with an E-value cutoff of 1 × 10 −5 . The predicted peptidases were further confirmed and used for following analysis if they contained conserved domains. For the peptidases from PPN genomes, we used TBtools [64] (Chen et al., 2020) to draw the heatmap. The 3D structure of legumain is predicted by RoseTTAFold server [41] (https://robetta.bakerlab.org). Protein structure comparison is performed by TM-align [42], i.e., uploading the PDB files of protein structure to the TMalign server (https://zhanggroup.org/TM-align/) for comparative analysis. The TM-score of the comparative analysis will be provided by the TM-align method and the PDB files for pair-wise structure comparison will be generated by TM-align. To view the protein structure, we used iCn3D [65] and Jmol v14.32.70 (http://jmol.sourceforge.net/).
We also downloaded all available PPNs genomes (23 genomes) deposited in the public NCBI databases, in addition to C. elegans genome (Supplementary Table S2). As some genomes had no gene information, we performed de novo prediction to identify the conserved genes using Augustus [66] (https://bioinf.uni-greifswald.de/augustus/) with the training dataset of C. elegans.

Phylogenetic Tree Construction and Evolutionary Analysis
For the phylogenetic analysis of PPN legumains, we used both the maximum likelihood (ML) method with WAG + G model and the neighbor-joining (NJ) method with JTT model. Alignment of sequences was performed using the program MUSCLE [67]. The neighbor-joining phylogenetic tree was constructed based on whole gene sequences. Prottest [68] was used to evaluate the best models for the maximum likelihood phylogenetic tree, which was constructed based on the C13 domain sequences with length larger than 110 aa. The MEGA v6 [69] was used to construct the gene trees. Then, we explored homologous genes of all nematode legumains on NCBI, in addition to plant VPEs (A. thaliana and conifers), and a phylogenetic tree of 89 members (length of the C13 domain >200 aa) was constructed by the (ML) method with the LG + I + G model.

RNA Isolation, RT-PCR and Quantitative Real-Time PCR
Fresh cultured nematodes were collected and the total RNA was isolated using TRIzol reagent (Invitrogen, Carlsbad, CA, USA). After measuring the concentration (OD260) and the purity (OD260/OD280 1.8~2.0) with an ultraviolet spectrophotometer, agarose gel electrophoresis was carried out to check the RNA integrity. Then, 1 µg of RNA was used for first-strand cDNA synthesis using the SuperScript™ III cDNA Synthesis Kit (Invitrogen, Carlsbad, CA, USA), according to the manufacturer's instructions. Using cDNA as the template, RT-PCR was performed with primer-pairs (Supplementary Table S12). PCR conditions were as follows: 98 • C for 1 min, 35 cycles of 98 • C for 10 s, Tm • C 20 s, and 72 • C for X s (set up as 60 s/kb), and finally, 72 • C for 5 min. Then, PCR products were purified with a gel purification kit (Tiangen, Beijing, China) and Sanger-sequencing by forward and reverse primers. Gene sequences of predicted B. xylophilus-specific legumains were validated.
For detection of the mRNA expression dynamics during nematode infection, we inoculated the nematode B. xylophilus on five year old pine seedlings (P. thunbergii), with approximately 10,000 individuals (200 µL) for each. After inoculation for 0 (1-2 h), 1, 3, 5, 10, and 15 days, nematodes were isolated, respectively. Total RNA was extracted and first-strand cDNA was synthesized as above for each treatment, independently, with at least three replicates for each treatment. RT-qPCR was performed with primers (Supplementary Table S12) to detect expression levels of B. xylophilus-specific legumain-coding genes. Each 20 µL reaction mixture was prepared using a SYBR Premix ExTaqTM II kit (Takara, Dalian, China) on a CFX96 Real-Time PCR System (BIO-RAD, CA, USA) following the manufacturer's protocol. Cycling conditions for quantitative PCR were: 95 • C for 30 s, and 45 cycles of 95 • C for 5 s, 60 • C for 20 s and 72 • C for 15 s. Each treatment had three independent repeats, with four technical replicates for each reaction. The ef1α gene was used as the internal control. The relative gene expression changes were calculated using the 2 −∆∆CT method.

Functional Complementation Assay
Nematode legumain-coding genes (with removed signal peptides) were amplified from the cDNA templates with primers (Supplementary Table S12), then inserted into the plant transformation vector pCAMBIA1302. The resulting constructs were independently introduced into the Arabidopsis ∆γvpe mutant (Salk_024036C) through Agrobacteriummediated transformation following a method described previously [70]. A transformant with the empty vector pCAMBIA1302 was used as a control. All transformants were then incubated overnight in the dark at 25 • C and moved to a greenhouse under controlled conditions (14 h light/10 h dark at 25 • C) for growth and seed harvesting. Seeds of each generation were screened on MS medium plates with hygromycin B (100 mg/mL). All putative transformants were confirmed by PCR amplification with the primer pair (pCAMBIA1302-F/-R, Supplementary Table S12) from genomic DNA isolated from the leaves using a DNeasy Plant Mini Kit (QIAGEN, Hilden, Germany) following the manufacturer's instructions. Homozygous T3 plants were used for further experiments.
We first introduced two legumain-coding genes (BxCN10334 and BxCN10337) into the Arabidopsis ∆γvpe mutant for the complementation assay. Homozygous transgenic seedlings expressing 35S::BxCN10334 or 35S::BxCN10337 were grown for five weeks. Then, detached leaves were infiltrated with 10 mM FB1 (in 0.1% methanol) as described previously [5]. Lesion development on leaves was observed after five days. Then, four B. xylophilus-specific legumain-coding genes (including three BxCN and one BxCA) and a plant vpe gene from P. taeda (PITA_000069534) were introduced into the mutant independently. For quantitative comparison of complemental effects, a 10-µL of FB1 (in 0.1% methanol) was infiltrated into each leaf. After five days, the diameters of necrotic spots on the treated leaves were measured under a light microscope (Olympus, Tokyo, Japan). More than 10 plants and each with 3-4 leaves were treated with FB1 infiltration in each transgenic line. The mean size of each line was calculated. One-way ANOVA was used to compared the mean sizes with the SPSS software v20 (IBM, Chicago, IL, USA).

In Situ Hybridization
We used both digoxigenin-labelled probes method [71] and the fluorescein isothiocyanate method [72] to determine tissue expression of the three B. xylophilus-specific legumains (BxCN10334, BxCN10337, and BxCN10284) in the nematode. For digoxigeninlabelled probe hybridization, a fragment of approximately 300 bp was amplified from the coding regions of each gene and used as a template for the synthesis of both DIG-labelled sense and antisense probes (Supplementary Table S12). Hybridization was performed with mixed life stage nematodes. Hybridization signals within the nematodes were de-tected with alkaline phosphatase-conjugated anti-digoxigenin antibody and substrate and observed under a light microscope (Olympus, Tokyo, Japan). For fluorescence probe hybridization, an approximately 25 bp cDNA fragment specific to each gene was used as an antisense and sense probe, and the 5 -end was labelled with fluorescein isothiocyanate (FITC, Sangon Biotech, Shanghai, China) as previously described with modification [72]. The hybridization signals were observed under a confocal microscope (Zeiss LSM700, Oberkochen, Germany). Probe sequences are listed in Supplementary Table S12.

Secretory Activity Assay
The yeast secretion system was used to validate the function of signal peptides [73]. The signal peptide sequences of above three legumains (BxCN10334, BxCN10337, and BxCN10284) were independently inserted into the yeast signal sequence trap vector pSUC2T7M13ORI. Then, the constructs were transformed into the invertase-negative yeast strain YTK12 using the Frozen-EZ Yeast Transformation II Kit (Zymo Research, CA, USA). After transformation, the yeast was plated on CMD-W plates for three days. Yeast colonies were purified and then grown on raffinose-containing YPRAA plates. The YPDA plates were used as equal viability controls. In addition, invertase enzymatic activity was detected with 2,3,5-triphenyltetrazolium chloride (TTC) as described in a previous study [74].

RNA Sequencing and DEG Identification
Using TRIzol reagent (Invitrogen, Carlsbad, CA, USA), following the manufacturer's instructions, total RNA was extracted independently from leaves of four types of Arabidopsis plants, i.e., ∆γvpe mutant, wild-type, and transgenic lines with BxCN10334 and BxCN10337, which were infiltrated with FB1 for five days to induce cell death. At least 20 mg of total RNA was prepared from each sample. Library construction and RNASeq were performed at Wuhan Frasergen Bioinformatics Co. (Wuhan, China), using Illumina HiSeq 2000. Two independent replicates were prepared for sequencing. After removing adaptor sequences and low-quality reads, the clean reads (36371964~54419250 reads, 5455~8162 Mb) were mapped to the Arabidopsis genome (TAIR10, GCA_000001735.1), with TopHat (-r 30 -p 20). To obtain gene expression FPKM values, Cufflinks and TopHat were used for analysis as shown in the reported workflow [75]. DEGs were obtained using the DEseq package [76] by comparing gene expression profiles of transgenic lines and WT with that of ∆γvpe mutant (using the mean value of two replicates, |log 2 Ratio| > 1, p-value < 0.05, FDR < 0.01). Shared DEGs with identical patterns (significantly upregulated or downregulated in WT and two transgenic lines, compared to the mutant control) were selected for GO enrichment analysis. Enrichment analysis of DEGs was performed by PANTHER [77], with a p-value cutoff of 0.05 and two interactive tools (overrepresentation test and enrichment test).
For verification of the expression patterns, 39 DEGs (16 up-and 23 downregulated) were randomly selected from the shared DEGs, and RT-qPCR was performed using the four Arabidopsis cDNA templates (i.e., ∆γvpe mutant, wild-type, and transgenic plants with BxCN10334 and BxCN10337) as above. At least three replicates for each gene and four technical replicates for each reaction were performed. The actin2 (AT3G18780) gene was used as the internal control gene. Primers are listed in Supplementary Table S12.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.