First Draft Genome Assembly of Root-Lesion Nematode Pratylenchus scribneri Generated Using Long-Read Sequencing

Root-lesion nematodes (genus Pratylenchus) belong to a diverse group of plant-parasitic nematodes (PPN) with a worldwide distribution. Despite being an economically important PPN group of more than 100 species, genome information related to Pratylenchus genus is scarcely available. Here, we report the draft genome assembly of Pratylenchus scribneri generated on the PacBio Sequel IIe System using the ultra-low DNA input HiFi sequencing workflow. The final assembly created using 500 nematodes consisted of 276 decontaminated contigs, with an average contig N50 of 1.72 Mb and an assembled draft genome size of 227.24 Mb consisting of 51,146 predicted protein sequences. The benchmarking universal single-copy ortholog (BUSCO) analysis with 3131 nematode BUSCO groups indicated that 65.4% of the BUSCOs were complete, whereas 24.0%, 41.4%, and 1.8% were single-copy, duplicated, and fragmented, respectively, and 32.8% were missing. The outputs from GenomeScope2 and Smudgeplots converged towards a diploid genome for P. scribneri. The data provided here will facilitate future studies on host plant-nematode interactions and crop protection at the molecular level.


Introduction
Root-lesion nematodes (RLN) are recognized as the top three plant-parasitic nematode groups of the world and lie behind root-knot (Meloidogyne) and cyst nematodes (Heterodera) in terms of economic importance [1]. Unlike sedentary endoparasites (Meloidogyne and Heterodera), nematodes in the genus Pratylenchus do not establish permanent feeding sites (giant cells or syncytia) and move freely between soil and roots throughout their lifecycle. This feeding behavior results in the development of characteristic reddish-brown necrotic lesions on root tissue, which construct pathways for secondary invaders, such as bacteria, fungi, etc., resulting in severe yield losses [1,2]. Pratylenchus spp. have a cosmopolitan distribution and are known to parasitize more than 400 host plant species. They have a wide host range and impact crops of major economic importance, such as potato, wheat, corn, and soybean [3]. The use of chemical nematicides is a common management strategy for controlling RLNs, but their toxic effects on non-target organisms and environmental health call for alternative control strategies based on novel gene targets [4].
The root-lesion nematode, Pratylenchus scribneri Steiner 1943, is found commonly associated with the major crops grown in the northern Great Plains region of North America [5][6][7]. Considering the importance of P. scribneri, several molecular detection methods for identifying and quantifying this nematode in soil and roots have been developed to enhance the management decisions and practices [8][9][10]. More recently, it has been reported as an important nematode pest infecting corn, soybean, wheat, and tomato in China [11][12][13][14].
In Nigeria, yield losses in corn have been reported to be between 26 to 37% based on the population densities of P. scribneri [15].
Despite the ubiquity and significant importance of Pratylenchus species in agriculture, the availability of genome information on these species is limited as compared to the sedentary PPN. So far, the genome sequence of only one Pratylenchus species, P. coffeae, is available, and the published genome size of 19.7 Mb is the smallest among the metazoans [16]. This raises questions regarding its entirety, especially due to the lack of a complete physical map. Additionally, the genomes of Pratylenchus spp. are expected to have complex ploidy similar to root-knot nematodes (Meloidogyne spp.) [17], but this is still an unexplored area. Ploidy determination could provide important insights into the developmental and evolutionary biology of Pratylenchus spp. Moreover, the genome data for different Pratylenchus spp. are needed to help better understand the parasitism mechanism of the nematodes in this genus. Jointly, this information could help reveal novel gene targets for employing specific control strategies for P. scribneri and other related species. In light of this, the main objective of the current study is to generate a contiguous and relatively complete annotated genome of one of the important root-lesion nematodes, P. scribneri, and make it publicly available. We also explored two different genomic DNA isolation methods to prepare sequencing quality DNA for P. scribneri, and the advantages of one over the other have been presented.

Library Preparation and Assembly
The raw PacBio HiFi sequencing yields and assembly metrics for the libraries prepared using two different DNA isolation methods are presented in detail in Table 1. The library resulting from DNA extracted using 500 handpicked nematodes (Method 2) had greater sequencing yield (19.3 Gb) as compared to Method 1 (14.9 Gb), which involved DNA isolation directly from nematode suspension. Fewer contigs were assembled in Method 2. The contig N50 increased from 222 kb in the first method to 1.70 Mb in the second method. The average GC% was 37.6% and 32.8% for sequencing libraries prepared using Method 1 and Method 2, respectively.

Assembly Decontamination and Completeness Analysis
The Blobplot in Figure 1 represents the preliminary contig taxonomic assignments, contig read coverage, contig lengths, and contig GC% obtained for the two DNA extraction methods. Figure 1a,b show the Blobplot analyses from the "core" nematode genome contigs as red points, and that the library constructed using Method 2 showed >30× P. scribneri read coverage than the library constructed using Method 1. In Method 2, out of the 39 contigs classified into phyla other than Nematoda, only one contig was determined to be a bacterial contaminant sequence which was removed prior to further genome analysis. All others had very weak hits to a variety of phyla tested or no hits at all. The assembly obtained from Method 2 was used for further analyses and subject to haplotig purging. The final purged decontaminated assembly contained 276 contigs summing to~227 Mb. The BUSCO scores for raw, purged, and decontaminated assembly are provided in Table 2. The final decontaminated assembly had a completeness score of 65.4%, with 24.0% being single copy, 41.4% duplicated, 1.8% fragmented, and 32.8% missing BUSCOs out of the total 3131 orthologs used.

Assembly Decontamination and Completeness Analysis
The Blobplot in Figure 1 represents the preliminary contig taxonomic assignments, contig read coverage, contig lengths, and contig GC% obtained for the two DNA extraction methods. Figure 1 a,b show the Blobplot analyses from the "core" nematode genome contigs as red points, and that the library constructed using Method 2 showed >30× P. scribneri read coverage than the library constructed using Method 1. In Method 2, out of the 39 contigs classified into phyla other than Nematoda, only one contig was determined to be a bacterial contaminant sequence which was removed prior to further genome analysis. All others had very weak hits to a variety of phyla tested or no hits at all. The assembly obtained from Method 2 was used for further analyses and subject to haplotig purging. The final purged decontaminated assembly contained 276 contigs summing to ~227 Mb. The BUSCO scores for raw, purged, and decontaminated assembly are provided in Table  2. The final decontaminated assembly had a completeness score of 65.4%, with 24.0% being single copy, 41.4% duplicated, 1.8% fragmented, and 32.8% missing BUSCOs out of the total 3131 orthologs used.
(a) (b) Figure 1. Blobplot contaminant analysis of Pratylenchus scribneri assembly obtained using two DNA extraction methods, before removing contaminants. The contig length is represented by circles proportionally scaled and colored by taxonomic annotation based on BLAST similarity search results. Nematode genome contigs are represented by red circles for: (a) Method 1 (library prepared from 8000 to 10,000 mixed-stage P. scribneri in nematode suspension obtained from carrot culture); (b) Method 2 (library prepared from 500 hand-picked P. scribneri juveniles and adults). Contigs are placed based on the GC proportion (X-axis) and the coverage of reads (Y-axis). The legend box on the top right of each Blobplot provides a color-coded description of the different taxonomic groups identified.

Genome Ploidy Estimation and Structural Annotation
The P. scribneri genome ploidy estimations were made using the final assembly constructed using Method 2. After manually setting the estimated monoploid peak from GenomeScope and Smudgeplot, the models and output converged on ploidy = 2. The presence of two major peaks in GenomeScope output and coverage and distribution of k-mer pairs into two bright smudges in Smudgeplot pointed towards a diploid genome for P. scribneri (Figure 2).

Discussion
Recently, long-read PacBio HiFi sequencing has emerged as a powerful tool to generate high-quality de novo genome assemblies. However, the technique has not been explored to its full potential for microorganisms such as nematodes, due to the relatively high DNA input requiring a larger number of nematodes. Here, we present, for the first For predicting proteins, the BUSCO analysis was run in "long mode" on the final decontaminated assembly. The predicted proteins (single copy (23.5%)/multi-copy (42.7%)/fragmented (1.4%)) were then merged together into 3966 protein sequences as additional protein input for BRAKER annotation. The structural annotation ran using BRAKER and TSERBA with RNA-seq and protein evidence resulted in a total of 51,146 predicted protein sequences. The BUSCO scores for the TSEBRA-combined BRAKER2 predicted protein had 72% complete BUSCOs (single copy: 12.0%, duplicate: 60.0%, fragmented: 1.2%) and 26.8% missing BUSCOs.

Discussion
Recently, long-read PacBio HiFi sequencing has emerged as a powerful tool to generate high-quality de novo genome assemblies. However, the technique has not been explored to its full potential for microorganisms such as nematodes, due to the relatively high DNA input requiring a larger number of nematodes. Here, we present, for the first time, the de novo draft genome assembly of the root-lesion nematode Pratylenchus scribneri constructed using 5 ng of input DNA from 500 nematodes as an example. In this study, the libraries prepared from two different DNA extraction methods were sequenced using the PacBio ultra-low DNA input protocol which uses as low as 5 ng of DNA as input and therefore, significantly fewer nematodes. The library prepared using DNA extracted from 500 handpicked nematodes (Method 2) was much cleaner (fewer contaminants) as compared to the library prepared from DNA extracted directly from nematode suspension (Method 1). This could be attributed to the presence of carrot tissue and other organisms in the library prepared using the nematode suspension obtained directly from carrot cultures. As a result, the number of assembled contigs specific to different taxonomical groups were reduced from 9.5 k in Method 1 to 492 in Method 2. It is difficult to get the sequencing library from nematodes grown on carrots to be completely free of any contaminants. For optimal results, the handpicking of nematodes required for Method 2 should be carried out in a laminar flow hood under sterile conditions, but Method 2 is also tedious due to the handpicking of 500 nematodes, which could be a limitation for preparing the nematode sample for DNA extraction.
There are only a few reports available on the de novo genome assembly construction of small organisms based on the ultra-low DNA input protocol. Kingan et al. [18] generated a high-quality de novo genome assembly using a single mosquito with the ultra-low DNA protocol. DNA isolation for this study was performed using the MagAttract HMW kit as used in Method 2 for the current study. However, no reports on de novo genome assembly of a plant-parasitic nematode using the ultra-low DNA input protocol are available thus far. The use of MagAttract HMW kit for genomic DNA isolation for whole genome sequencing has mostly been observed for bacterial pathogens and insects [18][19][20]. In the present study, the library prepared using 5 ng of input DNA from 500 active P. scribneri individuals (Method 2) was sequenced with a single PacBio SMRT cell 8 M chip on the Sequel IIe System. Nearly 2.7 million reads producing 19.3 Gb of raw data with mean read length of 7 kb were obtained and the genome size was predicted to be around 227 Mb. Similar raw sequencing yield results were obtained for the sand fly (Phlebotomus papatasi) sequenced on the Sequel IIe System using just 5 ng of input DNA following PacBio's ultralow DNA input protocol (https://www.pacb.com/wp-content/uploads/Korlach-PAG-2020-High-Quality-PacBio-Insect-Genome-from-5-ng-of-Input-DNA.pdf, accessed on 30 November 2022).
The BUSCO completeness scores for the final decontaminated assembly tested against 3131 orthologs were fairly average and the presence of more than 65% complete genes in our study are comparable with previous studies [21,22]. However, the gene duplication content of our assembly was higher, and could be related to the likely repetitive gene content of P. scribneri genome and usage of orthologs from different reference datasets in other studies on PPNs [19,22]. This in part may also explain the presence of fewer complete and duplicated BUSCOs in genome assemblies of other migratory endoparasitic nematodes such as Radopholus similis and Bursaphelenchus xylophilus [22,23]. Additionally, the genome assembly of P. scribneri generated in this study had a GC content of 32.8%, which is consistent with other PPNs sequenced on a PacBio RSII platform [24,25]. However, the completeness and quality of the set of 51,146 predicted proteins in P. scribneri assessed using BUSCO showed 72% completeness. The BUSCO scores for predicted proteins were improved over the genome assembly; however, duplicated complete genes were still high (60%). Similar observations related to higher duplicated genes in the protein set of a root-knot nematode Meloidogyne enterolobii were noted by Koutsovoulos et al. [26]. The 227.2 Mb genome assembly size of P. scribneri is much bigger than 19.6 Mb of P. coffeae, the only Pratylenchus species sequenced and reported so far [16]. The above observation is supported by the underlying genome organization of P. scribneri containing a fair amount of gene duplication. This study suggests a diploid genome for P. scribneri which is consistent with ploidy estimations of another migratory endoparasitic nematode, B. xylophilus [27]. Further studies on comparative and functional genomics analysis are needed to validate these results and take a deeper look into the common genes and their functions between the two species.
To the best of our knowledge, this is the first report of de novo draft assembly of the P. scribneri genome generated using the PacBio ultra-low DNA input protocol. The new workflow described here will facilitate the sequencing and assembly of new and existing species of plant-parasitic nematodes that are important for crop production. In vitro culturing of nematodes is a common technique to increase pure nematode population. The DNA preparation and sequencing method described here is recommended for sequencing genome of the nematodes that are difficult to be mass-produced. Overall, the strategy of hand-picking nematodes under sterile conditions performed better in our study and resulted in an almost contaminant free library and can be considered for other endoparasitic and ectoparasitic plant-parasitic nematodes. The long-read-based sequencing and genome assembly of P. scribneri would enable the identification of parasitism genes/effectors involved in the host and nematode interaction mechanism. Additionally, the comparative genome analysis of Pratylenchus species with that of sedentary endoparasites could facilitate studies associated with evolutionary and lifestyle mechanisms of plant-parasitic nematodes based on the effectors present in different groups.

Nematode Collection, DNA Isolation, and Evaluation
Pratylenchus scribneri isolate was originally collected from a potato field (in rotation with corn) located in Sargent County, ND and, thereafter, maintained aseptically on carrot discs in our lab as suggested by Lawn and Noel [28]. For preparing nematode suspension, the inoculated carrot discs in the Petri plates were cut into thin slices using a sterile razor blade. The carrot pieces were left suspended in water for 3 h for the nematodes to move out of the carrot tissues. The nematodes along with the carrot tissues were then passed through a 60-mesh sieve placed on a 635-mesh sieve. The nematodes retained on the 635-mesh sieve were later collected into a 50 mL tube using a wash bottle containing distilled water with 50 mg/L carbenicillin and 50 mg/L kanamycin. Starting with nematode suspension, two different methods were used to isolate P. scribneri genomic DNA. In the first method, a 1 mL nematode suspension containing 8000 to 10,000 mixed-stage individuals of P. scribneri was ground to a fine powder in liquid nitrogen. The crushed nematodes were then subject to DNA extraction using the DNAeasy Plant Mini Kit (Qiagen, Hilden, Germany). The DNA quality was assessed using a ND-1000 Nanodrop spectrophotometer (Thermo Scientific, Waltham, MA, USA) and the samples with 260/280 above 1.8 were selected for further analyses (Method 1). For the second method, 500 P. scribneri adults and juveniles from the original nematode suspension were handpicked and transferred under a microscope on a sterile glass slide containing 100 µL double-distilled water with antibiotics (50 mg/L carbenicillin and 50 mg/L kanamycin). Using a 200 µL pipette, the nematodes were then transferred to a 1.5 mL tube and centrifuged at 4000 rpm at 4 • C for 10 min. The nematode pellet was snap frozen in liquid nitrogen and stored in a −80 • C freezer (Method 2). For subsequent DNA extraction and sequencing, the frozen sample was sent off to the Roy J. Carver Biotechnology Center, University of Illinois at Urbana-Champaign on dry ice through overnight shipping. The nematode pellet was then pulverized under liquid nitrogen. Two hundred microliters of CTAB (OPS Diagnostics, Lebanon, NJ, USA) were added, mixed with the powder, and the sample was incubated at 60 • C for 60 min and cooled to room temperature. The MagAttract HMW DNA (Qiagen, Hilden, Germany) kit was used for the rest of the procedure, with some modifications. Briefly, proteinase K and RNAase were added to the mixture and incubated overnight at room temperature. The mixture was cleaned up with 1× volume of chloroform/isoamyl alcohol, gently inverted 10 times to mix, and incubated at room temperature for 5 min, followed by centrifugation for 2 min at 10,000× g. The supernatant was transferred to a new 1.5 mL tube, and magnetics beads from the MagAttract kit were added, washed twice, and the DNA was eluted from the beads in AE buffer at 37 • C for 30 min. The DNA was quantitated with Qubit using the High-Sensitivity dsDNA kit (ThermoFisher Scientific, Waltham, MA, USA) and the integrity was evaluated in a Femto Pulse system (Agilent, Santa Clara, CA, USA).

Library Preparation and Sequencing
Five nanograms of purified genomic DNA were sheared with Megaruptor 3 (Diagenode, Denville, NJ, USA) to an average size of 10 kb. The sheared gDNA was amplified using the PacBio Ultra-Low DNA Input kit following the manufacture's recommendations (Pacific Biosciences, Menlo Park, CA, USA). The amplified gDNA was converted into a PacBio library with the SMRTbell Express Template Prep kit version 2.0 (Pacific Biosciences, Menlo Park, CA, USA). Briefly, the sheared genomic DNA was first added to an enzymatic reaction to remove single-stranded overhangs followed by treatment with repair enzymes to repair the DNA ends. The ends of the repaired, double-stranded fragments were then tailed with an A-overhang. After that, the T-overhang SMRTbell adapters were ligated and the SMRTbell library was purified with two rounds of AMPure PB bead clean up (Pacific Biosciences, Menlo Park, CA, USA). The library was quantitated with Qubit and then run on a Femto Pulse to confirm the presence of DNA fragments of the expected size. The library was sequenced on one SMRT cell 8 M on a PacBio Sequel IIe system with 30 h movie time. The circular consensus analysis was performed in real time in the instrument with SMRT Link V10.1 software (Pacific Biosciences, Menlo Park, CA, USA) using default parameters, which include 3 minimum passes and a minimum accuracy of 99%.

Genome Ploidy Estimations
To estimate the ploidy level of the genome, KMC v3.1.1 [38] was used to count k-mers (k = 21) in the filtered PacBio HiFi reads (>5 kb). The k-mer histogram was analyzed by GenomeScope 2 [39] with ploidy tested from p = 2 to p = 6. Haplotypic and duplicate genome content were also analyzed using heterozygous kmer pairs with the tool Smudgeplot [34].

Genome Structural Annotation
RepeatMasker v4.1.2 (http://www.repeatmasker.org, accessed on 5 July 2022) was used to soft-mask repeats from the final purged and decontaminated genome assembly file. Two separate runs of BRAKER2 [40] used RNA-seq and protein evidence with default parameters. The raw P. scribneri RNA-seq reads (read length = 100 bp) used for the above analysis were generated in our previous study through paired-end sequencing of cDNA libraries prepared from good quality total RNA extracts of several thousand preparasitic and parasitic nematodes on an Illumina NovaSeq 6000 sequencer (unpublished work). The reads were trimmed with Fastp v0.20.0 [41] and then aligned to the masked genome assembly using STAR v.2.7.10a [42]. Protein evidence was provided using NCBI's Nematoda RefSeq dataset containing 18,060 protein sequences. In addition, BUSCO v5.3.2 running in "long mode" [43] was used to predict protein sequences using the 3131 "lineage nematoda_odb10" orthologs from the final purged/decontaminated/unmasked genome assembly file. The outputs from the two independent BRAKER runs were then combined with the tool TSEBRA v.1.0.3 [38] to obtain GTF and amino acid formats using Cufflinks v2.2.1 [44].  Data Availability Statement: All the raw PacBio sequence data supporting the results of this article as well as the genome assembly have been deposited at the NCBI with the following accession numbers: BioProject: PRJNA932437; BioSample: SAMN33192627; Genome Assembly: JAQSEB000000000; SRA raw data: SRP421680/SRS16720223/SRX19322091/SRR23381582. The data will be made publicly available with the published version of this manuscript.