The Genetic Characterization of a Novel Natural Recombinant Pseudorabies Virus in China

We sequenced the complete genome of the pseudorabies virus (PRV) FJ epidemic strain, and we studied the characteristics and the differences compared with the classical Chinese strain and that of other countries. Third-generation sequencing and second-generation sequencing technology were used to construct, sequence, and annotate an efficient, accurate PRV library. The complete FJ genome was 143,703 bp, the G+C content was 73.67%, and it encoded a total of 70 genes. The genetic evolution of the complete genome and some key gene sequences of the FJ strain and PRV reference strains were analyzed by the maximum likelihood (ML) method of MEGA 7.0 software. According to the ML tree based on the full-length genome sequences, PRV FJ strain was assigned to the branch of genotype II, and it showed a close evolutionary relationship with PRV epidemic variants isolated in China after 2011. The gB, gC, gD, gH, gL, gM, gN, TK, gI, and PK genes of the FJ strain were assigned to the same branch with other Chinese epidemic mutants; its gG gene was assigned to the same branch with the classic Chinese Fa and Ea strains; and its gE gene was assigned to a relatively independent branch. Potential recombination events were predicted by the RDP4 software, which showed that the predicted recombination sites were between 1694 and 1936 bp, 101,113 and 102,660 bp, and 107,964 and 111,481 bp in the non-coding region. This result broke the previously reported general rule that pseudorabies virus recombination events occur in the gene coding region. The major backbone strain of the recombination event was HLJ8 and the minor backbone strain was Ea. Our results allowed us to track and to grasp the recent molecular epidemiological changes of PRV. They also provide background materials for the development of new PRV vaccines, and they lay a foundation for further study of PRV.


Introduction
Pseudorabies (PR), also known as Aujeszky's disease (AD), is an acute infectious disease caused by the pseudorabies virus (PRV) [1]. The disease can infect many livestock species and wild animals [2]. Pigs are the main vector for the virus. One of the main symptoms of diseased pigs is an elevated body temperature. In addition, newborn piglets mainly show neurological symptoms of encephalomyelitis, which can also invade the digestive system. Adult pigs often show recessive infection; pregnant sows can have miscarriages, stillbirths, and mummified fetuses; and boars show reproductive disorders

Concentration and Purification of the Virion
The PRV FJ strain was inoculated into a full monolayer of BHK-21 cells. After the cytopathic effect (CPE) reached 80−90%, the cell culture flask was placed in a cryogenic refrigerator at −80 • C. After freezing and thawing three times, the cell culture flask was separately packed into a 50 mL aseptic centrifuge tube, and the supernatant was centrifuged at 4 • C and 3500 rpm for 15 min. After the supernatant was sterilized and filtered by a 0.45 µm filter membrane (Millipore, Billerica, MA, USA), it was transferred to a 15 mL ultrafilter tube with a maximum cut-off of 100 kD (Millipore, Billerica, MA, USA). The tube was centrifuged at 4000× g for 30 min according to the manufacturer's instructions. After centrifugation, the concentrated liquid on the filter membrane was carefully aspirated with a 2-200 µL range pipette (Eppendorf, Hamburg, HAM, Germany).

DNA Extraction
DNA of the PRV FJ strain was extracted using the phenol-chloroform method: (1) 200 µL of the 10% SDS solution, and 15 µL of the 10 mg/mL RNase A were added in the Eppendorf tubes and incubated at 60 • C for 30 min in the metal bath (Cole-Parmer, Chicago, IL, USA). (2) 100 µL of the 10 mg/mL Proteinase K was added and incubated in a metal bath at 56 • C for 30 min. (3) The ddH 2 O was added to make up the concentrated viral solution to 400 µL, then 600 µL of phenol: chloroform: isoamyl alcohol = 25:24:1 DNA extraction reagent was added, the Eppendorf tubes were carefully inverted and mixed, then the Eppendorf tubes were stood for 5 min to make the liquid stratified. (4) These Eppendorf tubes were centrifuged for 5 min at 12,000 r/min with a microcentrifuge (Thermo Scientific, Waltham, MA, USA), and they took the supernatant to avoid aspirating to the impurities in the middle layer. (5) Repeat steps (3) and (4). (6) An equal volume of isopropanol was added, mixed lightly, and precipitated for 1 h in a refrigerator at −20 • C. (7) These Eppendorf tubes were centrifuged at 12,000 r/min for 5 min with the microcentrifuge to discard the supernatant, 800 µL of anhydrous ice ethanol was added, then 1/10 volume of 3 mol/L NaAc was added, washed with light mixing, and left for 5 min. (8) These Eppendorf tubes were centrifuged at 12,000 r/min for 5 min with the microcentrifuge at 4 • C to discard the supernatant, the precipitate was placed in a biosafety cabinet, the exhaust air was turned on and blown until there was no smell of alcohol. (9) The precipitate was carefully dissolved in 100µL TE solution.
Then the extracted DNA samples were stored in a refrigerator at −20 • C for further use.

Sequencing the Complete Genomes
The extracted PRV FJ DNA samples were sent to Wuhan BaiYi biotechnology company for complete genome sequencing. After the samples were qualified, the database was built with the PRV HLJ8 strain (National Center of Biotechnology Information [NCBI] accession number: KT824771.1) as the reference sequence. Third-and second-generation high-throughput sequencing was carried out using a PacBio RS II sequencing system and a MGISEQ-2000 sequencing system, respectively. For the PacBio RS II system, the Sequel Binding Kit 2.1, the Sequel Sequencing Kit 2.1, and the Sequel SMRT Cell 1mv2 (Pacific Biosciences, Menlo Park, CA, USA) were used for sequencing. The data were processed with the SMRT LINK 6.0 software. The read quality value in the original data was filtered. Based on the complete genome sequencing using the PacBio equipment, the obtained sequence was corrected using the MGISEQ-2000 s-generation sequencing platform. Finally, the complete PRV FJ genome sequence was assembled and annotated.

Genome and Related Gene Homology and Phylogenetic Analysis
Fifteen PRV genome sequences uploaded to NCBI (Table 1) were compared with the PRV FJ genome sequence and its major virulence, glycoprotein, and immunogenicity-related coding sequences(CDS): TK, PK, gB, gC, gD, gG, gH, gL, gM, gN, gI, and gE. A genetic evolution tree was drawn and analyzed using the MEGA 7.0 software (https://www.megasoftware.net, accessed on 25 October 2021).

Prediction of Potential Genome Recombination Events
The genome alignments from the 15 PRV reference strains and the FJ strain were analyzed with the Recombination Detection Program 4 (RDP4) software to screen for potential recombination events. Seven algorithms, including RDP, BootScan, GENECONV, Maxchi, SiScan, Chimera, and 3Seq were employed [14].

Sequence Submission
The complete PRV FJ genome sequence was deposited in the GenBank database (http://www.ncbi.nim.nih.gov/genbank, accessed on 18 October 2021) under the accession numbers of MW286330.

Complete Genome Sequence Analysis
After comparing the genome sequence assembly using the reference PRV sequences from NCBI databases, the PRV FJ complete genome length was 143,703 bp, the GC bases content was 73.67%, and it encoded a total of 70 genes without insertion and deletion of the rest of the coding sequences. The sequence was divided into four parts: UL, US, IRs, and TRs (Table 2). We annotated the linear map, gene arrangement, and distribution of the complete genome sequence using the Snapgene software ( Figure 1), and we noted the annotations of each open reading frame (ORF) ( Table 3).

Genomic Genetic Evolution Analysis
Nucleotide homology comparison between the FJ strain and the reference strains using the MEGA 7.0 software showed that the FJ strain had the highest homology with Chinese PRV mutant strains isolated after 2011, with 99.9% and 99.7% homology with classical PRV Fa and Ea strains, respectively, isolated in the 20th century in China. The homology with the other country's MY-1, Bartha, Becker, Kaplan, and Kolchis strains was 99.0%, 95.7%, 95.7%, 96.0%, and 95.7%, respectively; these values are relatively low (Table 4). It is worth noting that the MY-1 strain is an Asian strain. Note. "*" indicates that this PRV strain is the target PRV strain.
We analyzed the complete genome sequence homology of the reference strains and the FJ strain with the online program mVista (http://genome.lbl.gov/vista/mvista/submit. shtml, accessed on 3 October 2021). Compared with the Bartha strain, the FJ strain had low homology in UL56, UL51, UL27, UL36, UL41, UL28.5, UL21, and LLT genes. Except for the HB1201 strain, the other Chinese reference strains and the MY-1 strain also showed homology differences in the above regions. The other country's Becker, Kolchis, and Kaplan reference strains only showed significant homology differences in UL27, UL36, UL21, and US1 gene regions. The homology difference between HB1201 and Bartha was the greatest, and there were large base deletions in the UL56, UL27, UL21, UL36, LLT, US1, US3, and IE180 gene regions. In summary, after homology comparison with the Bartha strain, the regions with lower homology between the FJ strain and the other reference strains were mainly distributed in the non-coding region (Figure 2).
We constructed and analyzed the genetic evolution tree of the complete genome sequence of PRV strains with the maximum likelihood (ML) method. All the strains were classified into two major branches. The other country's Bartha, Becker, Kolchis, and Kaplan strains were located in the European and the North American genotype (genotype I) branch, while the Chinese strains and the MY-1 strain were located in the Asian genotype (genotype II) branch, which was consistent with the results of other reported genetic evolution analyses. The FJ strain was still located in the genotype II branch, close to the GD0304 strain branch, and it had the lowest genetic relationship with the ZJ01 strain in the genotype II branch (Figure 3).  (Table  4). It is worth noting that the MY-1 strain is an Asian strain. Note. "*" indicates that this PRV strain is the target PRV strain.
We analyzed the complete genome sequence homology of the reference strains and the FJ strain with the online program mVista (http://genome.lbl.gov/vista/mvista/submit.shtml, accessed on 3 October 2021). Compared with the Bartha strain, the FJ strain had low homology in UL56, UL51, UL27, UL36, UL41, UL28.5, UL21, and LLT genes. Except for the HB1201 strain, the other Chinese reference strains and the MY-1 strain also showed homology differences in the above regions. The other country's Becker, Kolchis, and Kaplan reference strains only showed significant homology differences in UL27, UL36, UL21, and US1 gene regions. The homology difference between HB1201 and Bartha was the greatest, and there were large base deletions in the UL56, UL27, UL21, UL36, LLT, US1, US3, and IE180 gene regions. In summary, after homology comparison with the Bartha strain, the regions with lower homology between the FJ strain and the other reference strains were mainly distributed in the non-coding region ( Figure 2). We constructed and analyzed the genetic evolution tree of the complete genome sequence of PRV strains with the maximum likelihood (ML) method. All the strains were classified into two major branches. The other country's Bartha, Becker, Kolchis, and Kaplan strains were located in the European and the North American genotype (genotype I) branch, while the Chinese strains and the MY-1 strain were located in the Asian genotype (genotype II) branch, which was consistent with the results of other reported genetic evolution analyses. The FJ strain was still located in the genotype II branch, close to the GD0304 strain branch, and it had the lowest genetic relationship with the ZJ01 strain in the genotype II branch (Figure 3).

Phylogenetic Analysis of Related Gene Sequences
We selected the coding sequences of 12 genes related to immunogenicity and virulence of PRV, including TK, PK, gB, gC, gD, gG, gH, gL, gM, gN, gI and gE genes and analyzed them using the ML method of the MEGA 7.0 software. Except for gL, the phylogenetic trees of all genes produced the typical genotype I and genotype II branches, while the gL evolutionary tree showed that the Chinese epidemic mutant HeN1 strain belonged to the European and the North American genotype I. The Becker strain belonged to genotype II. All the above genes of the FJ strain were located in the large genotype II branch, and its gB, gC, gD, gH, gL, gM, gN, TK, gI, and PK genes were in the same branch as other Chinese mutants. Its gG gene was assigned to the same branch with the classical Chinese PRV Fa and Ea strains' gG gene, while its gE gene was assigned to a relatively independent branch. All the genes of the MY-1 strain were located in the large branch of genotype II, except for TK, gL, gM, and gN; the other genes were located in a single branch compared with Chinese strains. The selected PRV FJ genes were far away from the other country's strains, and they were very close to the Chinese mutants; and, the above-mentioned immunogenicity and

Phylogenetic Analysis of Related Gene Sequences
We selected the coding sequences of 12 genes related to immunogenicity and virulence of PRV, including TK, PK, gB, gC, gD, gG, gH, gL, gM, gN, gI and gE genes and analyzed them using the ML method of the MEGA 7.0 software. Except for gL, the phylogenetic trees of all genes produced the typical genotype I and genotype II branches, while the gL evolutionary tree showed that the Chinese epidemic mutant HeN1 strain belonged to the European and the North American genotype I. The Becker strain belonged to genotype II. All the above genes of the FJ strain were located in the large genotype II branch, and its gB, gC, gD, gH, gL, gM, gN, TK, gI, and PK genes were in the same branch as other Chinese mutants. Its gG gene was assigned to the same branch with the classical Chinese PRV Fa and Ea strains' gG gene, while its gE gene was assigned to a relatively independent branch. All the genes of the MY-1 strain were located in the large branch of genotype II, except for TK, gL, gM, and gN; the other genes were located in a single branch compared with Chinese strains. The selected PRV FJ genes were far away from the other country's strains, and they were very close to the Chinese mutants; and, the above-mentioned immunogenicity and virulence-related genes were not significantly different from the previous PRV variants (Figure 4).

Recombination Analyses
We compared the FJ strain with the 15 PRV reference genome sequences using the RDP4 software (http://web.cbio.uct.ac.za/~darren/rdp.html, accessed on 14 October 2021); we predicted the recombination possibilities of the strain using Bootscan, LARD, 3seq, PhylPro, Maxchi, SiScan, and Chimaera algorithms. We detected several recombination signals for the FJ genome sequence ( Figure 5). The major backbone of the FJ strain was the HLJ8 strain; the minor backbone was the Ea strain. We analyzed the potential recombination events of the FJ complete genome sequence using the above-mentioned algorithms; the p value of each algorithm was <10 −3 . The predicted recombination sites were between 1694 and 1936 bp, between 101,113 and 102,660 bp, and between 107,964 and 11,148,1 bp; four algorithms supported the recombination events in each segment. Among them, two algorithms in the 1694-1936 bp section showed that the recombination event was credible; one algorithm in the 101,113-102,660 bp section showed that the recombination event was credible; and four algorithms in the 107,964-111,481 bp section showed that the recombination event was credible (Table 5).  The bar and the number represent the genetic distance scale of these genes at this length is 0.003.

Phylogenetic Analysis of Related Gene Sequences
We selected the coding sequences of 12 genes related to immunogenicity and virulence of PRV, including TK, PK, gB, gC, gD, gG, gH, gL, gM, gN, gI and gE genes and analyzed them using the ML method of the MEGA 7.0 software. Except for gL, the phylogenetic trees of all genes produced the typical genotype I and genotype II branches, while the gL evolutionary tree showed that the Chinese epidemic mutant HeN1 strain belonged to the European and the North American genotype I. The Becker strain belonged to genotype II. All the above genes of the FJ strain were located in the large genotype II branch, and its gB, gC, gD, gH, gL, gM, gN, TK, gI, and PK genes were in the same branch as other Chinese mutants. Its gG gene was assigned to the same branch with the classical Chinese PRV Fa and Ea strains' gG gene, while its gE gene was assigned to a relatively independent branch. All the genes of the MY-1 strain were located in the large branch of genotype II, except for TK, gL, gM, and gN; the other genes were located in a single branch compared with Chinese strains. The selected PRV FJ genes were far away from the other country's strains, and they were very close to the Chinese mutants; and, the above-mentioned immunogenicity and virulence-related genes were not significantly different from the previous PRV variants (Figure 4).

Recombination Analyses
We compared the FJ strain with the 15 PRV reference genome sequences using the RDP4 software (http://web.cbio.uct.ac.za/~darren/rdp.html, accessed on 14 October 2021); we predicted the recombination possibilities of the strain using Bootscan, LARD, 3seq, PhylPro, Maxchi, SiScan, and Chimaera algorithms. We detected several recombination signals for the FJ genome sequence ( Figure 5). The major backbone of the FJ strain was the HLJ8 strain; the minor backbone was the Ea strain. We analyzed the potential recombination events of the FJ complete genome sequence using the above-mentioned algorithms; the p value of each algorithm was < 10 −3 . The predicted recombination sites were between 1694 and 1936 bp, between 101,113 and 102,660 bp, and between 107,964 and 11,148,1 bp; four algorithms supported the recombination events in each segment. Among them, two algorithms in the 1694-1936 bp section showed that the recombination event was credible; one algorithm in the 101,113-102,660 bp section showed that the recombination event was credible; and four algorithms in the 107,964-111,481 bp section showed that the recombination event was credible (Table 5).

Discussion
Due to the large number of genes in the PRV complete genome, the high content of GC bases, and the presence of more than 900 nucleotide repeat sequences, it is relatively difficult to sequence the complete genome; research in the PRV gene function and comparative genomics had been somewhat restricted. In 2011, American researchers were the first to obtain and to publish the complete genome sequences of some representative PRV strains such as Bartha, Kaplan, and Becker using Illumina second-generation sequencing technology [3]. With the popularity of second-generation high-throughput sequencing around the world, the complete genome sequences of PRV isolates from various regions have been published in China since 2014 [15]. The advantage of second-generation sequencing is that segmented sequencing can be used to ensure the accuracy of sequencing results; the cost of sequencing at this stage is very low; and the DNA samples do not need to be of very high quality to sequence. However, there are several disadvantages,

Discussion
Due to the large number of genes in the PRV complete genome, the high content of GC bases, and the presence of more than 900 nucleotide repeat sequences, it is relatively difficult to sequence the complete genome; research in the PRV gene function and comparative genomics had been somewhat restricted. In 2011, American researchers were the first to obtain and to publish the complete genome sequences of some representative PRV strains such as Bartha, Kaplan, and Becker using Illumina second-generation sequencing technology [3]. With the popularity of second-generation high-throughput sequencing around the world, the complete genome sequences of PRV isolates from various regions have been published in China since 2014 [15]. The advantage of second-generation sequencing is that segmented sequencing can be used to ensure the accuracy of sequencing results; the cost of sequencing at this stage is very low; and the DNA samples do not need to be of very high quality to sequence. However, there are several disadvantages, including that the sequencing time is very long; the content of GC bases in PRV genomes is very high so that it is hard to completely sequence the genomes at one time; every sequencing of a complete genome will generate many gap sequences, which need to be amplified and filled by multiple pairs of primers; and the sequencing technology still needs to be innovated. In recent years, with the advent of the third-generation PacBio RSII gene sequencer, long and complex sequencing has become very convenient.
In this study, we combined second-and third-generation sequencing, an approach that provides the benefits of third-generation sequencing efficiency, ultra-long reading length, short sequencing cycle, no base preference, and no gap sequences. In addition, this approach allows the sequencing of complex structures at one time, and it makes use of MGISEQ-2000 s generation sequencing to make up for the shortcomings of low thirdgeneration sequencing flux and manual correction of sequencing errors. After sequencing, we assembled the PRV FJ complete genome quickly, accurately, and completely. The PRV FJ complete genome was 143,703 bp, had a G&C bases content of 73.67%, and encoded 70 ORFs. The length and the structural range of the complete genome sequence are consistent with the previously tested Chinese epidemic mutant HNX strain (full length = 142,294 bp, G&C bases content = 73.56%, encoding 70 ORFs) [16]; the HNB strain (full length = 142,255 bp, G&C bases content = 73.61%, encoding 70 ORFs) [17]; the TJ strain (full length = 143,642 bp, encoding 67 ORFs) [15]; and the HeN1 strain (full length = 141,803 bp, G&C bases bases content = 73.3%, encoding 69 ORF) [18]. Thus, our FJ strain sequencing results are reliable.
Among the PRV genes we selected for phylogenetic tree analysis, gB, gD, gH, gL, and gK are necessary to ensure their replication, growth, and proliferation in cells [19]. As PRV immunogenicity-related proteins, gB, gC, and gD can induce neutralizing antibody production [20]. The proteins expressed by gH and gL, gE and gI, and gM and gN genes can form heterodimers, which are related to virus infection and immune escape [21]. TK, PK, gE and gI are virulence-related genes. Single or multiple deletions or insertion mutations in these genes affect the virulence of PRV [22,23]. The gG gene encodes the only protein component released by PRV outside the virus; it was released out of the cell by protease hydrolysis, and it was disconnected from the virion after passing through the cell membrane [24]. Among the above-mentioned genes, only the gG gene of the PRV FJ strain was located in the subbranch of the classic Chinese strains identified before 2011. The reason is that the 244th base of the gG gene of the FJ strain, classic Chinese strain and other country's strains is 'T', while the corresponding base of the Chinese epidemic mutant is 'C'. The other genes were in the same subbranch with the mutants in China. Therefore, the results showed that the FJ strain has also been a common variant in China in recent years, and the genetic variation has been stable up to now.
Through the detection of the complete genome recombination events of the PRV FJ strain, we found recombinant signals in the 1694-1936, 101,113-102,660, and 107,964-111,481 bp regions, indicating that recombination occurred in the corresponding regions of the HLJ8 and the Ea strains. According to the location of coding sequence annotations in Table 3, we found the recombinant region was located in the non-coding region of UL and IRs of the FJ strain, so there is no recombination mutation event in the coding sequence. Non-coding regions in virus genomes have a variety of functions. For example, a non-coding region of Japanese encephalitis virus antagonizes the interferon response by blocking interferon regulatory factor 3 transport [25]; the replication of Marburg virus can be regulated by its non-coding region [26]. Influenza virus infection can be regulated by its non-coding region [27]. The non-coding region of the Epstein-Barr virus (EBV) plays an important role in the life cycle and the pathogenesis of EBV [28]. Natural recombination between different PRV strains has been common, as authors have reported, although the mechanism is unclear [29].
However, gene recombination in the non-coding region may affect the ability of PRV to induce interferon-beta promoter activity and regulate viral messenger RNA (mRNA) [30,31]. The major backbone strain for the recombination event was HLJ8, which is an epidemic variant strain that was isolated in China after 2011, while the minor backbone strain was Ea, a classic strain that was isolated in China in the 20th century. Hence, the FJ strain might have the ability to recombine with the Chinese epidemic variant strain and the classical strain. With regard to the cause of the natural recombination phenomenon, we speculate that the FJ wild type strain may have arisen due to natural recombination in pigs when they were immunized with the commercial vaccine with the Ea strain as the parent strain. During large-scale importation of breeding pigs in this pig farm, the cross-provincial transportation of breeding pigs caused some pigs with latent PRV infection in other areas, so it was difficult to show positive results accurately during PRV detection, while these pigs are still traded in the market. After the infection of the FJ strain, gene recombination might occur among PRV in the host.