1. Introduction
African swine fever virus (ASFV), the only member of the family
Asfarviridae [
1], contains a linear, double-stranded DNA molecule genome that ranges in length from about 170 to 194 kbp, showing inverted terminal repeats (ITRs) and covalently closed ends [
2,
3,
4]. The viral genome includes a conserved central region (CCR) and two variable ends, which results in some variation in size among strains [
2,
3,
4,
5]. The first ASFV whole genome sequence [
6] was completed by the Sanger method on the cell culture-adapted Ba71V strain. Since then, several technologies have been implemented for the sequencing of ASFV populations, including Roche 454 GS FLX, Illumina HiScanSQ, MiSeq, HiSeq, NextSeq500 and Nanopore [
5,
7,
8,
9,
10,
11,
12,
13,
14,
22]. However, the reliability of results obtained with these technologies depends on the quality parameters of the sequence data. In fact, low coverage may lead to misinterpretation of the data and underestimation of variant frequency. Many parameters, in particular mean coverage, also depend on the efficiency of viral DNA purification and the quality of the sample. An important source of the low coverage seen in next-generation sequencing (NGS) of ASFV genomes is eukaryotic DNA contamination, a problem that has been approached by different strategies, including animal infections, purification from animal blood, non-specific DNA amplification or probe-mediated viral DNA enrichment designed with known ASFV sequences [
10,
12,
15,
16].
ASFV is the etiological agent of African swine fever [
17,
18,
19], a serious disease affecting both wild boar and domestic pigs, which most recently spread from East Africa to the Caucasus in 2007, from where it went on to affect 28 countries in Europe, Oceania and Asia, including China [
20,
21]. The virus is now endemic in China and is currently affecting neighboring countries such as Vietnam, Laos, Myanmar, Korea and the Philippines. The situation is economically dramatic, unbalancing the food chain and representing one of the most important social and industrial animal health concerns worldwide. The rapid spread of the disease proved eradication and regionalization measures to be insufficient for control of the current epidemic.
The development of effective vaccines is currently being attempted by several labs [
22,
23,
24] and indeed is urgently required. Among the different vaccine strategies, one of the most realistic options is the generation of live attenuated vaccines (LAVs), which are usually based either on naturally attenuated ASFV strains or on genetic manipulation of virulent strains in order to experimentally attenuate virulence [
25,
26,
27,
28]. An exhaustive genetic characterization using cutting-edge technologies must be performed, both on the parental stocks and the final LAV prototypes. Characterization of vaccine prototypes must not only confirm the introduced genetic modifications (mostly deletion of specific genes) [
28] but also extend to the whole genome [
25,
27]. This approach allows the identification of several genetic factors, such as off-target modifications, parental contamination and the presence of possible subpopulations and/or minor viral variants in the stocks, which eventually could also play a role in the overall safety of LAVs.
Here, we describe for the first time an improved viral DNA purification technique from extracellular viral particles and high-coverage NGS analysis for the genomic characterization of the Arm/07 stock. The Arm/07 stock presented at least two distinct sub-populations: Arm/07/CBM/c2, grouped within genotype II with 99.992% sequence identity to the updated Georgia 2007/1 reference strain [
12] and the clone Arm/07/CBM/c4, which showed a high level of heterogeneity within the left and right genome ends and presented identity with genotype I strains, ranging from 99.605% (BA71V) and 99.503% (E75) to 96.542% (Mzuki_1979). Remarkably, a single deletion at position 75,213 of Arm/07/CBM/c4, within the EP402R gene, produced a frameshift that shortened the N-terminal region of this protein compared to other genotype I virulent strains. Therefore, the overall sequence of EP402R from Arm/07/CBM/c4 differed from both virulent and attenuated genotype I strains and from genotype II strains, resulting in a unique EP402R sequence. Furthermore, Arm/07/CBM/c4 showed an impaired ability to control the cGAS-STING pathway in vitro, similar to the attenuated NH/P68 strain, while Arm/07/CBM/c2 prevented STING and IRF3 activation.
2. Materials and Methods
2.1. Cells and Viruses
The hemadsorbing ASFV isolate Arm/07 (genotype II) was obtained from an epizootic of domestic pigs in Armenia in 2007. The isolate was propagated in three passages in porcine blood monocytes (PBMs) according to the OIE Manual (2019). The low virulent, non-hemadsorbing ASFV NH/P68 (genotype I) isolated in Portugal was obtained in COS-1 cells. Viruses were grown in porcine alveolar macrophages (PAMs) or porcine blood monocytes (PBMs), in DMEM (Dulbecco Modified Eagle Medium) supplemented with 10% pig serum (Sigma, St. Louis, MO, USA) as previously described [
29]. African green monkey kidney cells (COS-1), obtained from the American Type Culture Collection (ATCC CRL-1650), were cultured in DMEM with 5% FBS. Cells were grown at 37 °C in a 5% CO2 atmosphere saturated with water vapor in culture medium supplemented with 2 mM L-glutamine, 100 U/mL gentamycin and 0.4 mM nonessential amino acids.
2.2. Viral DNA Extraction for NGS Analysis
The Arm/07 virus stock and the selected Arm/07 clones were grown in six P100 dishes of PAMs, with supernatants collected at 3 days post-infection and centrifuged at 8281× g overnight at 4 °C. Pellets were resuspended in cold, filtered 10 mM Tris-HCl (pH 8.8), then treated with 0.25 U/µL DNase I (Sigma), 0.25 U/µL Nuclease S7 (Sigma) and 20 µg/mL RNase A (Promega) in 800 mM Tris-HCl (pH 7.5), 200 mM NaCl, 20 mM CaCl2 and 120 mM MgCl2 for 2 h at 37 °C, followed by incubation with 12 mM EDTA (Sigma) and 2 mM EGTA (Sigma) for 10 min at 75 °C. After that, the solution was treated with 200 µg/mL proteinase K (Sigma) in 0.5% SDS for 1 h at 45 °C, and viral DNA was then extracted by mixing 1:1 with phenol:chloroform:isoamyl alcohol (25:24:1). After centrifugation at 9400× g for 3 min at RT, the aqueous fraction was transferred and further incubated with 0.1 volume of 3 M acetic acid (pH 5.2), 1 µL LPA (Sigma) and 2 volumes of cold 100% ethanol for 1 h at −80 °C. After centrifugation at 15,890× g for 30 min at 4 °C, supernatants were discarded and pellets were washed once with cold 70% ethanol and air-dried before finally being resuspended in 10 mM Tris (pH 8.8).
2.3. Isolation of Viral Clones from Arm/07 Stock by Plaque Purification
For the isolation of independent clones from the Arm/07 stock, we plated approximately 10³ viruses per well of 6-well plates in COS-1 cells. After adsorption for 1.5 h, carboxymethyl cellulose (CMC) with 2% FBS DMEM was added. After 7 days, lysis plaques were identified under an optical microscope, collected with sterile tips in 40 µL of DMEM and stored at −80 °C. After three freeze/thaw cycles, the extracted viruses were used to infect new COS-1 cells following the same procedure. After three rounds of purification, individual clones were amplified and grown in PAMs.
2.4. Hemoadsorption (HAD) Assay
Porcine alveolar macrophages (PAMs) were seeded in 12-well plates and mock-infected or infected with either the Arm/07/CBM/c2 or the Arm/07/CBM/c4 clone at MOI 0.5 in DMEM with 10% pig serum. At 16 hpi, a solution of fresh pig erythrocytes (2 µL of pig erythrocyte sediment per mL of 10% pig serum DMEM) was added to every well. Twenty-four hours after the addition of erythrocytes, rosettes were observed under a Leica DM IL LED microscope coupled to a Leica DFC3000G camera (Leica Microsystem, Wetzlar, Germany).
2.5. Western Blot Analysis
PAM cells were cultured as indicated and mock-infected or infected with either Arm/07/CBM/c2 or Arm/07/CBM/c4 (or NH/P68) at MOI 0.5. Infected cells were collected at 16 hpi, washed with PBS and lysed using radioimmunoprecipitation assay (RIPA) buffer supplemented with protease and phosphatase inhibitors (Roche, Basel, Switzerland). Samples were kept at 4 °C for 30 min, sonicated and centrifuged for 10 min at 15,890× g at 4 °C. Supernatants were collected and quantified using the BCA assay, and 20 µg of each sample was resolved by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) and transferred to Immobilon-P membranes (Merck Millipore, Burlington, MA, USA). The membranes were incubated with the following specific primary antibodies: anti-P72 (1:2000, generated in CBMSO), anti-CD2v (1:2000, generated in CBMSO), anti-P32 (S-1D8) (1:6000, kindly provided by S.-Y. Sunwoo), anti-p-STING (Ser366) rabbit monoclonal antibody from Cell Signaling (1/1000), anti-p-IRF3 (Ser396) (4D4G) rabbit monoclonal antibody from Cell Signaling (1/1000) and anti-actin (1:1000, Santa Cruz Biotechnology sc-47778, Dallas, TX, USA) diluted in Tris-buffered saline (TBS) supplemented with 1% milk. Membranes were washed three times with TBS and exposed for 1 h to specific peroxidase-conjugated secondary antibodies: anti-rabbit and anti-mouse immunoglobulin G coupled to peroxidase (1/5000 and 1/2000, respectively) from Amersham Biosciences (Little Chalfont, UK), and anti-m-IgGκ secondary antibody (1/1000) from Santa Cruz Biotechnology. Chemiluminescence detection was performed using ECL Prime (Amersham Biosciences, Little Chalfont, UK).
2.6. Arm/07/CBM/c2 and Arm/07/CBM/c4 Growth Curves
To elucidate the difference in behavior between the two Arm/07 clones, the growth rate of each individual clone was calculated. Two 12-well plates were seeded with PAMs at a density of 1.6 × 10⁶ PAMs per well and infected with either Arm/07/CBM/c2 or Arm/07/CBM/c4 at MOI = 0.1. After 2 h of adsorption at 37 °C, the viral inoculum was discarded and cells were washed two times with PBS. DMEM containing 10% pig serum was added and cells were incubated for 0, 24, 48 and 72 h post-infection (hpi) at 37 °C and 5% CO2.
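As a worked example of the infection set-up above, the volume of inoculum needed per well follows from the number of cells, the MOI and the titer of the viral stock. The sketch below is illustrative only: the function name and the stock titer (10^7 infectious units per mL) are hypothetical and are not values reported in this study.

```python
def inoculum_volume_ul(cells_per_well, moi, stock_titer_per_ml):
    """Return the stock volume (in µL) that delivers `moi` infectious units per cell."""
    if stock_titer_per_ml <= 0:
        raise ValueError("stock titer must be positive")
    infectious_units = cells_per_well * moi            # units required in the well
    volume_ml = infectious_units / stock_titer_per_ml  # volume of stock holding them
    return volume_ml * 1000.0                          # convert mL to µL


# 1.6 x 10^6 PAMs per well at MOI 0.1; the stock titer is a hypothetical value.
volume = inoculum_volume_ul(cells_per_well=1.6e6, moi=0.1, stock_titer_per_ml=1e7)
print(f"{volume:.1f} µL of stock per well")
```

In practice the stock would be diluted so that this amount of virus is delivered in a convenient adsorption volume.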
At each time point, cells were collected and centrifuged for 5 min at 250× g at room temperature. A 100 µL aliquot of the supernatant was stored as extracellular virus (EV), and the rest of the supernatant was homogenized with the pellet, subjected to three freeze/thaw cycles and centrifuged for 5 min at full speed to pellet cellular debris. The resulting supernatant was stored as the total virus (TV) fraction.
For viral titration, we performed a hemadsorption (HAD) assay. Individual 60-well microtest plates (Greiner) were seeded with PAMs at a concentration of 10⁶ PAMs per mL of DMEM with 10% pig serum, using 1 mL for each plate. For infection, 5 µL of the corresponding viral dilution were added. Cells were infected for 16 h and then a solution of fresh pig erythrocytes (2 µL of pig erythrocyte sediment per mL of 10% pig serum DMEM) was added to every well. Hemadsorption was assessed 96 h after the addition of erythrocytes and the resulting growth curve was plotted using GraphPad Prism software.
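The text does not state how HAD-positive wells were converted into a titer. As an illustration only, the sketch below assumes the Spearman-Kärber endpoint method, which is one common choice for this kind of limiting-dilution assay; the function names and the well counts are hypothetical, and only the 5 µL inoculum per well is taken from the protocol above.

```python
def spearman_karber_log10_endpoint(first_log10_dilution, log10_step, positive_fractions):
    """Return log10 of the reciprocal dilution at which 50% of wells are positive.

    first_log10_dilution: log10 of the reciprocal of the first dilution (1 for 10^-1).
    log10_step: log10 of the dilution factor (1 for ten-fold steps).
    positive_fractions: fraction of HAD-positive wells at each dilution, ordered
        from most to least concentrated, starting with all wells positive and
        ending with no wells positive.
    """
    if not positive_fractions:
        raise ValueError("at least one dilution is required")
    if positive_fractions[0] != 1.0 or positive_fractions[-1] != 0.0:
        raise ValueError("series must span 100% to 0% positive wells")
    return first_log10_dilution + log10_step * (sum(positive_fractions) - 0.5)


def titer_per_ml(log10_endpoint, inoculum_ml):
    """Convert the 50% endpoint into HAD50 units per mL of undiluted sample."""
    return 10 ** log10_endpoint / inoculum_ml


# Hypothetical read-out for ten-fold dilutions 10^-1 to 10^-4, 5 µL inoculum per well.
fractions = [1.0, 1.0, 0.5, 0.0]
endpoint = spearman_karber_log10_endpoint(1, 1, fractions)
print(f"log10 endpoint = {endpoint:.2f}; titer = {titer_per_ml(endpoint, 0.005):.2e} HAD50/mL")
```

Other estimators, such as Reed-Muench, would give similar but not identical values, so the method used should be stated alongside any reported titer.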
2.7. Nanopore and Illumina Sequencing and Data Analysis
High-quality genomic DNA (100 ng) was submitted to MicrobesNG (Birmingham, UK). Illumina libraries were prepared with an NEBNext Ultra DNA Library Prep Kit (New England Biolabs). DNA samples were fragmented in a Covaris instrument and sequenced on an Illumina MiSeq device as paired-end (2 × 250 bp) reads. Illumina reads were trimmed using Trimmomatic [
30] (v0.39). Long reads were sequenced in a GridION instrument from Oxford Nanopore Technologies.
The quality analysis of short and long reads was performed with FastQC [
31] (v0.11.8).
For variant calling analysis, the BWA-MEM tool (Burrows-Wheeler Alignment using MEM algorithm) [
32] was used to align Illumina reads against the Georgia 2007/1 reference sequence (accession number LR743116.1). Estimated average coverage was calculated based on the percentage of mapped reads and genome size. To separate the two putative subpopulations, bamsplit.py (
https://github.com/luntergroup/bamsplit) was used, a Python 3 tool for splitting a BAM file by reads supporting different haplotypes present in a VCF (Variant Call Format) file.
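The coverage estimate described above can be sketched as a simple calculation: mapped bases divided by genome length. The function name and the read count in the example are hypothetical; the 250 bp read length follows the paired-end design described in this section, and the exact formula used by the authors is not given, so this is an assumption about how such an estimate is typically made.

```python
def estimated_mean_coverage(total_reads, mapped_fraction, read_length, genome_size):
    """Estimate mean sequencing depth as mapped bases divided by genome length."""
    if genome_size <= 0:
        raise ValueError("genome size must be positive")
    if not 0.0 <= mapped_fraction <= 1.0:
        raise ValueError("mapped fraction must lie between 0 and 1")
    mapped_bases = total_reads * mapped_fraction * read_length
    return mapped_bases / genome_size


# Hypothetical run: one million 250 bp reads, 85% mapping to a 190 kbp genome.
depth = estimated_mean_coverage(
    total_reads=1_000_000, mapped_fraction=0.85, read_length=250, genome_size=190_000
)
print(f"Estimated mean coverage: {depth:.0f}x")
```

Because trimming shortens reads, using the mean post-trimming read length instead of the nominal 250 bp would give a slightly lower and more realistic estimate.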
Picard Tools (
https://broadinstitute.github.io/picard/) was used to remove read duplicates and the pre-processed alignment files were then used for the variant calling process with GATK [
33] (v4.1.2). Numbers of SNPs (Single Nucleotide Polymorphisms) and indels were determined and characterized by their location in coding and non-coding regions, and SNPs were further classified as synonymous or nonsynonymous. Due to the low read mapping quality at the genome ends, variants within the ITR (Inverted Terminal Repeat) regions were not included in the analysis. Minor genetic variants were detected using VarScan [
34] (v2.3.9), setting a minimum variant allele frequency above 0.02. Genetic variants were annotated using SnpEff [
35] software (v4.3t). For the allele balance distribution plot along the genome, the R package vcfR [
36] was employed.
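To make the allele-frequency threshold concrete, the sketch below classifies a variant from its supporting read counts. It is not the VarScan implementation: the function names are hypothetical, the 0.02 lower bound is the threshold quoted above, and the 0.50 upper bound for a "minor" variant as well as the handling of values exactly at the boundaries are assumptions.

```python
def allele_frequency(ref_depth, alt_depth):
    """Fraction of reads at a position that support the alternative allele."""
    total = ref_depth + alt_depth
    if total == 0:
        raise ValueError("no reads cover this position")
    return alt_depth / total


def classify_variant(ref_depth, alt_depth, min_af=0.02, minor_max_af=0.50):
    """Label a variant as below threshold, minor or major by allele frequency."""
    af = allele_frequency(ref_depth, alt_depth)
    if af < min_af:
        return "below_threshold"   # indistinguishable from sequencing noise here
    if af <= minor_max_af:
        return "minor"             # present in a subset of the viral population
    return "major"


# Hypothetical read counts at three positions (reference reads, variant reads).
for ref, alt in [(990, 10), (900, 100), (100, 900)]:
    print(ref, alt, classify_variant(ref, alt))
```

The lower the threshold, the more depth is needed before a call is meaningful: at 2%, a position covered by 100 reads is supported by only two variant reads.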
De-novo assembly of Illumina reads was generated with SPAdes [
37] (v3.14.0) and the contigs obtained were compared with the Georgia 2007/1 reference using BLASTn (v2.9.0+) in the command line. After filtering the assembly, contigs were extended and scaffolded with SSPACE-standard [
38] software. To polish the assembly, reads were mapped to the new extended contigs and manually curated to obtain a single contig of 190,145 bp.
The assembly of the Arm/07/CBM/c4 genome was performed following the ONT (Oxford Nanopore Technologies, Oxford, UK) assembly and Illumina polishing pipeline by Oxford Nanopore Technologies (available at
https://github.com/nanoporetech/ont-assembly-polish). First, Nanopore reads were assembled using Canu [
39] (v1.9). The contigs obtained were polished with Racon software [
40] (v1.4.16). Illumina reads were mapped onto the polished contigs using BWA-MEM and were then used for the correction of the assembly with Pilon (v1.20) [
41]. Nanopore reads were aligned against the final genome using Minimap2 tool (v2.11-r797) [
42].
The genome assemblies were annotated using PROKKA [
43] (v1.13), a tool to annotate prokaryotic genomes, and GATU [
44], a genome annotation transfer tool based on a closely related organism.
In order to identify differences between genotype II and I strains and Arm/07/CBM/c2 or Arm/07/CBM/c4 genomes, NCBI (National Center for Biotechnology Information) BLASTn with standard parameters and Nucmer [
45] (v 4.0.0beta2) were used. Nucmer application was used to discover genetic variants. The identification of modified CDS (Coding Sequence) was accomplished with SnpEff software, used to annotate genetic variants and a subsequent manual identification using IGV viewer [
46].
To analyze the differences in EP402R, MGF-110-11L, MGF-110-14L and ASFV_G_ACD_00350 genes between different strains, gene sequences were obtained from NCBI database for each published strain. These sequences were aligned using ClustalW software [
47]. Alignments were visualized using SnapGene software (from Insightful Science).
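Once two gene sequences have been aligned, their similarity can be summarized as a percent identity. The sketch below shows one simple definition, counting identical columns over all columns in which at least one sequence has a residue. The function name and the toy sequences are hypothetical, and this definition is an assumption: tools such as BLASTn may treat gaps and alignment length differently.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns with identical residues, gaps counted as mismatches."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue               # column carries no information for this pair
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy alignment: one deletion in the first sequence relative to the second.
print(f"{percent_identity('ACGT-ACGT', 'ACGTTACGT'):.3f}% identity")
```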
2.8. Phylogenetic Analysis
ASFV sequences corresponding to the central conserved region (comprising 129 kb between the A224L and I196L genes) from 18 genotype II strains, 23 genotype I strains and 2 genotype X strains were downloaded from the NCBI database. A multi-fasta file including all 43 downloaded sequences plus the Arm/07/CBM/c2 and Arm/07/CBM/c4 genomic sequences was generated and aligned using MAFFT software (v.7.390) with automatic settings.
The generated alignment file was used to build a phylogenetic tree of all downloaded ASFV strains with IqTree software using maximum-likelihood method [
48]. We used standard parameters, which identified the most suitable model based on the alignment, and 10,000 ultrafast bootstrap replicates. The most suitable model found by IqTree software was K3Pu+F+R2 (three substitution types model with unequal base frequencies + empirical base frequencies + FreeRate model with 2 categories). The tree was rooted using Archaeopteryx, exported to Newick format and modified using Dendroscope software (University of Tübingen, Tübingen, Germany).
For the recombination analysis, the alignment file generated with MAFFT was used. The Recombination Detection Program (RDP, v Beta 5.05) [
49] was employed to detect potential recombination events. The default parameters of six methods (RDP, GENECONV, Bootscan, Maxchi, Chimaera, SiSscan) were used and only those events supported by a
p-value below 0.05 found by at least two methods were reported (see
Supplementary Table S2).
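The reporting rule described above (a p-value below 0.05 in at least two of the six methods) can be expressed as a small filter. This is a sketch under assumptions: the data layout, function name and example p-values are hypothetical and do not reflect the RDP output format or the results of this study.

```python
METHODS = ("RDP", "GENECONV", "Bootscan", "Maxchi", "Chimaera", "SiSscan")


def supported_events(events, alpha=0.05, min_methods=2):
    """Keep events detected with p < alpha by at least `min_methods` methods.

    events: mapping of event id -> mapping of method name -> p-value (or None
        when the method did not detect the event).
    Returns a mapping of event id -> list of supporting methods.
    """
    kept = {}
    for event_id, pvalues in events.items():
        supporting = [
            method for method in METHODS
            if pvalues.get(method) is not None and pvalues[method] < alpha
        ]
        if len(supporting) >= min_methods:
            kept[event_id] = supporting
    return kept


# Hypothetical p-values for three candidate events.
candidates = {
    "event_1": {"RDP": 0.001, "GENECONV": 0.03, "Bootscan": 0.2},
    "event_2": {"Maxchi": 0.04},
    "event_3": {"RDP": None, "Chimaera": 0.01, "SiSscan": 0.049},
}
print(supported_events(candidates))
```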
2.9. PCR and Sanger Sequencing
Conventional PCR for ITR-containing contig placement at both ends of the Arm/07/CBM/c2 assembled genome and for EP402R indel verification in the Arm/07/CBM/c4 assembled genome was performed with Phusion High-Fidelity PCR Master Mix with HF Buffer (Thermo Scientific, USA) and the following primers: 5′-AAACTTTCATATTGGTAACTTGTTC-3′ and 5′-TATTCGCACTAAAGTGCTATGTTAC-3′; 5′-AGTGAAGATCTATAGCTACGCCTTA-3′ and 5′-TATTCGCACTAAAGTGCTATGTTAC-3′; and 5′-AGCGCGAATTCGCCACCATGATAATAATAGTTATTTTTTTAATGTG-3′ and 5′-ATATGTTCTATTAAATATTTCTGTATTGTTAGG-3′, respectively. The corresponding bands (618 bp, 541 bp and 293 bp, respectively) were purified from a 1% agarose gel with the Speedtools PCR Clean-Up kit (Biotools, Madrid, Spain). The purified products were subsequently sequenced by the Sanger method (Macrogen, Seoul, Korea).
2.10. Data Availability
Read files and assembled genome Arm/07/CBM/c2 together with the annotation have been deposited in the European Nucleotide Archive (ENA) under the study accession number PRJEB38146. Raw reads from clones 1 and 3 have been deposited under the study accession PRJEB40011. Read files and assembled genome Arm/07/CBM/c4 together with the annotation are available under the study accession PRJEB40012.
4. Discussion
One of the most realistic short-term strategies for ASFV vaccine development is implementation of LAVs based on the deletion of specific genes from virulent strains [
22,
23,
24]. Thus, exhaustive genetic characterization of both prototypes and parental strains is essential for LAV generation and whole genome sequencing carries obvious advantages over sequencing of individual genes [
28].
Interest and technological advances in ASFV genome sequencing have grown over the past few years, allowing for cheaper and more accurate results. Low sequence coverage in older technologies can mask variants or even viral subpopulations in stocks collected from ASFV-infected animals. As shown in
Table 1, the proportion of viral DNA in a given extract is a key factor affecting mean coverage in NGS studies. It has been previously reported that low coverage resulted from high levels of eukaryotic DNA contamination [
7,
9,
14,
55], which increases the threshold of detection of real nucleotide differences. To avoid this problem, specific PCR-Sanger sequencing (SPSS) has been used to resolve ambiguous or poorly covered areas in the genome, which is time-consuming and may overlook non-homologous subpopulations. As an alternative, several approaches to improve viral DNA purification have been employed for ASFV-NGS, including experimental infection of pigs with specific strains, followed by isolation and sequencing of the virus from the blood of the infected animals [
5,
8,
15]. This method presents obvious economic and animal welfare concerns. There are some other techniques for enrichment of viral vs. cellular DNA, either by nonspecific amplification of DNA [
16] or by removal of methylated DNA [
56]. Viral DNA capture using specific probes based on known genotype-specific ASFV sequences has also been used as a method to obtain pure viral DNA [
12], which in the case of mixed viral populations would still select only the DNA sequences displaying the specific DNA sequences bound by the probes, thus missing any other information present in the sample.
In order to improve the accuracy of the methods used to verify the nature of the viral stocks that we will use to develop recombinant vaccine prototypes, we show here that the use of extracellular virions as the source for viral DNA purification led to a very high viral/eukaryotic DNA ratio (>85%). We obtained greater depth in sequence coverage, sufficient to identify different viral populations within a single stock. In addition, our methodology was able to identify minor variants, defined as variants with an allelic frequency between 2 and 50%, within a single clone (Arm/07 clone 1,
Supplementary Figure S1). Minor variants may identify minor sub-populations that could play a role in clinical outcomes of LAV prototypes generated from cell passage. Although other studies have also used cell-free virus as a genomic source, the high percentages of viral DNA and the coverage depth necessary for robust data were not ultimately obtained [
7,
9]. It is uncertain whether the differences lie in the NGS technology used (Roche 454 GS FLX) [
9] or possibly in other difficulties with the methodology employed [
7].
In our hands, the Arm/07 stock, which was thought to be composed of a homogeneous viral population, was unexpectedly found to be genomically heterogeneous. In order to characterize the viral populations detected, we further pursued isolation of individual clones using plaque purification in COS-1 cells. It is widely known that growth of ASFV strains in cells other than their natural targets, that is, PBMs/PAMs, can induce genomic modifications. For instance, sequencing of the whole genome of Ba71 adapted to Vero cells revealed a multitude of changes compared to its parental strain [
6,
57]. Other ASFV strains have also undergone genetic mutation when grown in certain cell lines, such as Vero or CV1 [
58,
59]. Importantly, we have verified that isolation of Arm/07/CBM/c2 by three passages in COS-1 did not generate variations in its sequence. This finding is further supported by the fact that a low number of passages (typically three) in COS-1 cells did not produce any remarkable alteration in the genome of ASFV NH/P68 (
Supplementary Table S1). In addition, it has previously been reported that infection of COS-1 cells did not induce genome modifications after 20 passages of a Ba71 virulent-derived LAV [
27].
This study describes for the first time the sequence of the ASFV Arm/07 genome, using a workflow with an emphasis on non-biased viral DNA purification. Starting from cell-free infection supernatants enabled us to obtain high quality sequences and coverage depth while minimizing cellular DNA contamination, revealing an unexpected heterogeneity in an ASFV Arm/07 stock which might have been missed using standard NGS workflows [
12]. Plaque isolation confirmed the existence of at least two distinct viral populations within the original Arm/07 stock. The origin of these viral populations is currently unknown. It is plausible that these populations co-infected the same animal. Indeed, co-infections of a single animal by two different isolates belonging to the same genotype have been previously found [
60].
The first sub-population was represented by the clone named Arm/07/CBM/c2 and has been fully characterized. The sequence identity that Arm/07/CBM/c2 shares with the most recently updated Georgia 2007/1 sequence [
12], the currently circulating member of genotype II, is not surprising, as both are geographically close isolates and ASFV is a large DNA virus whose mutation rate is expected to be low. The second isolated clone, named Arm/07/CBM/c4, has also been characterized, showing a CCR homology compatible with ASFV genotype I. Arm/07/CBM/c4 presented high heterogeneity at both ends of the genome, but only the variants that were fixed in the assembled genome (supported by ~100% of the reads) were taken into account for pairwise comparison with other genotype I strains.
From these fixed variants, Arm/07/CBM/c4 presented 23 unique variants compared with all genotype I strains displayed in
Table 8, thus making Arm/07/CBM/c4 a distinctive strain. This fact may rule out the possibility of laboratory contamination with any of the known genotype I strains commonly used in ASFV labs. It is further noteworthy that, compared with some of the most widely used strains in ASFV laboratories, such as Ba71V, OURT 88/3 and NH/P68, Arm/07/CBM/c4 presented 179, 181 and 187 fixed variants, respectively (see
Table 8). Moreover, the in silico analysis suggested that recombination events, in vivo and/or in vitro, might have played a role in the origin of Arm/07/CBM/c4. Even more interestingly, a single mutation found at the N-terminal region of the EP402R gene (CD2v) of Arm/07/CBM/c4, further assessed by NGS and Sanger sequencing, induced a frameshift variant that shortened the N-terminal domain of CD2v. The overall sequence of CD2v of Arm/07/CBM/c4 proved to be unique, which is relevant since we also demonstrated that this virus is HAD+. Hemadsorption and virulence are two concepts that have been traditionally linked in ASFV [
61,
62]. Studies concerning the CD2v sequence and structure of Arm/07/CBM/c4 linked to HAD might be of great interest for the ASFV community. Moreover, our results showed that the two clones differ in their ability to modulate the innate immune response in vitro. While Arm/07/CBM/c2 is able to counteract STING and IRF3 phosphorylation, in Arm/07/CBM/c4-infected cells we detected pSTING and pIRF3 in a manner similar to the attenuated NH/P68. These results suggest that Arm/07/CBM/c4 induced the cGAS-STING pathway and may provoke an attenuated phenotype
in vivo, indicating that it may be an adequate model for LAV development in the near future, although in vivo experiments should be performed in order to verify this hypothesis. Due to the heterogeneity in the left and right ends of Arm/07/CBM/c4, regions that encode genes reported to be important in immune evasion [
63,
64], individual Arm/07/CBM/c4-derived clones will be obtained, which may be useful in future vaccine clinical trials based on their special characteristics and putative alteration in degree of virulence that will be analyzed in pigs in the near future.
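To illustrate why a single-nucleotide deletion such as the one discussed for EP402R changes every downstream codon, the sketch below translates a short toy sequence before and after removing one base. The sequence, the deleted position and the function names are hypothetical and are unrelated to the actual EP402R sequence; only the standard genetic code is assumed.

```python
BASES = "TCAG"
# Standard genetic code, ordered by first, second and third base in T, C, A, G order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate from the first base until a stop codon or the last complete codon."""
    protein = []
    for start in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[start:start + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def delete_base(dna, index):
    """Return the sequence with the base at the 0-based `index` removed."""
    return dna[:index] + dna[index + 1:]


toy_gene = "ATGGCTGAAACCTTGTAA"          # hypothetical open reading frame
mutant = delete_base(toy_gene, 3)        # remove one base right after the start codon
print("original:", translate(toy_gene))
print("deletion:", translate(mutant))
```

In this toy case every residue after the start codon changes and the original stop codon is no longer in frame, which is the general mechanism by which a frameshift can alter or truncate a protein.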