The Canine Morbillivirus Strain Associated with An Epizootic in Caspian Seals Provides New Insights into the Evolutionary History of this Virus

Canine morbillivirus (canine distemper virus; CDV) is a worldwide distributed morbillivirus that causes sporadic cases and recurrent epizootics among an increasing number of wild, feral, and domestic animal species. We investigated the evolutionary history of CDV strains involved in the 1988 Lake Baikal (CDVPS88) and the 2000 Caspian Sea (CDVPC00) seal die-offs by recovery of full-length sequences from archived material using next-generation sequencing. Bayesian phylogenetic analyses indicated that CDVPC00 constitutes a novel strain in a separate clade (tentatively termed “Caspian”) from the America-1 clade, which is comprised of older vaccine strains. The America-1/Caspian monophyletic group is positioned most basally with respect to other clades and is estimated to have separated from other CDV clades around 1832. Our results indicate that CDVPC00 recovered from the epizootic in the Caspian Sea in 2000 belongs to a previously undetected novel clade and constitutes the most ancestral wild-type CDV clade.


Introduction
Canine morbillivirus (canine distemper virus; CDV) is a member of the genus Morbillivirus, subfamily Orthoparamyxovirinae, within the family Paramyxoviridae, and contains a negative-sense single-stranded RNA genome of 15,690 bp. This virus has a broad host range and is transmitted within and between multiple carnivore species of the order Carnivora, especially members of the families Canidae, Felidae, Mustelidae, and Procyonidae, with individual cases and more sporadic epizootics documented in the families Ursidae, Phocidae, Viverridae, and Hyaenidae [1]. The true host range of CDV remains to be determined, as large outbreaks of canine distemper have occurred in colonies of rhesus macaques (Macaca mulatta, order Primates) in China [2,3] and cynomolgus macaques (Macaca fascicularis, order Primates) in Japan [4], a few cases have been recorded in collared peccaries (Tayassu tajacu, order Artiodactyla) in the USA [5] and captive marmots (Marmota caudata, order Rodentia) in

Sample Processing
Tissue samples were processed using a viral enrichment protocol as previously described [35]. Briefly, the samples were subjected to three cycles of mechanical homogenization and freeze-thawing followed by cell debris centrifugation and filtration of bacteria using 0.45 µm spin columns. RNA was extracted from the homogenates using TRIzol (Thermo Fischer Scientific), and cDNA was reverse transcribed with Superscript IV (Thermo Fischer Scientific) using non-ribosomal hexamers. Subsequently, a random PCR amplification of the nucleic acid contents in the samples was performed with Phusion polymerase (NEB).

Generation of Full-Length Genome
The samples were further prepared following the Nextera XT DNA Library Prep Kit protocol (Illumina, San Diego, CA, USA) and sequenced on an Illumina MiSeq system using the MiSeq Reagent Kit v3 (600 cycles; Illumina). Reference assembly was performed with the software CLC Genomics Workbench 10. Genome ends were generated using rapid amplification of cDNA ends (RACE) as described previously [35]. For the 3 end or leader sequence, a poly(T) adaptor was used as forward primer and an oligonucleotide with sequence 5 -ATCACCGACCAATCTAACAAGTCTATCC-3 was used as reverse primer. For the 5 end or trailer sequence, an oligonucleotide with sequence 5 -AGTGCACTGATTAGAAAC-3 was used as forward primer and a poly(T) adaptor as reverse primer.

Phylogenetic Analyses
Sequences generated in this study were aligned to complete genomes of CDV deposited in GenBank using MAFFT [36]. Maximum-likelihood analyses on MEGA7 [37] were performed with 1000 bootstraps and general time reversible (GTR + G + I) as best-fit model according to the Bayesian information criterion. To determine the clock-likeliness of the sequences, we used TempEst software [38]. Recombination analyses were performed with the RDP4 package [39], which included the methods RDP, GeneConv, Bootscan, MaxChi, Chimera, SiScan, and 3Seq with cut-off p-value of 0.01. Sequences without temporal signal and which went through extensive recombination were excluded from the Bayesian analyses. Bayesian estimations were then carried out with BEAST v2.4.6 software [40]. The sequences were partitioned into coding regions of N, P, M, F, H, and L and non-coding sequences including leader, trailer, as well as gene start and gene end of each gene. The tests were run for 50 million generations, sampling every 1000 steps with the following priors: coalescent constant population, strict clock model rate, and Hasegawa-Kishino-Yano as substitution model. In addition, maximum-likelihood analyses of the H gene were also run 20 times with 100 bootstrap replicates and GTR model using RAxML version 8.2.10 [41].

Identification of a Novel CDV Clade in Caspian Seals
The evolutionary origins of CDV strains associated with two well-documented epizootics in pinniped species, Baikal seals in 1987-1988 and Caspian seals in 2000, were investigated in conjunction with the analysis of contemporary CDV strains circulating in wildlife in Germany. Full-length genome sequences of three Caspian seal CDV strain variants PC00-46, PC00-20, PC00-21 (GenBank accession nos. MN267064-66), one Baikal seal CDV strain variant PS88-428 (GenBank accession no. MN267063), two raccoon CDV strain variants S460, S466 (GenBank accession nos. MN267062, MN267060), and one fox CDV strain variant S272 (GenBank accession no. MN267061) were recovered by next-generation sequencing (NGS) and RACE from frozen archived infected tissue material ( Table 1). Alignment of leader and trailer sequences from these strains with previously published CDV sequences, whose genome termini identity was confirmed using RACE or NGS, showed unexpected sequence variation ( Figure 1). The CDV strains from Baikal seals and German wildlife presented a nucleotide change from A to G at genomic position 5 (g.5A>G) in the leader sequence in comparison to most other published sequences. However, this nucleotide change has also been observed previously in four other strains encompassing different species and monophyletic groups ( Figure 1). This nucleotide change may be more prevalent than sequence alignments would indicate, given that the majority of CDV termini sequences were not derived de novo but instead represent primer sequences based on previously published authentic Onderstepoort strain termini sequences which are assumed to be invariable among CDV strains. indicate, given that the majority of CDV termini sequences were not derived de novo but instead represent primer sequences based on previously published authentic Onderstepoort strain termini sequences which are assumed to be invariable among CDV strains. Additionally, we identified a large number of unique amino acid changes in the Caspian seal CDV strain, a lower number in the German wildlife carnivores, and an even lower amount in the Baikal seal strain compared to 94 full-length CDV genomes ( Table 2). Potential variability in SLAMF1 and Nectin-4 binding sites of the newly generated sequences was also investigated on basis of the crystal structure of the complex formed by MV-H interacting with marmoset SLAMF1 (91% similarity with human SLAMF1 [42]) and human Nectin-4 [43] (Tables S1 and S2). Most SLAMF1 and Nectin-4 interaction sites were found to be conserved. However, a unique change at position 191 was identified in strain PS880-428, which interacts with SLAMF1-Site III. Variablity at position 459, which interacts with Nectin-4-Site I, was also identified in several CDV lineages.  Table 2. Unique amino acid changes in the genome strains generated from this study compared to 94 CDV full-length sequences. Additionally, we identified a large number of unique amino acid changes in the Caspian seal CDV strain, a lower number in the German wildlife carnivores, and an even lower amount in the Baikal seal strain compared to 94 full-length CDV genomes (Table 2). Potential variability in SLAMF1 and Nectin-4 binding sites of the newly generated sequences was also investigated on basis of the crystal structure of the complex formed by MV-H interacting with marmoset SLAMF1 (91% similarity with human SLAMF1 [42]) and human Nectin-4 [43] (Tables S1 and S2). Most SLAMF1 and Nectin-4 interaction sites were found to be conserved. However, a unique change at position 191 was identified in strain PS880-428, which interacts with SLAMF1-Site III. Variablity at position 459, which interacts with Nectin-4-Site I, was also identified in several CDV lineages. Table 2. Unique amino acid changes in the genome strains generated from this study compared to 94 CDV full-length sequences.

Matrix
No unique changes No unique changes S202L Total 80 changes 15 changes 39 changes Maximum-likelihood phylogenetic analyses of full-length genomes ( Figure 2) and H gene sequences ( Figure 3) showed that CDVPS88-428, the oldest recovered sequence of a wild-type CDV strain, grouped within the Arctic clade, as expected on the basis of previous analyses [9,23]. However, it diverged from strains within the same clade that are circulating in Siberian tigers (Panthera tigris altaica) in Russia and in domestic dogs (Canis lupus familiaris) and red foxes in Italy ( Figure 2). Sequence comparison between the H protein of CDVPS88-428 and another CDV Baikal seal strain (GenBank accession no. CAA59357.1) from 1988 showed six amino acid changes: M26V, I27T, I84V, V103I, S268T, P410T. Caution must be applied with respect to ascribing these differences to natural variation, as the previously published hemagglutinin sequence was derived from a virus isolate which had been adapted to growth in Vero cells [33]. However, variation was also observed between Caspian CDV sequences at amino acid 410 (I/M). The strains circulating in German wild carnivores grouped within the Europe-1 clade (Figures 2 and 3) and are related to an older strain (CDV5804) found in a domestic dog in 1989, with only minor nucleotide changes identified between raccoon and fox CDV sequences. The Caspian seal CDV strain forms its own clade, tentatively named Caspian, grouping with the America-1 clade primarily comprised of vaccine and laboratory-adapted strains (Figures 2  and 3). CDVPC00 showed 89.5-93.7% sequence identity to all other CDV strains, with 80 unique amino acid changes (Table 2), making this a newly identified strain in a separate clade. Moreover, within the three recovered strain variants PC00-46, PC00-21, and PC00-21, unique genomic differences were detected. These include 16 synonymous and 3 nonsynonymous mutations when using using PC00-46 as the reference strain variant, with amino acid changes in phosphoprotein (P90L), F glycoprotein (M126T), and H glycoprotein (I410M) ( Table S3). No changes in the consensus were detected between full-length genome sequences recovered from different tissues originating from the same animal.

Divergence Dates between CDV Clades
For the estimation of divergence dates of CDV based on the complete genome, we excluded Snyder Hill and Onderstepoort vaccine-related strains (GenBank accession nos. EU726268, HM046486, KY971529, KY971531, KY971530, MF926604, MF926601, MF926602, and MF926603). These sequences were excluded from analysis because of their high similarity of approximately 99% at the nucleotide level to the modified live vaccine (MLV) strains Snyder Hill and Onderstepoort, despite the recent recorded date of collection. The sequence Phoca/Caspian/2007 (Genbank accession no. HM046486) was excluded from subsequent date divergence and evolutionary rate analyses because of its high similarity of 99.9% to the sequence Shuskiy collected in 1989 in the same region from a mink (GenBank accession no. HM063009). Moreover, both sequences are 99.7% identical at the nucleotide level to the vaccine strain CDV3 (Genbank accession no. EU726268.1). Two Onderstepoort sequences (GenBank accession nos. AF305419 and AF014953) were also excluded from analysis because of unclear date of origin in the literature. In addition to vaccine-related strains, we also excluded CDV strains that went through extensive genetic recombination events from downstream analyses (Table S4), as determined by the application of all detection methods in the package RDP4. Recombination can distort tree topology and influence mutational evolutionary analyses [44]. The strains excluded were the Asia-1 strains ZC (KJ994343), HLJ1 (JX681125); the Asia-2 strain 007Lm(AB474397); the Asia-4 strain 4-TH (MH496775); and America-2 strains 2645 (AY445077), 2654 (AY466011), 2646 (AY542312), 2601 (AY443350), and 2689 (AY649446).
Bayesian estimates indicated that the Caspian and America-1 monophyletic group diverged around 1832 (95% HPD:1818-1846) from all other CDV strains found to date, while shortly thereafter, the Caspian clade itself diverged from America-1 around 1859 (95% HPD:1849-1871) (Figure 4). We also performed a Bayesian phylogenetic analysis excluding the America-1 clade, because of the non-natural evolutionary history that MLVs usually undergo. Divergence dates appeared to be more recent when excluding America-1 clade ( Figure S3). For instance, under this scenario, separation between the Caspian clade and all other clades appeared to have taken place around 1870. As divergence dates were significantly shifted when excluding America-1, it was necessary to include this clade in the analyses because of its basal position and close relationship with the Caspian clade.    Figure S5.   Figure S2.

Evolutionary Rate of CDV
For the estimation of the substitution rate of CDV based on the complete genome, we excluded the same sequences that were previously taken out of the analysis for the Bayesian estimation of the divergence dates. The overall evolutionary rate of CDV was estimated to be 4. Several outlier strains were identified through the estimation of the temporal signal of CDV full-length genomes ( Figure S4). The analysis suggested that the Caspian strain variants PC00-46, PC00-20, and PC00-21, America-2 strains 2645, 2654, 2646, and Asia-1 strains ZC and HLJ1 were more recent than what indicated by the date given, whereas the Arctic sequence PS88-428 and America-2 sequences R252 (KF640687), 164071 (EU716337), and A75/17 (AF164967) appeared to be older than thought. In summary, the Caspian seal CDV sequences reported in this study were derived from original tissue material dating from the epizootic in 2000, with no evidence of genetic recombination and a high genetic divergence with respect to other CDV strains.

Discussion
Outbreaks of CDV infection continue to occur in domestic animals and wildlife despite the introduction of vaccines for domestic carnivores in the mid-20th century. Diverse CDV clades circulate around the world, with increased surveillance leading to the recent discovery of new clades such as America-3 and Asia-4 [14,45]. This has been facilitated by the introduction of new technologies such as NGS which enables a more rapid assembly of full-length virus genome sequences, which are more powerful for resolving phylogenetic relationships than single genes for such analyses [46]. This also helps in the identification of recombinant CDV strains which complicate Bayesian phylogenetic analysis [14,47,48] and provides a better understanding of the evolutionary background of commonly used laboratory vaccine and "wild-type" isolates of CDV. In this study, we generated full-length genomes sequences of CDV strains associated with historical outbreaks in seal species and from German wild carnivores and report the molecular characterization of a divergent novel clade basal to all other recognized CDV clades, which was associated with the CDV epizootic in Caspian seals in 2000. This supports a more limited analysis of a single partial P gene sequence recovered from the brain of an infected Caspian seal during the 1997 epizootic [30].
The origins of CDV ( Figure 5), inferred from historical literature, are much later than those of the closely related measles virus and rinderpest virus. CDV is suspected to have originated in South America and spread to Europe in the mid-18th century, with the first description of a disease syndrome resembling canine distemper, reported to be similar to measles in dogs in Peru, made in 1746 by Antonio de Ulloa [49,50]. Shortly thereafter, the disease reached Europe with a deadly disease outbreak among dogs reported in Spain in 1760. During the 1760s, the epizootic subsequently spread to other countries including France, England, Ireland, Germany, Italy, and Russia, resulting in the deaths of thousands of dogs [49]. Although it can be assumed that wildlife was exposed to CDV throughout the late 18th and 19th century, the first report of CDV in a wildlife carnivore species was in 1925 following outbreaks of distemper in captive silver foxes (V. vulpes) in the USA [51]. The first attempts to control the spread of distemper was made in 1923, following the development and testing of the first experimental vaccine [52]. The subsequent introduction of more efficacious commercial CDV vaccines for domestic dogs in the mid-20th century [53] has not prevented the continual endemic transmission of CDV in wild carnivores species, as CDV continues to cause outbreaks in wildlife worldwide with occasional spillover into domestic dogs in countries with high rates of vaccination [9,54,55]. In contrast to the successful eradication campaign of rinderpest virus, the high capacity of CDV for sustained interspecies transmission means that prospects for eradication of this disease are remote. Moreover, CDV appears to have a much wider or expanding host range than was previously recognized, given the large number of new host species that have been infected in recent years including rhesus macaques, giant pandas (Ailuropoda melanoleuca), tigers, and seals (phocid species/pinnipeds), with associated high rates of morbidity and mortality [3,29,31,56,57].
sequences and did not include the Caspian clade. It is also rather similar to the estimated rate of 4.8 × 10 −4 determined by a study which only used the H gene [64]. However, the substitution rate calculated in this study is slower in comparison with other rates (7.41-11.35 × 10 −4 [8,65]) that were estimated by using only the H gene, which is under high selection pressure. The F and H glycoproteins had the fastest substitution rates, as would be expected from their role in virus-host binding and cell entry, implying that selective pressures act principally on these proteins. Genetic mutations in these proteins may be expected to contribute to virus jump and adaptation to new hosts. However, although specific amino acid positions at 530 (multiple changes) and Y549H in the H glycoprotein have been linked to interspecies transmission [66] via modulation of receptor interactions, no supportive evidence via mechanisitc in vitro assays is avaliable thus far.  CDV strains belonging to the Arctic clade appear to be widely dispersed geographically, as Arctic or Arctic-like strains have been detected in Greenland, Italy, North America, and Iran [55,[58][59][60]. The CDV outbreak in Lake Baikal was most likely generated by a spillover from terrestrial carnivores in the vicinity. Our analysis indicates that the Baikal seal CDV strain sits most basally to all other recognized strains in the Arctic clade and has a different evolutionary pathway compared to the more recent strains found in Italy [61]. It was previously reported that the Arctic strain has become endemic in Lake Baikal as it continues to circulate among Baikal seals [62]. It would therefore be interesting to analyze and compare recent Baikal seal CDV sequences to the 1988 Baikal seal CDV sequence to identify possible nucleotide changes that might indicate adaptation to the host species and to provide an additional reference point to better assess the evolutionary rate of CDV.

Conclusions
Our results, which include the newly identified Caspian clade, indicate an earlier date of divergence (early-19th century) between this America-1/Caspian cluster and all other CDV strains found to date, compared to previous estimations of 1855 [14] and 1886 [63], in which the Caspian clade was absent or was based only on the hemagglutinin gene, respectively. The findings in this study support the hypothesis that recent CDV strains emerged from the America-1/Caspian clades in the early 19th century. The America-1 clade is comprised of MLVs, which were developed more than 50 years ago. The Snyder Hill and Onderstepoort strains were the first MLVs developed and have not been discovered endemically circulating in wildlife in the last half-century. More recently reported strains in the America-1 clade appear to be isolated cases representing derivatives of MLVs introduced into the wild, rather than direct wild-type decedents of the wild-type ancestor of all America-1 clade viruses. Therefore, the relationship between America-1 and Caspian clades is intriguing. Our analysis suggests that they had a common ancestor in approximately 1859, after which they separated, and the Caspian strain was possibly kept circulating in unknown reservoir species, which eventually infected the Caspian Sea pinnipeds. Given the high degree of divergence between the Onderstepoort and Caspian clades (6.35%) and the absence of many distinctive Onderstepoort specific genetic signatures, it is unlikely that the Caspian strain is an evolved MLV. The close relationship of the Baikal strain to other strains within the Arctic clade also suggests that infection of seals and subsequent intraspecies transmission do not require the accumulation of extensive mutations.
The overall evolutionary rate of CDV was estimated to be 4.24 × 10 −4 nucleotide substitutions per site per year (subs/site/year). This rate is in a similar range to the previously calculated substitution rate of 2.46 × 10 −4 [14], which was estimated using a more limited number of CDV full-length sequences and did not include the Caspian clade. It is also rather similar to the estimated rate of 4.8 × 10 −4 determined by a study which only used the H gene [63]. However, the substitution rate calculated in this study is slower in comparison with other rates (7.41-11.35 × 10 −4 [8,64]) that were estimated by using only the H gene, which is under high selection pressure. The F and H glycoproteins had the fastest substitution rates, as would be expected from their role in virus-host binding and cell entry, implying that selective pressures act principally on these proteins. Genetic mutations in these proteins may be expected to contribute to virus jump and adaptation to new hosts. However, although specific amino acid positions at 530 (multiple changes) and Y549H in the H glycoprotein have been linked to interspecies transmission [65] via modulation of receptor interactions, no supportive evidence via mechanisitc in vitro assays is avaliable thus far.

Conclusions
Whereas the Baikal strain of CDV is closely related to other strains within the Arctic clade, we conclude that the Caspian clade is the most basal CDV clade recognized to date and has a different evolutionary history compared to all other CDV strains. The true reservoir, origins, and geographical distribution of this strain remain largely unknown. Enhanced CDV surveillance and molecular characterization of CDVs from wildlife species from central Asia are therefore warranted to determine if the Caspian clade is restricted to Caspian seals in the Caspian Sea or if strains from this clade are also present in terrestrial carnivores, especially in regions of Azerbaijan and Kazakhstan adjacent to the coastal sites where the epizootics of 1997 and 2000 originated. Increased focus is also warranted on the development of robust in vitro assays and additional in vivo assays to better understand the interspecies transmission of CDV, especially with respect to the use of heterologous CD150s and Nectin 4s and the identification of additional viral and host factors restricting cross-species infections. Given that eradication of CDV is not currently a viable prospect, obtaining a better understanding of CDV cross-species infections will better inform assessments of the risk of zoonotic infections in the event of a successful measles eradication campaign.