Cowpox virus: What’s in a Name?

Traditionally, virus taxonomy relied on phenotypic properties; however, a sequence-based virus taxonomy has become essential since the recent requirement of a species to exhibit monophyly. The species Cowpox virus has failed to meet this requirement, necessitating a reexamination of this species. Here, we report the genomic sequences of nine Cowpox viruses and, by combining them with the available data of 37 additional genomes, confirm polyphyly of Cowpox viruses and find statistical support based on genetic data for more than a dozen species. These results are discussed in light of the current International Committee on Taxonomy of Viruses species definition, as well as immediate and future implications for poxvirus taxonomic classification schemes. Data support the recognition of five monophyletic clades of Cowpox viruses as valid species.


Introduction
This study examined a group of non-cellular organisms, which consist of proteins and genetic material. These organisms are capable of invading and replicating within living cells. The preceding sentences would have been more concise if the name of these organisms (viruses) had been used instead of a description. We often use names to facilitate the communication or categorization of topics or objects; however, names only assist communication when all parties utilize the same moniker. Unfortunately, common names vary between languages and regions. In one example from the southeastern United States, both pocket gophers (fossorial mammals) and salamanders (amphibians) are frequently referred to as salamanders [1,2].
Fortunately, the hierarchical nature of scientific names, along with the disallowance for multiple organisms to use the same official name, can drastically reduce these potential miscommunications. The International Congress on Taxonomy of Viruses (ICTV; [3]) is the governing body for recognizing official names of viruses, and defines a species as "a monophyletic group of viruses whose properties can be distinguished from those of other species by multiple criteria" [4]. This recent definition change has placed more emphasis on common descent, which should lead to a better understanding of the true diversity and the evolutionary history of viruses [5]. These new guidelines for species recognition will have implications not only for newly discovered virus species, but also for currently and historically recognized species. groups have set a divergence threshold based on amino acid or nucleotide data, which must be met for an isolate to represent a new species (Filovirus; Arenavirus), but have not taken intraspecific genetic variation into consideration. Given the nature of diversification, the amount of intraspecific genetic variation present within lineages should be considered when determining the extent of inter-lineage genetic variation necessary to warrant species level recognition [5,[36][37][38][39].
It is widely accepted that gene trees and species trees are not necessarily identical. This may be a result of incomplete lineage sorting or selection pressures (among other things), depending upon the study organism. Phylogenetic analyses of poxviruses have generated highly-supported, yet contradictory topologies depending upon the genome region examined [26,28]. Therefore, when considering genetic variation in poxviruses, it is necessary to determine which genomic region(s) (i.e., one gene, multiple genes, conserved coding region, entire coding region, whole genome including inverted terminal repeats) should be examined to provide the 'true' evolutionary relationships. Some authors have utilized the entire coding region excluding inverted terminal repeats [27], whereas others have examined a smaller, more conserved central region of the genome [26], and yet others have analyzed various subsets of protein coding genes [28,30,40]. Each potential dataset's pros and cons should be considered from a biological perspective to determine if any phylogenetic assumptions might be violated. Genetic distance calculations between isolates and species are expected to decrease when comparing whole genomes, conserved central regions, or concatenated open reading frame datasets; however, utilizing sequences further from the variable termini should reduce any expectation of recombination issues.
Some researchers [26][27][28] have noted the polyphyly of Cowpox viruses, but have not set forth recommendations regarding how to split the species into a number of monophyletic species to conform to the ICTV regulations. Additionally, no researchers to date have examined the species Cowpox virus using statistical delimitation tools. Given these data, the purpose of this research was to: (1) increase the geographic sampling of CPXV genomes; (2) conduct genomic level phylogenetic analyses; (3) use a coalescent approach to run species tree and species delimitation analyses; (4) consider the validity of the species Cowpox virus as currently understood given the new ICTV definition of a virus species; and (5) determine if variability in the dataset (genome region) examined has the potential to affect the monophyly of Cowpox virus lineages.

Culturing, Sequencing, Assembly, and Annotation
Whole genome nucleotide sequences from nine CPXV isolates collected from localities throughout the United Kingdom (UK) and Norway were generated for this study. CPXV strains were grown on African Monkey Kidney cell line MA 104. Suspensions were freeze-thawed three times and DNA was extracted using the automated DNA extraction system MagNA Pure Compact (Roche Diagnostics, Penzberg, Germany). DNA quantification was measured using fluorometric system Qubit 2.0 (Life Technologies, Darmstadt, Germany), and purity was estimated by nanodrop (Thermo Scientific, Darmstadt, Germany).
Sequence libraries of isolated genomic DNA were prepared using TrueSeq PE library preparation kit (Illumina, San Diego, CA, USA). One nanogram of final library prep was used as input for the Illumina HiSeq 2000 system at Otogenetics (Atlanta, GA, USA). Reads were mapped to the reference genome CPXV strain Brighton Red (NC_003663.2) and variant calling was done using bowtie2 [41], samtools [42] and VarScan [43]. The Genome Annotation Transfer Utility (GATU [44]) was used for annotation of genomes. The criteria for annotation included a cut-off of at least 200 nt and 90% nt similarity with CPXV strain Brighton Red. After the transfer of annotation, assigned open reading frames of all genomes were visually inspected and corrected manually, if needed. All generated genomes were deposited into NCBI GenBank as a BioProject (PRJNA369073). Accession numbers are included in Table 1.

Phylogenetic Analysis
Forty-six CPXV strains (nine generated for this study) were examined along with 11 other OPXV genomes, representing all recognized species of Old World orthopoxviruses. Localities, accession numbers, and other pertinent data for all examined strains are available in Table 1. MAFFT v.7.125 [57] was used to generate the multiple sequence alignment using the FFT-NS-2 algorithm through the auto-select option. Given the concern of recombination at the variable termini [26,58], and the potential for recombination to affect the outcome of phylogenetic analyses [59], two datasets were examined for phylogenetic and species delimitation purposes. Dataset 1 (150,718 bp) included the entire coding portion of the alignment (genes C23L-B29R) with gap columns removed; whereas dataset 2 (95,410 bp) examined a smaller, conserved central region (E2L-A32L) with gap columns removed. It is worth noting the C23L-B29R alignment was identical to the whole genome alignment with gaps stripped. The most appropriate model of molecular evolution was selected with the Akaike information criterion using MrModelTest (v.2.2 [60]) and PAUP* (version 4.0b10 [61]) for each dataset. Outgroup selection was made based on phylogenetic relationships of Old World orthopoxviruses generated by previous studies and the availability of whole genome data [62,63].
Phylogenetic analyses were conducted using MrBayes v.3.2.2 [64] with the following settings: substitution model = GTR, rate variation = invgamma, gamma catagories = 4, chain length = 10 million generations, samplefreq = 1000, nruns = 2, nchains = 4, burnin = 2500, and outgroup was set to ECTV. MEGA (v.6.06 [65]) was used to calculate average pairwise distances (uncorrected p), whereas patristic distances (tree branch lengths) were calculated from the consensus tree using the program Geneious (v.7.1.7; Biomatters Inc., Aukland, New Zealand). Genetic distances were averaged across taxa to produce a single value for each inter-or intra-clade estimate. For thoroughness, a RAxML analysis starting with a completely random tree was run for each dataset using 1000 rapid bootstrap iterations followed by a search for the best scoring MLtree under the GTR + I + G model of molecular evolution using the RAxML plugin in Geneious [66].

Species Delimitation and Species Tree Analyses
The software Bayesian Phylogenetics and Phylogeography (BP&P v.3.1 [39,67]) was utilized to calculate statistical support simultaneously for species delimitation and species tree in a Bayesian framework. The smallest monophyletic groupings from the MrBayes topologies with divergence levels approximately equal to the divergence between Taterapox virus (TATV) and Camelpox virus were each examined for putative species level divergence. The average within species genetic distance prior, Viruses 2017, 9,101 numbers, and other pertinent data for all examined strains are available in Table 1. MAFFT v [57] was used to generate the multiple sequence alignment using the FFT-NS-2 algorithm thr the auto-select option. Given the concern of recombination at the variable termini [26,58], an potential for recombination to affect the outcome of phylogenetic analyses [59], two datasets examined for phylogenetic and species delimitation purposes. Dataset 1 (150,718 bp) include entire coding portion of the alignment (genes C23L-B29R) with gap columns removed; wh dataset 2 (95,410 bp) examined a smaller, conserved central region (E2L-A32L) with gap col removed. It is worth noting the C23L-B29R alignment was identical to the whole genome align with gaps stripped. The most appropriate model of molecular evolution was selected with the A information criterion using MrModelTest (v.2.2 [60]) and PAUP* (version 4.0b10 [61]) for dataset. Outgroup selection was made based on phylogenetic relationships of Old W orthopoxviruses generated by previous studies and the availability of whole genome data [62,6 Phylogenetic analyses were conducted using MrBayes v.3.2.2 [64] with the following set substitution model = GTR, rate variation = invgamma, gamma catagories = 4, chain length = 10 m generations, samplefreq = 1000, nruns = 2, nchains = 4, burnin = 2500, and outgroup was set to E MEGA (v.6.06 [65]) was used to calculate average pairwise distances (uncorrected p), wh patristic distances (tree branch lengths) were calculated from the consensus tree using the pro Geneious (v.7.1.7; Biomatters Inc., Aukland, New Zealand). Genetic distances were averaged a taxa to produce a single value for each inter-or intra-clade estimate. For thoroughness, a RA analysis starting with a completely random tree was run for each dataset using 1000 rapid boo iterations followed by a search for the best scoring MLtree under the GTR + I + G model of mole evolution using the RAxML plugin in Geneious [66].

Species Delimitation and Species Tree Analyses
The software Bayesian Phylogenetics and Phylogeography (BP&P v.3.1 [39,67]) was utiliz calculate statistical support simultaneously for species delimitation and species tree in a Bay framework. The smallest monophyletic groupings from the MrBayes topologies with diver levels approximately equal to the divergence between Taterapox virus (TATV) and Camelpox were each examined for putative species level divergence. The average within species g distance prior, Ө, for BP&P was estimated based on two multiple sequence alignments of genomic regions; 45 MPXV isolates examined by Nakazawa et al. [68], and 49 VARV is (GenBank accession numbers: L22579 [69], NC_001611 [54], Y16780 [45], DQ437580-DQ437592 DQ441416-DQ441448 [47]) using MEGA v.6.06 [65] with pairwise deletion and uncorrect distance settings. Estimated values for MPXV (0.169%) and VARV (0.137%) were averaged an value (0.153%) was used as the mean of the gamma distribution for the Ө prior. The τ prio calculated as the average genetic distance (uncorrected p) of all in-group taxa to ECTV (3.61% 3.01% for datasets 1 and 2, respectively). All BP&P analyses were run in duplicate with random to check convergence. Each dataset was analyzed with their respective τ parameter (τ 2 = 3 10,000[β], τ 1 = 361[α], 10,000[β]) with three Ө priors (ӨA = 1[α], 654[β]; ӨB = 2, 654; ӨC = 2, 327), w ӨA is based on the calculated average intra-species distance of MPXV and VARV, ӨB is twic average, and ӨC is four times the calculated average to provide sequentially more conserv delimitation estimates. Both MrBayes topologies were used as the starting species tree topolog each dataset. Based on Bayesian inference analyses, 14 CPXV lineages were examine consideration as putative species, and algorithm 0 was used for species delimitation with initia tune parameters set to 0.00019, 0.00007, 0.00339, 0.00033, 0.01521, 0.33, 1.0. All analyses were run a burn-in of 20,000, a sample frequency of 5, and the total number of samples was set to 100,00 , for BP&P was estimated based on two multiple sequence alignments of large genomic regions; 45 MPXV isolates examined by Nakazawa et al. [68], and 49 VARV isolates (GenBank accession numbers: L22579 [69], NC_001611 [54], Y16780 [45], DQ437580-DQ437592, and DQ441416-DQ441448 [47]) using MEGA v.6.06 [65] with pairwise deletion and uncorrected p-distance settings. Estimated values for MPXV (0.169%) and VARV (0.137%) were averaged and this value (0.153%) was used as the mean of the gamma distribution for the 5 of 14 nent data for all examined strains are available in Table 1. MAFFT v.7.125 the multiple sequence alignment using the FFT-NS-2 algorithm through ven the concern of recombination at the variable termini [26,58], and the on to affect the outcome of phylogenetic analyses [59], two datasets were ic and species delimitation purposes. Dataset 1 (150,718 bp) included the the alignment (genes C23L-B29R) with gap columns removed; whereas mined a smaller, conserved central region (E2L-A32L) with gap columns g the C23L-B29R alignment was identical to the whole genome alignment ost appropriate model of molecular evolution was selected with the Akaike ng MrModelTest (v.2.2 [60]) and PAUP* (version 4.0b10 [61]) for each tion was made based on phylogenetic relationships of Old World d by previous studies and the availability of whole genome data [62,63]. s were conducted using MrBayes v.3.2.2 [64] with the following settings: , rate variation = invgamma, gamma catagories = 4, chain length = 10 million 1000, nruns = 2, nchains = 4, burnin = 2500, and outgroup was set to ECTV. used to calculate average pairwise distances (uncorrected p), whereas anch lengths) were calculated from the consensus tree using the program ters Inc., Aukland, New Zealand). Genetic distances were averaged across value for each inter-or intra-clade estimate. For thoroughness, a RAxML mpletely random tree was run for each dataset using 1000 rapid bootstrap arch for the best scoring MLtree under the GTR + I + G model of molecular L plugin in Geneious [66].

d Species Tree Analyses
n Phylogenetics and Phylogeography (BP&P v.3.1 [39,67]) was utilized to rt simultaneously for species delimitation and species tree in a Bayesian monophyletic groupings from the MrBayes topologies with divergence al to the divergence between Taterapox virus (TATV) and Camelpox virus putative species level divergence. The average within species genetic &P was estimated based on two multiple sequence alignments of large XV isolates examined by Nakazawa et al. [68], and 49 VARV isolates bers: L22579 [69], NC_001611 [54], Y16780 [45], DQ437580-DQ437592, and ]) using MEGA v.6.06 [65] with pairwise deletion and uncorrected ped values for MPXV (0.169%) and VARV (0.137%) were averaged and this as the mean of the gamma distribution for the Ө prior. The τ prior was genetic distance (uncorrected p) of all in-group taxa to ECTV (3.61% and , respectively). All BP&P analyses were run in duplicate with random seeds ch dataset was analyzed with their respective τ parameter (τ 2 = 301[α], 00[β]) with three Ө priors (ӨA = 1[α], 654[β]; ӨB = 2, 654; ӨC = 2, 327), where ated average intra-species distance of MPXV and VARV, ӨB is twice that times the calculated average to provide sequentially more conservative th MrBayes topologies were used as the starting species tree topology for Bayesian inference analyses, 14 CPXV lineages were examined for species, and algorithm 0 was used for species delimitation with initial fine 0019, 0.00007, 0.00339, 0.00033, 0.01521, 0.33, 1.0. All analyses were run with ple frequency of 5, and the total number of samples was set to 100,000.
prior. The τ prior was calculated as the average genetic distance (uncorrected p) of all in-group taxa to ECTV (3.61% and 3.01% for datasets 1 and 2, respectively). All BP&P analyses were run in duplicate with random seeds to check convergence. Each dataset was analyzed with their respective τ parameter (τ 2 = 301[α], 10,000[β], τ 1 = 361[α], 10,000[β]) with three 5 of 14 strains are available in Table 1. MAFFT v.7.125 ignment using the FFT-NS-2 algorithm through bination at the variable termini [26,58], and the f phylogenetic analyses [59], two datasets were n purposes. Dataset 1 (150,718 bp) included the L-B29R) with gap columns removed; whereas d central region (E2L-A32L) with gap columns nt was identical to the whole genome alignment olecular evolution was selected with the Akaike 0]) and PAUP* (version 4.0b10 [61]) for each n phylogenetic relationships of Old World the availability of whole genome data [62,63]. rBayes v.3.2.2 [64] with the following settings: a, gamma catagories = 4, chain length = 10 million 4, burnin = 2500, and outgroup was set to ECTV. e pairwise distances (uncorrected p), whereas ated from the consensus tree using the program ealand). Genetic distances were averaged across tra-clade estimate. For thoroughness, a RAxML run for each dataset using 1000 rapid bootstrap Ltree under the GTR + I + G model of molecular . ogeography (BP&P v.3.1 [39,67]) was utilized to cies delimitation and species tree in a Bayesian from the MrBayes topologies with divergence een Taterapox virus (TATV) and Camelpox virus ivergence. The average within species genetic on two multiple sequence alignments of large Nakazawa et al. [68], and 49 VARV isolates 611 [54], Y16780 [45], DQ437580-DQ437592, and 5] with pairwise deletion and uncorrected p-%) and VARV (0.137%) were averaged and this a distribution for the Ө prior. The τ prior was cted p) of all in-group taxa to ECTV (3.61% and alyses were run in duplicate with random seeds with their respective τ parameter (τ 2 = 301[α], (ӨA = 1[α], 654[β]; ӨB = 2, 654; ӨC = 2, 327), where distance of MPXV and VARV, ӨB is twice that age to provide sequentially more conservative re used as the starting species tree topology for lyses, 14 CPXV lineages were examined for as used for species delimitation with initial fine 033, 0.01521, 0.33, 1.0. All analyses were run with total number of samples was set to 100,000.  Table 1. MAFFT v.7.125 quence alignment using the FFT-NS-2 algorithm through of recombination at the variable termini [26,58], and the outcome of phylogenetic analyses [59], two datasets were elimitation purposes. Dataset 1 (150,718 bp) included the genes C23L-B29R) with gap columns removed; whereas , conserved central region (E2L-A32L) with gap columns R alignment was identical to the whole genome alignment model of molecular evolution was selected with the Akaike st (v.2.2 [60]) and PAUP* (version 4.0b10 [61]) for each e based on phylogenetic relationships of Old World udies and the availability of whole genome data [62,63]. ed using MrBayes v.3.2.2 [64] with the following settings: invgamma, gamma catagories = 4, chain length = 10 million nchains = 4, burnin = 2500, and outgroup was set to ECTV. te average pairwise distances (uncorrected p), whereas ere calculated from the consensus tree using the program d, New Zealand). Genetic distances were averaged across nter-or intra-clade estimate. For thoroughness, a RAxML m tree was run for each dataset using 1000 rapid bootstrap t scoring MLtree under the GTR + I + G model of molecular eious [66]. alyses s and Phylogeography (BP&P v.3.1 [39,67]) was utilized to ly for species delimitation and species tree in a Bayesian groupings from the MrBayes topologies with divergence ence between Taterapox virus (TATV) and Camelpox virus es level divergence. The average within species genetic ted based on two multiple sequence alignments of large mined by Nakazawa et al. [68], and 49 VARV isolates ], NC_001611 [54], Y16780 [45], DQ437580-DQ437592, and v.6.06 [65] with pairwise deletion and uncorrected p-PXV (0.169%) and VARV (0.137%) were averaged and this the gamma distribution for the Ө prior. The τ prior was (uncorrected p) of all in-group taxa to ECTV (3.61% and ll BP&P analyses were run in duplicate with random seeds analyzed with their respective τ parameter (τ 2 = 301[α], Ө priors (ӨA = 1[α], 654[β]; ӨB = 2, 654; ӨC = 2, 327), where tra-species distance of MPXV and VARV, ӨB is twice that lated average to provide sequentially more conservative ologies were used as the starting species tree topology for ence analyses, 14 CPXV lineages were examined for orithm 0 was used for species delimitation with initial fine 0339, 0.00033, 0.01521, 0.33, 1.0. All analyses were run with 5, and the total number of samples was set to 100,000.  Table 1. MAFFT v.7.125 rate the multiple sequence alignment using the FFT-NS-2 algorithm through . Given the concern of recombination at the variable termini [26,58], and the ation to affect the outcome of phylogenetic analyses [59], two datasets were netic and species delimitation purposes. Dataset 1 (150,718 bp) included the of the alignment (genes C23L-B29R) with gap columns removed; whereas examined a smaller, conserved central region (E2L-A32L) with gap columns oting the C23L-B29R alignment was identical to the whole genome alignment e most appropriate model of molecular evolution was selected with the Akaike using MrModelTest (v.2.2 [60]) and PAUP* (version 4.0b10 [61]) for each election was made based on phylogenetic relationships of Old World ated by previous studies and the availability of whole genome data [62,63]. lyses were conducted using MrBayes v.3.2.2 [64] with the following settings: TR, rate variation = invgamma, gamma catagories = 4, chain length = 10 million q = 1000, nruns = 2, nchains = 4, burnin = 2500, and outgroup was set to ECTV. as used to calculate average pairwise distances (uncorrected p), whereas e branch lengths) were calculated from the consensus tree using the program atters Inc., Aukland, New Zealand). Genetic distances were averaged across le value for each inter-or intra-clade estimate. For thoroughness, a RAxML a completely random tree was run for each dataset using 1000 rapid bootstrap a search for the best scoring MLtree under the GTR + I + G model of molecular xML plugin in Geneious [66].
and Species Tree Analyses esian Phylogenetics and Phylogeography (BP&P v.3.1 [39,67]) was utilized to pport simultaneously for species delimitation and species tree in a Bayesian lest monophyletic groupings from the MrBayes topologies with divergence equal to the divergence between Taterapox virus (TATV) and Camelpox virus for putative species level divergence. The average within species genetic BP&P was estimated based on two multiple sequence alignments of large MPXV isolates examined by Nakazawa et al. [68], and 49 VARV isolates umbers: L22579 [69], NC_001611 [54], Y16780 [45], DQ437580-DQ437592, and [47]) using MEGA v.6.06 [65] with pairwise deletion and uncorrected pated values for MPXV (0.169%) and VARV (0.137%) were averaged and this sed as the mean of the gamma distribution for the Ө prior. The τ prior was age genetic distance (uncorrected p) of all in-group taxa to ECTV (3.61% and d 2, respectively). All BP&P analyses were run in duplicate with random seeds . Each dataset was analyzed with their respective τ parameter (τ 2 = 301[α], 0,000[β]) with three Ө priors (ӨA = 1[α], 654[β]; ӨB = 2, 654; ӨC = 2, 327), where lculated average intra-species distance of MPXV and VARV, ӨB is twice that ur times the calculated average to provide sequentially more conservative . Both MrBayes topologies were used as the starting species tree topology for on Bayesian inference analyses, 14 CPXV lineages were examined for ive species, and algorithm 0 was used for species delimitation with initial fine 0.00019, 0.00007, 0.00339, 0.00033, 0.01521, 0.33, 1.0. All analyses were run with ample frequency of 5, and the total number of samples was set to 100,000. , and other pertinent data for all examined strains are available in Table 1. MAFFT v.7.125 used to generate the multiple sequence alignment using the FFT-NS-2 algorithm through select option. Given the concern of recombination at the variable termini [26,58], and the for recombination to affect the outcome of phylogenetic analyses [59], two datasets were for phylogenetic and species delimitation purposes. Dataset 1 (150,718 bp) included the ding portion of the alignment (genes C23L-B29R) with gap columns removed; whereas (95,410 bp) examined a smaller, conserved central region (E2L-A32L) with gap columns . It is worth noting the C23L-B29R alignment was identical to the whole genome alignment s stripped. The most appropriate model of molecular evolution was selected with the Akaike ion criterion using MrModelTest (v.2.2 [60]) and PAUP* (version 4.0b10 [61]) for each Outgroup selection was made based on phylogenetic relationships of Old World viruses generated by previous studies and the availability of whole genome data [62,63]. logenetic analyses were conducted using MrBayes v.3.2.2 [64] with the following settings: ion model = GTR, rate variation = invgamma, gamma catagories = 4, chain length = 10 million ns, samplefreq = 1000, nruns = 2, nchains = 4, burnin = 2500, and outgroup was set to ECTV. v.6.06 [65]) was used to calculate average pairwise distances (uncorrected p), whereas distances (tree branch lengths) were calculated from the consensus tree using the program (v.7.1.7; Biomatters Inc., Aukland, New Zealand). Genetic distances were averaged across roduce a single value for each inter-or intra-clade estimate. For thoroughness, a RAxML tarting with a completely random tree was run for each dataset using 1000 rapid bootstrap followed by a search for the best scoring MLtree under the GTR + I + G model of molecular using the RAxML plugin in Geneious [66]. is four times the calculated average to provide sequentially more conservative ion estimates. Both MrBayes topologies were used as the starting species tree topology for aset. Based on Bayesian inference analyses, 14 CPXV lineages were examined for tion as putative species, and algorithm 0 was used for species delimitation with initial fine meters set to 0.00019, 0.00007, 0.00339, 0.00033, 0.01521, 0.33, 1.0. All analyses were run with of 20,000, a sample frequency of 5, and the total number of samples was set to 100,000. numbers, and other pertinent data for all examined strains are available in Table 1. MAFFT v.7.125 [57] was used to generate the multiple sequence alignment using the FFT-NS-2 algorithm through the auto-select option. Given the concern of recombination at the variable termini [26,58], and the potential for recombination to affect the outcome of phylogenetic analyses [59], two datasets were examined for phylogenetic and species delimitation purposes. Dataset 1 (150,718 bp) included the entire coding portion of the alignment (genes C23L-B29R) with gap columns removed; whereas dataset 2 (95,410 bp) examined a smaller, conserved central region (E2L-A32L) with gap columns removed. It is worth noting the C23L-B29R alignment was identical to the whole genome alignment with gaps stripped. The most appropriate model of molecular evolution was selected with the Akaike information criterion using MrModelTest (v.2.2 [60]) and PAUP* (version 4.0b10 [61]) for each dataset. Outgroup selection was made based on phylogenetic relationships of Old World orthopoxviruses generated by previous studies and the availability of whole genome data [62,63]. Phylogenetic analyses were conducted using MrBayes v.3.2.2 [64] with the following settings: substitution model = GTR, rate variation = invgamma, gamma catagories = 4, chain length = 10 million generations, samplefreq = 1000, nruns = 2, nchains = 4, burnin = 2500, and outgroup was set to ECTV. MEGA (v.6.06 [65]) was used to calculate average pairwise distances (uncorrected p), whereas patristic distances (tree branch lengths) were calculated from the consensus tree using the program Geneious (v.7.1.7; Biomatters Inc., Aukland, New Zealand). Genetic distances were averaged across taxa to produce a single value for each inter-or intra-clade estimate. For thoroughness, a RAxML analysis starting with a completely random tree was run for each dataset using 1000 rapid bootstrap iterations followed by a search for the best scoring MLtree under the GTR + I + G model of molecular evolution using the RAxML plugin in Geneious [66].

Species Delimitation and Species Tree Analyses
The software Bayesian Phylogenetics and Phylogeography (BP&P v.3.1 [39,67]) was utilized to calculate statistical support simultaneously for species delimitation and species tree in a Bayesian framework. The smallest monophyletic groupings from the MrBayes topologies with divergence levels approximately equal to the divergence between Taterapox virus (TATV) and Camelpox virus were each examined for putative species level divergence. The average within species genetic distance prior, Ө, for BP&P was estimated based on two multiple sequence alignments of large genomic regions; 45 MPXV isolates examined by Nakazawa et al. [68], and 49 VARV isolates (GenBank accession numbers: L22579 [69], NC_001611 [54], Y16780 [45], DQ437580-DQ437592, and DQ441416-DQ441448 [47]) using MEGA v.6.06 [65] with pairwise deletion and uncorrected pdistance settings. Estimated values for MPXV (0.169%) and VARV (0.137%) were averaged and this value (0.153%) was used as the mean of the gamma distribution for the Ө prior. The τ prior was calculated as the average genetic distance (uncorrected p) of all in-group taxa to ECTV (3.61% and 3.01% for datasets 1 and 2, respectively). All BP&P analyses were run in duplicate with random seeds to check convergence. Each dataset was analyzed with their respective τ parameter (τ is based on the calculated average intra-species distance of MPXV and VARV, ӨB is twice that average, and ӨC is four times the calculated average to provide sequentially more conservative delimitation estimates. Both MrBayes topologies were used as the starting species tree topology for each dataset. Based on Bayesian inference analyses, 14 CPXV lineages were examined for consideration as putative species, and algorithm 0 was used for species delimitation with initial fine tune parameters set to 0.00019, 0.00007, 0.00339, 0.00033, 0.01521, 0.33, 1.0. All analyses were run with a burn-in of 20,000, a sample frequency of 5, and the total number of samples was set to 100,000. rs, and other pertinent data for all examined strains are available in Table 1. MAFFT v.7.125 as used to generate the multiple sequence alignment using the FFT-NS-2 algorithm through to-select option. Given the concern of recombination at the variable termini [26,58], and the ial for recombination to affect the outcome of phylogenetic analyses [59], two datasets were ed for phylogenetic and species delimitation purposes. Dataset 1 (150,718 bp) included the coding portion of the alignment (genes C23L-B29R) with gap columns removed; whereas t 2 (95,410 bp) examined a smaller, conserved central region (E2L-A32L) with gap columns ed. It is worth noting the C23L-B29R alignment was identical to the whole genome alignment ps stripped. The most appropriate model of molecular evolution was selected with the Akaike ation criterion using MrModelTest (v.2.2 [60]) and PAUP* (version 4.0b10 [61]) for each t. Outgroup selection was made based on phylogenetic relationships of Old World oxviruses generated by previous studies and the availability of whole genome data [62,63]. ylogenetic analyses were conducted using MrBayes v.3.2.2 [64] with the following settings: ution model = GTR, rate variation = invgamma, gamma catagories = 4, chain length = 10 million tions, samplefreq = 1000, nruns = 2, nchains = 4, burnin = 2500, and outgroup was set to ECTV.
(v.6.06 [65]) was used to calculate average pairwise distances (uncorrected p), whereas ic distances (tree branch lengths) were calculated from the consensus tree using the program us (v.7.1.7; Biomatters Inc., Aukland, New Zealand). Genetic distances were averaged across produce a single value for each inter-or intra-clade estimate. For thoroughness, a RAxML is starting with a completely random tree was run for each dataset using 1000 rapid bootstrap ns followed by a search for the best scoring MLtree under the GTR + I + G model of molecular ion using the RAxML plugin in Geneious [66].

ecies Delimitation and Species Tree Analyses
e software Bayesian Phylogenetics and Phylogeography (BP&P v.3.1 [39,67]) was utilized to te statistical support simultaneously for species delimitation and species tree in a Bayesian ork. The smallest monophyletic groupings from the MrBayes topologies with divergence approximately equal to the divergence between Taterapox virus (TATV) and Camelpox virus ach examined for putative species level divergence. The average within species genetic e prior, Ө, for BP&P was estimated based on two multiple sequence alignments of large ic regions; 45 MPXV isolates examined by Nakazawa et al. [68], and 49 VARV isolates nk accession numbers: L22579 [69], NC_001611 [54], Y16780 [45], DQ437580-DQ437592, and 416-DQ441448 [47]) using MEGA v.6.06 [65] with pairwise deletion and uncorrected pe settings. Estimated values for MPXV (0.169%) and VARV (0.137%) were averaged and this (0.153%) was used as the mean of the gamma distribution for the Ө prior. The τ prior was ted as the average genetic distance (uncorrected p) of all in-group taxa to ECTV (3.61% and for datasets 1 and 2, respectively). All BP&P analyses were run in duplicate with random seeds k convergence. Each dataset was analyzed with their respective τ parameter (τ 2  numbers, and other pertinent data for all examined strains are available in Table 1. MAFFT v.7.125 [57] was used to generate the multiple sequence alignment using the FFT-NS-2 algorithm through the auto-select option. Given the concern of recombination at the variable termini [26,58], and the potential for recombination to affect the outcome of phylogenetic analyses [59], two datasets were examined for phylogenetic and species delimitation purposes. Dataset 1 (150,718 bp) included the entire coding portion of the alignment (genes C23L-B29R) with gap columns removed; whereas dataset 2 (95,410 bp) examined a smaller, conserved central region (E2L-A32L) with gap columns removed. It is worth noting the C23L-B29R alignment was identical to the whole genome alignment with gaps stripped. The most appropriate model of molecular evolution was selected with the Akaike information criterion using MrModelTest (v.2.2 [60]) and PAUP* (version 4.0b10 [61]) for each dataset. Outgroup selection was made based on phylogenetic relationships of Old World orthopoxviruses generated by previous studies and the availability of whole genome data [62,63]. Phylogenetic analyses were conducted using MrBayes v.3.2.2 [64] with the following settings: substitution model = GTR, rate variation = invgamma, gamma catagories = 4, chain length = 10 million generations, samplefreq = 1000, nruns = 2, nchains = 4, burnin = 2500, and outgroup was set to ECTV. MEGA (v.6.06 [65]) was used to calculate average pairwise distances (uncorrected p), whereas patristic distances (tree branch lengths) were calculated from the consensus tree using the program Geneious (v.7.1.7; Biomatters Inc., Aukland, New Zealand). Genetic distances were averaged across taxa to produce a single value for each inter-or intra-clade estimate. For thoroughness, a RAxML analysis starting with a completely random tree was run for each dataset using 1000 rapid bootstrap iterations followed by a search for the best scoring MLtree under the GTR + I + G model of molecular evolution using the RAxML plugin in Geneious [66].

Species Delimitation and Species Tree Analyses
The software Bayesian Phylogenetics and Phylogeography (BP&P v.3.1 [39,67]) was utilized to calculate statistical support simultaneously for species delimitation and species tree in a Bayesian framework. The smallest monophyletic groupings from the MrBayes topologies with divergence levels approximately equal to the divergence between Taterapox virus (TATV) and Camelpox virus were each examined for putative species level divergence. The average within species genetic distance prior, Ө, for BP&P was estimated based on two multiple sequence alignments of large genomic regions; 45 MPXV isolates examined by Nakazawa et al. [68], and 49 VARV isolates (GenBank accession numbers: L22579 [69], NC_001611 [54], Y16780 [45], DQ437580-DQ437592, and DQ441416-DQ441448 [47]) using MEGA v.6.06 [65] with pairwise deletion and uncorrected pdistance settings. Estimated values for MPXV (0.169%) and VARV (0.137%) were averaged and this value (0.153%) was used as the mean of the gamma distribution for the Ө prior. The τ prior was calculated as the average genetic distance (uncorrected p) of all in-group taxa to ECTV (3.61% and 3.01% for datasets 1 and 2, respectively). All BP&P analyses were run in duplicate with random seeds to check convergence. Each dataset was analyzed with their respective τ parameter (τ 2  is based on the calculated average intra-species distance of MPXV and VARV, ӨB is twice that average, and ӨC is four times the calculated average to provide sequentially more conservative delimitation estimates. Both MrBayes topologies were used as the starting species tree topology for each dataset. Based on Bayesian inference analyses, 14 CPXV lineages were examined for consideration as putative species, and algorithm 0 was used for species delimitation with initial fine tune parameters set to 0.00019, 0.00007, 0.00339, 0.00033, 0.01521, 0.33, 1.0. All analyses were run with a burn-in of 20,000, a sample frequency of 5, and the total number of samples was set to 100,000. C is four times the calculated average to provide sequentially more conservative delimitation estimates. Both MrBayes topologies were used as the starting species tree topology for each dataset. Based on Bayesian inference analyses, 14 CPXV lineages were examined for consideration as putative species, and algorithm 0 was used for species delimitation with initial fine tune parameters set to 0.00019, 0.00007, 0.00339, 0.00033, 0.01521, 0.33, 1.0. All analyses were run with a burn-in of 20,000, a sample frequency of 5, and the total number of samples was set to 100,000.

Topology, Dataset Heterogeneity, and Recombination Testing
The incongruence length difference (ILD) test is used to determine whether two datasets (often loci) should be combined, or if they should be analyzed separately due to significant differences in signal, as measured by the length of trees generated by each dataset. In this instance, we are comparing the portion of the genome examined in dataset 2 (bp 28,069-123,478) against the two peripheral regions (bp 1-28,068 & 123,479-150,718), which, when combined, comprise dataset 1. The reason for comparing these partitions is to determine if whole genome analyses might be considered inappropriate based on some metrics. Given the variable trees generated by different datasets in this study, these partitions were used and a heuristic search with a random starting seed and 10,000 replicates was run using PAUP* (v.4.0b10) to determine if the null hypothesis of homogeneous partitions could be rejected. A topological comparison test (Kishino Hasegawa) was utilized to compare both topologies generated by MrBayes, to determine which tree showed the best congruence to each dataset. The Pairwise Homoplasy Index, Maximum χ 2 and the Neighbor Similarity Score tests of recombination were performed using permutation tests for both datasets in the program PhiPack [70][71][72][73].

Phylogenetic Analyses
Bayesian inference analysis of dataset 1 recovered a highly-supported tree (Figure 1), with only two small polytomies (in clades D & E10) as a result of two sets of almost identical sequences, each representing a single outbreak consisting of several isolates (Table 1). All Bayesian Posterior Probability (BPP) values were ≥0.95, with the single exception of the CMLV/TATV relationship, which supports the high credibility of the overall topology. The CPXV polytomies and low support at the TATV/CMLV node have been reported in previous studies [27,28]. This topology identifies a minimum of five monophyletic clades (A-E). BP&P analyses require all putative species to be defined a priori; BP&P will not split a defined species, but can combine two or more defined species. Therefore, in subsequent analyses, clade E was further broken down to allow for the maximum resolution among the 10 subclades. Additionally, clades E1-E5 and E6-E10 were discussed as independent monophyletic clades by Dabrowski et al. [28], and are therefore discussed similarly (at times) herein.
Dataset 2 also generated a highly-supported topology ( Figure 2) with five nodes calculated to have a BPP of less than 0.95 (the TATV/CMLV relationship and four nodes within clade E). This topology identifies a minimum of four monophyletic groupings of isolates (A, B/C, D & E). The most noticeable differences between these two trees are the placement of HumLit08/1 (from Lithuania; clade C) and relationships of subclades within clade E (specifically, variability in the reciprocal monophyly of E1-E5 and E6-E10). Inter-clade genetic (uncorrected p) and patristic distances for major and minor clades are available in Supplementary Tables S1 and S2, calculated for datasets 1 and 2, respectively. Each of the minor clades highlighted (E1-E10) is at least as divergent (uncorrected p-distances) from its sister clade as two currently recognized species CMLV and TATV. Intra-lineage genetic distances can be seen in Supplementary Table S3. Topologies generated using RAxML were nearly identical for each dataset, displaying similar movement of clade C and conditional polyphyly of clades E6-E10 ( Supplementary  Figures S1 and S2).

Species Delimitation and Species Tree Analyses
In all instances, regardless of dataset examined or starting tree topology, duplicate runs of the joint species tree estimation/species delimitation analyses converged on the same number of species of Cowpox viruses (n = 14); however, the species tree topologies varied, none having above 0.50 BPP. For species delimitation analyses using a fixed guide tree, all 14 examined CPXV lineages had BPP values of 0.95 or greater for each set of priors, and therefore are statistically, sufficiently divergent to warrant species level recognition.

Species Delimitation and Species Tree Analyses
In all instances, regardless of dataset examined or starting tree topology, duplicate runs of the joint species tree estimation/species delimitation analyses converged on the same number of species of Cowpox viruses (n = 14); however, the species tree topologies varied, none having above 0.50 BPP. For species delimitation analyses using a fixed guide tree, all 14 examined CPXV lineages had BPP values of 0.95 or greater for each set of priors, and therefore are statistically, sufficiently divergent to warrant species level recognition.

Species Delimitation and Species Tree Analyses
In all instances, regardless of dataset examined or starting tree topology, duplicate runs of the joint species tree estimation/species delimitation analyses converged on the same number of species of Cowpox viruses (n = 14); however, the species tree topologies varied, none having above 0.50 BPP. For species delimitation analyses using a fixed guide tree, all 14 examined CPXV lineages had BPP values of 0.95 or greater for each set of priors, and therefore are statistically, sufficiently divergent to warrant species level recognition.

Topology, Dataset Heterogeneity, and Recombination Testing
The Kishino Hasegawa test determined the tree generated for each dataset by MrBayes was significantly better for the examined dataset than the topology generated by the opposite dataset (i.e., when measuring topologies against dataset 1, Figure 1 was a significantly better fit to the data than Figure 2, but when measuring topologies against dataset 2, Figure 2 was significantly better than Figure 1). ILD test results from PAUP*, indicated a significant difference in the heterogeneity between the central and peripheral partitions (p = 1 × 10 −4 ). Tests utilized to detect recombination (Pairwise Homoplasy Index, Maximum χ 2 and the Neighbor Similarity Score) returned p-values of p = 0.000, indicating high levels of homoplasy, potentially explained by recombination.

Phylogenetics and Species Delimitation
The CPXV genomic dataset has been expanded by the addition of nine, newly sequenced isolates from the UK and Norway. These isolates fit within the previously described clades [27,28] and do not represent any previously unidentified lineages. The sole new sequence from Norway (isolated from a cat) grouped sister to the only other Norway sequence publicly available (isolated from a human), but differed by 262 mutations (based on the conservative genomic region examined in dataset 2). The Catpox 3L97 isolate grouped sister to an isolate from Austria, which was isolated in 1999, but differs by 144 single nucleotide polymorphisms (based on dataset 2). This large geographic expanse between the only isolates of this monophyletic clade A is puzzling, and could represent the only sampled isolates from a large range, or could indicate importation of this isolate from one of the two countries to the other. The remaining isolates from the UK grouped with high support with German samples (clade E6) or other UK isolates, including the historic UK Brighton Red strain and one published by Carroll et al. [27] (clade E5). Previous to this study, the only available isolates from the UK had grouped together. This study drastically increases our understanding of the genetic variation within UK CPXV strains. The large genetic variation found in UK isolates (grouping within clades A, E5, and E6) is similar to that found in German isolates (grouping within clade D and all E subclades except E4 & E5). At this point it is unclear if these countries represent hotspots with large numbers of CPXV lineages or if this is a sampling bias issue, as more than 80% of isolates available are from either Germany (61%) or the UK (22%). Clearly, it is worth expanding geographic sampling to other areas of Eurasia to better understand the genetic diversity. At present, large areas in western, southern, and eastern Europe are not represented. It is possible that Cowpox viruses do not occur in these areas, but it is also possible that this paucity of geographical data is due to a sampling bias. Recent research has identified new orthopoxviruses in areas with no previous evidence of poxvirus circulation [62,63]). Similar topological differences between datasets and the movement of the Lithuania isolate throughout the tree was also noted by Dabrowski et al. [28]. In both studies the smaller dataset placed the Lithuania strain with the 'Vaccinia-like' group, whereas the larger dataset placed this group sister to either clade E or the clade containing CMLV, TATV and VARV as well as CPXV clades D and E. The same study reported conditional monophyly of clades E6-E10, depending upon the dataset examined (similar to results exhibited in this study). In their larger dataset, clades E1-E5 and E6-E10 are monophyletic sister clades; however, in multiple smaller datasets Dabrowski et al. [28] recovered a paraphyletic E6-E10 with respect to clades E1-E5. The central portion of the genome is conserved and comprised of genes understood to perform essential tasks (i.e., viral replication, virion assembly and release), whereas the peripheries are more variable [74], and are thought to characterize host specificity and pathogenicity [30,75]. The issue of various genomic regions of a single replicon generating unique topologies is one worth considering. The ILD test results indicate the central region of the genome and the termini contain different evolutionary signals, hence the distinct topologies generated through analyses of both datasets. Many (if not all) phylogenetic (or population genetics) programs assume no recombination occurs within a locus, but free recombination between different loci [76][77][78][79]. Potential recombination within the variable termini of poxvirus genomes could violate this assumption and cause issues with topology generation depending upon the dataset examined and software utilized. Despite results of the analyses utilized to detect potential recombination, (p = 0.000 for Pairwise Homoplasy Index, Maximum χ 2 and the Neighbor Similarity Score), these datasets were analyzed by phylogenetic inference packages for a number of reasons. First, these tests examine datasets for homoplasies, which may indicate recombination, but may also be caused by other events (convergent evolution, retention of ancestral polymorphisms, or stabilizing selection). Second, although recombination does occur in Vaccinia virus within laboratory settings [80], and has been suggested to occur in cowpoxviruses [81], as for it to occur in nature multiple species of viruses would need to infect the same cell of the same animal at the same time. Given the short incubation time of poxviruses, the location of viral DNA replication (cytoplasm), and the variability in both host range and geographic distributions between species, large and frequent recombination events are expected to be rare and limited in effect when analyzing the entire virus genome.
To understand the true evolutionary history of this genus, a subset of nucleotide data containing phylogenetically informative characters, capable of generating accurate topologies needs to be identified. This issue has a direct effect on the ability to delineate species, specifically when it comes to the requirement of monophyly. It may be beneficial for members of the poxvirus community to thoroughly examine and recommend a specific subset of sequence data to be used when determining if a newly sequenced isolate is representative of a new or existing species. The caveat would be that these specified data should be capable of generating consistent, and 'accurate' phylogenies and criteria or methodology for species delimitations should be standardized.
Phylogenetic analyses of available orthopoxvirus genomes revealed that CPXV, as currently recognized, is a polyphyletic assemblage of 5-14 highly-supported lineages based on monophyly and genetic distance criteria within a statistical framework (Figure 1). Of these evolutionarily independent lineages, five (A-E) is the smallest number of species necessary to concur with the ICTV requirement of monophyly, although this arrangement would leave a greater amount of genetic variation within clade E than in clades A-D (Supplementary Table S3). Statistical support for 14 species of CPXV was found through BPP analysis, and the examination of patristic and genetic (uncorrected p) distances indicated that each of these 14 (sub)clades are either: (1) at least as divergent as CMLV and TATV from their sister clade; or (2) group sister to a non-CPXV species. However, understanding the 'true' evolutionary relationships between subclades E1-E10 is necessary to determine the number of species actually represented within this grouping. Increased taxonomic sampling has the potential to solidify the relationships between these congeners.

Geographic Sampling
Geographic coverage of analyzed CPXV samples is depicted in Figure 3. Examining CPXV as currently recognized in a geographic context is puzzling, as no geographic trends are evident. The idea behind examining the phylogeographic pattern of a species or group of species assumes these organisms originate from a single most recent common ancestor (MRCA); however, numerous studies [26][27][28][29] have come to the conclusion that CXPV isolates do not all share a single MRCA. Therefore, examining each of these monophyletic clades independently would offer a more accurate picture regarding the evolutionary history and phylogeography of each clade or species. As geographic sampling has increased over the last two decades, the number of recognized monophyletic assemblages of CPXV also has increased from two [26] to three [27] to as many as 14 today. With continued sampling from human and animal cases [62,63], understanding of the genetic diversity and evolutionary relationships between orthopoxviruses is expected to improve.

Conclusions and Future Work
In closing, the underlying assumption of a single, most recent common ancestor shared exclusively by all representatives of CPXV has been disproven again. This time, statistical delimitation analyses have supported species level recognition of several monophyletic clades. The minimum number of monophyletic clades required to meet the ICTV requirement of monophyly within a species, when considering both potential topologies generated (as well as the robust phylogenies generated by Dobrowski et al. [28]) is five (clades A-E), although statistical support exists for up to 14 lineages. Monophyly is one part of the new species definition, the second states the properties of a species of virus "can be distinguished from those of other species by multiple criteria" [4]. This portion of the definition was likely intended to ensure valid species were not split due to inconsequential differences. The major issue with this requirement is 'multiple criteria' could (in theory) be any set of non-evolved traits (i.e., geographic locality, host organism). Peterson [5] provided a thorough review of this issue, and stressed the most important criteria is the evolutionary history and whether these lineages are on independent evolutionary trajectories. In keeping with the current definition of a virus, additional phenotypic, evolved traits (phenotypes derived from the genotype), such as A-type inclusions, pathogenicity, organ tropism, plaque size, incubation period, heat resistance, etc., should be examined to determine what variation may exist between and within lineages A-E.
Given the variability in topologies and the requirement for virus species to be differentiated by a number of criteria, a conservative approach in recognizing five species is proposed (clades A-E). As CPXV has been a recognized species for many decades, and referenced in the seminal works of vaccination [8,9], differentiation of the species is not trivial. To maintain continuity within the literature, it is proposed these evolutionarily independent clades be referred to as follows: Cowpox virus alpha, Cowpox virus beta, Cowpox virus gamma, Cowpox virus delta, and Cowpox virus epsilon. In this approach, the authors hope to minimize unnecessary changes in the literature, but still recognize the diversity within CPXV, as currently recognized. Future work may divide one or more of these species, but reducing multiple species into one will likely not be necessary.
As geographic sampling has increased over the past few decades, our understanding of the genetic diversity of Cowpox viruses has increased dramatically. When considering names for new taxa (isolates, species or genera), effort should be taken to propose informative names which reduce confusion (within the research community as well as the public in general) and follow nomenclatorial requirements. A systematic approach considering a variety of data should be considered when

Conclusions and Future Work
In closing, the underlying assumption of a single, most recent common ancestor shared exclusively by all representatives of CPXV has been disproven again. This time, statistical delimitation analyses have supported species level recognition of several monophyletic clades. The minimum number of monophyletic clades required to meet the ICTV requirement of monophyly within a species, when considering both potential topologies generated (as well as the robust phylogenies generated by Dobrowski et al. [28]) is five (clades A-E), although statistical support exists for up to 14 lineages. Monophyly is one part of the new species definition, the second states the properties of a species of virus "can be distinguished from those of other species by multiple criteria" [4]. This portion of the definition was likely intended to ensure valid species were not split due to inconsequential differences. The major issue with this requirement is 'multiple criteria' could (in theory) be any set of non-evolved traits (i.e., geographic locality, host organism). Peterson [5] provided a thorough review of this issue, and stressed the most important criteria is the evolutionary history and whether these lineages are on independent evolutionary trajectories. In keeping with the current definition of a virus, additional phenotypic, evolved traits (phenotypes derived from the genotype), such as A-type inclusions, pathogenicity, organ tropism, plaque size, incubation period, heat resistance, etc., should be examined to determine what variation may exist between and within lineages A-E.
Given the variability in topologies and the requirement for virus species to be differentiated by a number of criteria, a conservative approach in recognizing five species is proposed (clades A-E). As CPXV has been a recognized species for many decades, and referenced in the seminal works of vaccination [8,9], differentiation of the species is not trivial. To maintain continuity within the literature, it is proposed these evolutionarily independent clades be referred to as follows: Cowpox virus alpha, Cowpox virus beta, Cowpox virus gamma, Cowpox virus delta, and Cowpox virus epsilon. In this approach, the authors hope to minimize unnecessary changes in the literature, but still recognize the diversity within CPXV, as currently recognized. Future work may divide one or more of these species, but reducing multiple species into one will likely not be necessary.
As geographic sampling has increased over the past few decades, our understanding of the genetic diversity of Cowpox viruses has increased dramatically. When considering names for new taxa (isolates, species or genera), effort should be taken to propose informative names which reduce confusion (within the research community as well as the public in general) and follow nomenclatorial requirements. A systematic approach considering a variety of data should be considered when proposing names. For each species, the first published genome could be considered the type specimen and names could represent a geographic region where the virus was first discovered, or genomic or phenotypic characteristics (e.g., indels, gene synteny, incubation period, disease progression) common to members of a lineage. Both of these are accepted methods in naming other organisms (mammals); however, given the scale of global travel and variable incubation times of pathogens, naming a lineage after a geographic region could result in a misnomer if few isolates are known.
Orthopoxviruses have been named for the animal from which initial isolates were recovered (e.g., Monkeypox virus, Cowpox virus), but further research determined these names to be misrepresentations, as the suspected primary reservoirs are now thought to be rodents. The apparent overlapping geographic and host ranges of many Cowpox viruses suggests some could share reservoir host species as well. The International Committee for Zoological Nomenclature has a large number of rules and regulations, which reduces confusion of recognized species names and keeps track of precedence or priority among type specimens of various lineages. To implement this number of rules in viral taxonomy would be a massive undertaking, but considering the rationale behind these rules and using this to further develop regulations for naming viruses would help with consistency of new virus names.
Supplementary Materials: The following are available online at www.mdpi.com/1999-4915/9/5/101/s1. Figure S1: Dataset 1 RAxML Results; Figure S2: Dataset 2 RAxML Results; Table S1: Average uncorrected p-distance bottom left, patristic distance top right for dataset 1 for all examined taxa; Table S2: Average uncorrected p-distance bottom left, patristic distance top right for dataset 2 for all examined taxa; Table S3: Average intra-lineage genetic distance within each group, calculated using uncorrected-p distance in MEGA6.