Recombinant Goose Circoviruses Circulating in Domesticated and Wild Geese in Poland

Circoviruses are circular single-stranded DNA (ssDNA) viruses that infect a variety of animals, both domestic and wild. Circovirus infection in birds is associated with immunosuppression and this in turn predisposes the infected animals to secondary infections that can lead to mortality. Farmed geese (Anser anser) in many parts of the world are infected with circoviruses. The majority of the current genomic information for goose circoviruses (GoCVs) (n = 40) are from birds sampled in China and Taiwan, and only two genome sequences are available from Europe (Germany and Poland). In this study, we sampled 23 wild and 19 domestic geese from the Gopło Lake area in Poland. We determined the genomes of GoCV from 21 geese; 14 domestic Greylag geese (Anser anser), three wild Greylag geese (A. anser), three bean geese (A. fabalis), and one white fronted goose (A. albifrons). These genomes share 83–95% nucleotide pairwise identities with previously identified GoCV genomes, most are recombinants with exchanged fragment sizes up to 50% of the genome. Higher diversity levels can be seen within the genomes from domestic geese compared with those from wild geese. In the GoCV capsid protein (cp) and replication associated protein (rep) gene sequences we found that episodic positive selection appears to largely mirror those of beak and feather disease virus and pigeon circovirus. Analysis of the secondary structure of the ssDNA genome revealed a conserved stem-loop structure with the G-C rich stem having a high degree of negative selection on these nucleotides.


Introduction
Circoviruses (genus: Circovirus; family: Circoviridae) are non-enveloped, icosahedral viruses, with circular single-stranded DNA (ssDNA) genomes which are approximately 2 kb in length. Transcription is bidirectional with a replication-associated protein (Rep) encoded on the virion sense strand and a capsid protein (CP) on the complementary sense strand [1,2]. Circoviruses are known to infect various bird, mammal, and fish species [1]. The main consequence of circovirus

Sample Collection and Processing
Spleen and liver sections were collected during necropsy from 23 wild geese and one mallard duck that had been shot by sport hunters in the area around Gopło Lake in Kujawsko-Pomorskie District in Poland. Similar tissue types were also sampled from slaughtered domestic geese (n = 19) from farms located in the areas around Gopło Lake. Sample descriptions are provided in Table 1. Forty four tissue samples (each approximately 50 mg) were individually homogenized in 500 µL PBS using Tissuelyser II (Qiagen, Hilden, Germany). The supernatant from the homogenate (~200 µL) was used for DNA isolation using a Janus automated workstation (Perkin Elmer, Waltham, MA, USA) and NucleoMag Tissue Kit (Macherey-Nagel, Düren, Germany).

Screening and GoCV Genome Recovery
Each sample was prescreened with the broad-spectrum nested PCR method targeting the rep genes of various avian circoviruses in accordance with Halami et al. [20]. For each positive sample, circular DNA was amplified using 1 µL of total DNA and TempliPhi (GE Healthcare, Marlborough, MA, USA) as previously described [16,21]. Full GoCV genomes were recovered using the enriched DNA as a template with back-to-back primer pair GoCV-F 5 -CTSTCTCGWGCYCGGGGATCTGAC-3 and GoCV-R 5 -CCAGGCTCTTCCTCCCAGCKWCTCTT-3 using Kapa HiFI Hotstart DNA polymerase (Kapa Biosystems, Wilmington, DE, USA) with the following thermal cycling protocol: 96 • C for 3 min, 25 cycles (98 • C (20 s), 60 • C (30 s), 72 • C (2 min)), 72 • C for 3 min. Amplicons were resolved on 0.7% agarose gels and~2 kb fragments were excised. Gel excised fragments were purified using a MEGA-spin Agarose Gel DNA Extraction Kit (iNtRON Biotechnology, Daejeon, Korea). Cleaned products were ligated into the plasmid pJET 2.1 (Thermo Fisher Scientific, Waltham, MA, USA) and transformed into DH-5α Escherichia coli competent cells. Recombinant plasmid DNA was isolated from single E. coli colonies using a DNA-spin Plasmid DNA Extraction Kit (iNtRON Biotechnology, Daejeon, Korea) and these were subsequently Sanger sequenced (Macrogen Inc., Seoul, Korea). Full genome coverage was obtained using primer walking. Sequence contigs were assembled into full genomes using DNA Baser V4 (Heracle BioSoft S.R.L., Pitesti, Romania).

Bioinformatic Analysis
The genetic diversity among the GoCV isolates was analyzed using SDT v1.2 [22]. Given that recombination has been detected in a range of other circoviruses, prior to phylogenetic analysis the 63 GoCV full genome sequences were aligned using MUSCLE [23] and analyzed for recombination using the seven detection methods implemented in RDP version 4.70 [24]. A maximum likelihood (ML) phylogenetic tree accounting for recombination was then constructed with PHYML 3.0 [25] using the full genome sequence alignment from which all recombinationally inherited genome segments had been removed. This tree was inferred with the best-fit substitution model GTR + G determined using jModelTest [26], with 1000 bootstrap replicates used to infer branch support and rooted with two swan circovirus sequences (EU056309, EU056310). Branches with less than 60% support were collapsed using TreeGraph2 [27].
The GoCV, PiCV, and BFDV gene sequences were all initially codon aligned within a singled data set so as to ensure that homologous codon sites could be accurately compared to one another following selection analyses. Gene sequences from each of the three species were then analyzed separately to identify codon sites evolving under either positive or negative selection using FUBAR [28] and under episodic positive selection using MEME [29]. We subsequently used the difference between the non-synonymous (dN) and synonymous (dS) rates (dN-dS) obtained from FUBAR and the locations of sites evolving under episodic selection from MEME to generate a selection map comparing the types and degrees of selection between the three circovirus datasets (GoCV; pigeon circovirus, PiCV; beak and feather disease viruses, BFDV) using the computer program SelectionMap (http://www. cbio.uct.ac.za/~brejnev/ComputationalTools.html).
Given that evidence of pervasive biologically relevant secondary structural elements has been found in other circovirus genomes, we used the computer program NASP [30] as previously described by Muhire, et al. [31] to identify and rank, in order of conservation, the secondary structural elements that are most likely present within the 63 GoCV genomes. Briefly, this was achieved using the minimum free-energy (MFE) approach implemented in the hybrid-ssmin component of UNAFold (with sequences treated as circular and folding carried out at 37 • C under 0.1 M magnesium and 1 M sodium ionic conditions) [32] to infer ensembles of secondary structural elements within ten GoCV genomes representing the known breadth of GoCV diversity (DQ192281, DQ192285, KP203866, KP229371, KT808650, KT808653, KT808656, KT808657, KT808663, KT808668) and ranking of structures based on the relative degrees to which inferred base-pairing interactions were conserved within them. From these ranked lists of plausible conserved structural elements, subsets of high-confidence structural elements-referred to as a high-confidence structure set (HCSS)-were identified using a nucleotide-shuffling permutation test (with 100 permutations and a p-value cutoff 0.05). In subsequent analyses, the only nucleotides considered as being paired within secondary structures were those identified by NASP as being base-paired within the HCSSs, while all other nucleotides were treated as unpaired sites. Structures were visualized with overlaid evolutionary data using the computer program DOOSS [33] and compared to similar structures in other circovirus genomes (namely beak and feather disease viruses (BFDV) and pigeon circovirus (PiCV), which were analyzed previously using identical methods [21].

GoCV in Domestic and Wild Geese in Poland
We recovered the genomes of GoCV from 14 domestic Greylag geese (Anser anser) and three wild Greylag geese (A. anser), three bean geese (Anser fabalis), and one white fronted goose (Anser albifrons) ( Table 2; GenBank accession #s KT808650-KT808670). The 21 genomes share between 83% and 95% pairwise identities with other GoCV genomes available in GenBank ( Figure S1). One of the GoCV genome sequences (GenBank accession # KT808657; Figure S1) from a domestic Greylag goose was most closely related to a divergent GoCV sequence recovered from a Polish wild Greylag goose (89%) [14] and, collectively, these two genomes share <84% genome-wide pairwise identity with all other known GoCV genomes.  In order to rationally categorize the presently known GoCV sequences, we determined the distribution of genome-wide pairwise identities ( Figure 1). Owing to the trough in this distribution at 98% identity (indicating that there are very few sequences that share 98% genome-wide pairwise identity), we opted to use 98% as a threshold for assigning the genomes to different genotype groupings. According to this threshold, the 63 known GoCV genomes can be assigned to 17 genotypes (which we have simply named I through XVII; Figure 2; Table 2). Whereas the Polish sequences fell within nine of these genotype groupings (IV, V, VI, VII, XII, XIII, XIV, XVI, and XVII; Figure 2), the 15 GoCV genomes from China belong to four genotypes (III, VIII, IX, and X), the 25 from Taiwan to three genotypes (I, II, and XI), and the isolate from Germany to genotype XII ( Figure 2). In order to rationally categorize the presently known GoCV sequences, we determined the distribution of genome-wide pairwise identities ( Figure 1). Owing to the trough in this distribution at 98% identity (indicating that there are very few sequences that share 98% genome-wide pairwise identity), we opted to use 98% as a threshold for assigning the genomes to different genotype groupings. According to this threshold, the 63 known GoCV genomes can be assigned to 17 genotypes (which we have simply named I through XVII; Figure 2; Table 2). Whereas the Polish sequences fell within nine of these genotype groupings (IV, V, VI, VII, XII, XIII, XIV, XVI, and XVII; Figure 2), the 15 GoCV genomes from China belong to four genotypes (III, VIII, IX, and X), the 25 from Taiwan to three genotypes (I, II, and XI), and the isolate from Germany to genotype XII ( Figure  2).    It is noteworthy that the GoCV genotype assignments for the Polish isolates display a degree of host-structure ( Figure 2; Table 2). Specifically, the isolates from wild geese belong mainly to the genotypes V, VI, and VII with a divergent Greylag goose isolate [14] falling within genotype XVII. Interestingly, the isolates obtained from the migratory bean and white fronted geese which do not nest in Poland, fall exclusively within genotype V. Conversely the genotype VI, VII, and XVII isolates were from wild Greylag geese which nest in the area of Gopło Lake. The genome-wide pairwise identities that are shared by GoCV isolates from wild migratory geese and those from local population geese arẽ 97%, whereas the isolates obtained from wild geese belonging to genotype V all share close to 100% pairwise identity with one another.
The GoCV isolates from wild geese were closely related to genotype IV isolates from domestic Greylag geese (94-95% identity). GoCV isolates assigned to genotypes XIII through XV are from domestic Greylag geese used for reproduction which share between 84% and 90% genome-wide pairwise identity with isolates from wild and domestic geese that have been slaughtered. The GoCV genotypes circulating in wild geese in the area of Gopło Lake therefore differ from those found in domestic geese kept in this area.
Seven unique recombination events were identified within 28 of the 63 GoCV sequences with a transferred fragment size spanning between 27% and 50% of the genome (Figure 2). Six of the nine polish genotypes were identified to be recombinant (IV, V, VI, VII, XIII, and XIV; Figure 1). As was suggested by the pairwise sequence analyses, the maximum likelihood phylogenetic tree that was constructed following the removal of recombinationally acquired genome segments displayed strong evidence of geographical structure.
We detected and compared natural selection signals within codon alignments of the GoCV rep and cp genes using two different codon-by-codon selection detection approaches (Figure 3; Supplementary data 1). In the BFDV cp there are three instances of positive selection, as opposed to none in PiCV and one in GoCV. The instances of episodic positive selection appears to largely mirror each other.
The negative selection appears to be higher in the rep genes compared to cp (based on individual dN/dS values).

DNA Secondary Structure Analysis of GoCV Genomes
The stem-loop structures in GoCV and BFDV visualized in Figure 4A were both highly conserved among the plausible structural elements detected within the genomes of these two species (2nd out of 137 in GoCV and 9th out of 143 in BFDV). Despite sharing no obvious sequence similarity, the similar GC-rich, stable stem-loop conformation and genomic location of the structural elements in rep could indicate shared biological function across viral species. The excessively low synonymous substitution rates observable in the stem region of both structures (indicated by the blue coloring of the nucleotides) indicates a high degree of negative selection acting at the nucleotide-level for these particular sites.
Tests designed to determine whether the evolution of GoCV sequences was consistent with the selective preservation of biologically functional structural elements within their genomes were applied exactly as described by Muhire et al. [31]. These tests compared paired sites within the HCSS structures to unpaired sites with respect to degrees of evolutionary neutrality, synonymous substitution rate, and rates of complementary coevolution. Identical evolutionary analyses to those performed here on GoCV have been previously carried out on BFDV and PiCV [21], enabling the direct comparison of these three circovirus species.
At the whole-genome-scale, neutrality tests (which compare frequencies of minor/alternative allele frequencies at polymorphic paired sites with those occurring at unpaired sites) were used to test for elevated degrees of purifying selection at paired sites relative to unpaired sites. For all 3 datasets examined (GoCV, BFDV, and PiCV), minor allele frequencies were significantly lower at paired sites than at unpaired sites as indicated by lower D and F test static values (p < 0.01 in all cases), with the exception of GoCV (p = 0.66, Tajima's D test) ( Figure 4C). This finding is consistent with purifying selection being stronger at paired-nucleotide sites within the HCSS than at the remainder of unpaired genomic sites. This strong signal of purifying selection indicates that a substantial proportion of paired sites within the HCSS of the analyzed genomes are evolving in a manner that is consistent within many of the parent structures being evolutionarily preserved.  presented are the consensus of all available genomes and were amongst the most highly conserved of all plausible structural elements detected within the genomes of these two species (2nd out of 137 in GoCV and 9th out of 143 in BFDV). The rank ratio indicates the actual conservation rank of the structure over the total number of predicted secondary structures. Nucleotide sequence variability is reflected by a sequence logo at each position, while overlaid synonymous substitution rate estimates are represented by the shading of each nucleotide (ranging from blue for low to green for high). Although these two structures have no obvious sequence similarity, something expected given that presented are the consensus of all available genomes and were amongst the most highly conserved of all plausible structural elements detected within the genomes of these two species (2nd out of 137 in GoCV and 9th out of 143 in BFDV). The rank ratio indicates the actual conservation rank of the structure over the total number of predicted secondary structures. Nucleotide sequence variability is reflected by a sequence logo at each position, while overlaid synonymous substitution rate estimates are represented by the shading of each nucleotide (ranging from blue for low to green for high). Although these two structures have no obvious sequence similarity, something expected given that GoCV and BFDV groups are very divergent (sharing 58.5% sequence identity), they form within same genomic region in rep and have similar conformation consisting of a stable stem-loop structure with a GC-rich stem-region and with evidence of low synonymous substitution rates in codons occurring within the stem-region consistent with strong selection acting against synonymous substitutions at these sites. (B) Association between paired sites and complementarily coevolving sites. (C) Tajima's D and Fu and Li F statistics for paired and unpaired genomic site alignments. (D) Comparison of synonymous substitution rates at paired-and unpaired-codon sites.
A very strong association between nucleotide sites that are complementary coevolving and nucleotide sites that are base-paired was detected in the BFDV dataset (p = 1.59 × 10 −13 ) ( Figure 4B). This provides compelling evidence that at least a subset of the structures in the BFDV genome likely provide a significant fitness advantage to the viruses, which suggests that these structures may have crucial, as yet undetermined biological functions. The lack of significant evidence of coevolution between paired-nucleotides within the GoCV HCSS may be due to the aforementioned high inter-cluster divergence of the GoCV species, or a lack of detectable biologically important structures.

Conclusions
The GoCV genome diversity in domesticated geese was found to be higher than that in wild geese. The domestic geese sampled in this study were slaughter birds which were from parental flocks (reproductive birds). Furthermore, a large number of these genomes are recombinant and this is likely due to the high density of birds in slaughter flocks, and in the herds of wild geese during migration, which facilitate recombination among GoCVs, similar to what has been noted for BFDVs in parrot breeding facilities [16]. The farming of geese in areas that are also inhabited by wild geese may enable the transfer of pathogens between domestic and wild geese populations. Although the GoCV variants from wild and domestic geese are in different genotype groupings, the detection of recombination between viruses assigned to the "wild" and "domestic" genotypes and suggests that may be at least a low degree of GoCV circulation between domestic and wild geese population-possibly even in the area of Gopło Lake. If there are indeed persistent low-levels of viral transmission between wild and domestic birds, this may expose both the domestic birds to pathogens from throughout the seasonal geographical ranges of the wild-birds (which include taiga areas of Scandinavia and that of Central to North Siberia, and the Siberian tundra), and the wild birds throughout those same geographical ranges to pathogens originating within the Polish domestic goose farming sector.
Taking into consideration the high diversity of GoCVs, which is typical for all avian circoviruses, and the fact that these viruses are highly recombinant and evolving at a significantly rapid rate [16,34], full genome analysis allows for the determination viral dynamics amongst local and migratory population of birds.