Genetic Characterization of a Novel Equus caballus Papillomavirus Isolated from a Thoroughbred Mare

Papillomaviruses (PVs) are small, non-enveloped viruses, ubiquitous across the animal kingdom. PVs induce diverse forms of infection, such as cutaneous papillomas, genital papillomatosis, and carcinomas. During a survey on the fertility status of a mare, a novel Equus caballus PV (EcPV) has been identified using Next Generation Sequencing, and it was further confirmed with genome-walking PCR and Sanger sequencing. The complete circular genome 7607 bp long shares 67% average percentage of identity with EcPV9, EcPV2, EcPV1, and EcPV6, justifying a new classification as Equus caballus PV 10 (EcPV10). All EcPV genes are conserved in EcPV10, and phylogenetic analysis indicates that EcPV10 is closely related to EcPV9 and EcPV2, genus Dyoiota 1. A preliminary EcPV10 genoprevalence study, carried out on 216 horses using Real Time PCRs, suggested a low incidence of this isolate (3.7%) compared to EcPVs of the same genus such as EcPV2 and EcPV9 in the same horse population. We hypothesize a transmission mechanism different from the one observed in the closely related EcPV9 and EcPV2 that particularly infect Thoroughbreds. This horse breed is usually submitted to natural mating, thus indicating a possible sexual diffusion. No differences were detected for breeds in terms of susceptibility to EcPV10. Further studies are needed to investigate the molecular mechanisms behind the host and EcPV10 infection to explain the reduced viral spread.


Introduction
Papillomaviruses (PVs) are small, non-enveloped DNA viruses, able to infect a wide variety of vertebrates. More than 240 types of PVs have been described in mammals [1]. In humans, PV infections are mostly asymptomatic and are mainly resolved by the host's immune response. This tolerated partnership allows the virus to replicate and persist in the host, without the pressure of evolutionary events [2]. On the contrary, unsolved chronic high-risk PV infections can lead to uncontrolled cell proliferation and neoplastic transformations [2]. Upon infection, the viral genome can be either integrated into the host DNA or maintained as multiple episomes replicate concomitantly with the host cells. Usually, the integration is correlated with higher disease severity [3,4].
To date, 14 species of PVs have been reported to infect equines such as horses and donkeys: three bovine PVs (Bos taurus papillomavirus 1 (BPV1), 2 (BPV2), and 13 (BPV13)), two Equus asinus PVs 1-2 (EaPV1 and EaPV2), and nine Equus caballus PVs 1-9 (EcPV1-9) [5]. As it occurs in humans, PV infections in horses can lead to the development or progression of different kinds of lesions, such as aural plaques, genital masses, and verrucous lesions [5]. In particular, EcPV1 and EcPV8 have been associated with cutaneous papillomas and papillomatosis [6,7]; EcPV2 and EcPV3 with penile and preputial squamous cell carcinoma (SCC) [8]; EcPV1, 3, 4, 5, 6 and 7 with aural and genital plaques, with a high percentage of co-infection as well [9,10]. Furthermore, EcPV2 has been reported in penile, vulvar, clitoral, and oropharyngeal papillomas, as well as in both in situ carcinoma (CIS) and invasive SCCs [11][12][13]. PVs' genome sizes range from 6953 bp (Chelonia mydas papillomavirus type 1, CmPV1) to 8607 bp (Canine papillomavirus type 1, CPV1) [1]. Regardless of host species and PV type, all the PV genomes encode for at least five genes, divided into early (E) and late (L) genes according to their expression during the infection cycle, and an upstream regulatory region (URR) contains enhancers and promoters [3]. E1 is a helicase involved in replication, while E2 is a transcription factor that enhances E1 and E2 expression, both expressed during the early stage of the infection [14]. E4 is contained within E2 Open Reading Frame (ORF), and it is actively spliced to finalize virus amplification and synthesis [15]. L1 and L2, instead, encode for the icosahedral capsid protein and for a structural protein that helps the packing into viral particles, both expressed during the late infection. Three additional genes (E5, E6, E7), involved in cell growth and immune response, are considered accessory genes acquired during the PV evolution, and their presence vary among the different phylogenetic groups [1]. Among the PV genes, L1 is the most conserved one and, for this reason, used for PV detection and identification [3,16]. Hence, according to their relative clinical history and conditions, horses admitted to the veterinary hospital are routinely tested for the presence of PVs using specific detection and quantification methods targeting the L1 gene.
In this study, we report the identification of a new EcPV type named EcPV10, isolated from a vaginal cytobrush of an infertile mare, and provide its completely assembled genome. This sequence has been compared to closely related PVs, and the genoprevalence of this virus has been tested in 216 samples obtained from horse genital tracts.

Anamnesis and Sampling
A Thoroughbred of seven years old from Chillivani (Sardinia, Italy) was admitted in February 2021 to the Veterinary Teaching Hospital (OVU) of the University of Turin (Italy) for subfertility. During the previous breeding season, she was subjected to natural mating four times with a proven fertility stallion but failed to become pregnant. However, clinical signs of endometritis were not observed during any cycle (excessive intrauterine fluid accumulation before and/or after breeding, vaginal discharge, short inter-oestrous intervals). Therefore, after the admission, the mare was submitted to a complete reproductive tract examination, including transrectal palpation, ultrasound examination (MyLabOne, Esaote, Genova, Italy), and vaginal speculum examination. Furthermore, all the procedures routinely used to diagnose uterine illnesses in the mare were performed [17], including cytological examination [18], endometrial culture and histological examination [19], and PCR for Chlamydia spp. detection [20]. None of these procedures were able to identify a clear and convincing cause of infertility.
In light of these results, screening for EcPV2 infection was performed, following the protocol described in Cappelli et al. (2022) [13]. Cytobrush (Deltalab SLU, Barcelona, Spain) sampling was carried out through mild rubbing on the vulvar mucosa. The brush was moisturized in 800 µL of DNA/RNA Shield Stabilization Solution (Zymo Research, Irvine, CA, USA) and stored at −20 • C until extractions.
To check the presence of any possible EcPV, assembled contigs were compared using BLASTn algorithm against an in-house built database that included the EcPV genomes available on NCBI database at the time of this analysis (Table S1). Contigs that were related, even if distantly, to EcPV were aligned to each other for genome reconstruction. The sequence was further validated with filtered-raw-reads alignment using BWA [23]. The alignment was visualized through the Integrative Genomics Viewer [26], and breadth and depth coverage were evaluated with Qualimap [27].

Genome Annotation and Phylogenetic Analysis
Viral ORFs were identified through the NCBI ORF Finder, Gatu [28], and PuMA [29]. A schematic genome representation was carried out through Cgview [30]. To perform phylogenetic analysis, the sequence of the E1, E2, L1, and L2 genes was retrieved from 55 bovine and equine papillomavirus species listed in Table S1 and aligned with Virulign [31] using default parameters. A Maximum Likelihood (ML) tree was built with RaxML-HPC using the GTRCATI algorithm as nucleotide substitution model and 1000 bootstrap [32]

PCR Amplification and Sanger Sequencing
A genome walking approach was developed by designing overlapping PCR primers, listed in Table 1. Touchdown PCR was performed as follows: 2.5 µL of DNA (10 ng) were incubated with 0.4 µM of both forward and reverse primer, 1× of BiolineTM (Meridien Bioscience) reaction mix containing ultra-stableTaq DNA polymerase in a final volume of 25 µL. PCRs were performed with MJ Research PTC-200 Thermal Cycler using the following program: 5 min denaturation step at 95 • C, followed by 15 cycles of denaturation at 95 • C for 30 s, annealing starting from 62 • C and decreasing 0.5 • C per cycle for 30 s, extension at 72 • C for 30 s, followed by further 15 cycles of denaturation at 95 • C for 30 s, annealing at 58 • C for 30 s, and extension at 72 • C for 30 s, with a final elongation step carried out at 72 • C for 10 min. PCR products were visualized and evaluated on a 2.2% FlashGel (Lonza) and purified with ExoSAP-IT (ThermoFisher Scientific, Waltham, MA, USA), according to the manufacturer's instructions. PCR amplicons were Sanger sequenced at Eurofins genomics (Eurofins Genomics GmbH, Konstanz, Germany) and aligned to the assembled sequence with BLASTn.

Development of a Real Time Pcr-Based Specific Assay
To develop a Real Time PCR assay specific to the newly assembled EcPV genome, primers and a probe targeting the L1 gene were designed (Table 1). Their specificity was tested in silico by using the in-house built EcPV genome database as a custom-defined database in the Primer-BLAST tool, as well as by blasting the resulting amplicon towards the Papillomaviridae (taxid:151340) NCBI database. qPCR reactions were prepared with 1× iTAQ universal probes supermix (BioRad, Milan, Italy), 1 µM of both forward and reverse primers, 0.2 µM of the probe, and 2 µL of DNA, in a final volume of 25 µL of double sterile water. Thermal cycling parameters consisted of an initial preheating step for 5 min at 95 • C followed by 40 cycles at 95 • C for 15 s, 60 • C for 45 s, performed in a Rotor-gene Q machine (QIAGEN). The acquisition of the fluorescent signal occurred at the end of each cycle, and the threshold to consider the sample positive was set at cycle 38.

Genital Brush Samples Collection from Horses for EcPV10 Detection
From January 2021 to July 2022, genital brushes were collected at the Didactic Veterinary University Hospital (OVUD) of Perugia and Turin for reproductive reasons, with the following inclusion criteria: (i) no sign of neoplastic disease, (ii) no lesions were associated to PVs infection. Penile and vulvar swabs were collected through sampling with cytobrushes (Deltalab SLU, Barcelona, Spain) as previously described [13]. Brushes were stored in 2 mL tubes containing 800 µL of DNA/RNA Shield Stabilization Solution (Zymo Research, Irvine, CA, USA), then kept at −20 • C until processing.

Statistical Analysis
Breed and origin information were obtained from medical records. Descriptive statistics analysis was performed with Microsoft Excel (2016). STATA16.1 (StataCorp, College Station, TX, USA) software was used to fit a logistic regression model assessing the association, expressed as Odds Ratio (OR) between the positivity or negativity to EcPV10 L1 (dependent variable) and four classes of age (group 1: <6 yy; group 2: 6 ≤ 9 yy; group 3: 9 ≤ 13 yy; group 4: ≥13 yy) and two of breed (Thoroughbred vs. the others) (independent variable). Statistical analysis was restricted to mares; the OR was assessed through logistic regression using as dependent variables the positivity/negativity to EcPV10 L1, artificial insemination/natural service, and pluriparous/maiden, while breed and age were considered as independent variables. Finally, a third logistic regression model was fit to assess the OR between being pregnant vs. being empty (dependent variable) and the positivity/negativity to EcPV-10 L1 and fertility/hypo-fertility (independent variables).

Results
A survey was conducted on the fertility status of a seven-year-old Thoroughbred mare from Chillivani (Sardinia, Italy). Several tests to evaluate the mare fertility were carried out, such as transrectal palpation, ultrasound examination, and vaginal speculum examination, as detailed in Section 2.1, with negative results except for the detection of EcPV2 DNA on Vaginal Cytobrush with a mean Cq of 26.5 ± 0.8. Since our previous data [13] suggested a possible fertility alteration linked to an EcPV2 infection of the genital tract, we decided to sequence this sample to better characterize this case of study.

De Novo Assembly of the Viral Genome
DNA from three different brush replicates was sequenced and analyzed independently with Next Generation Sequencing (NGS) using Illumina technology. The total amount of raw and filtered reads, as well as the total number of assembled contigs, are summarized in Table S2. The assembled contigs were then aligned using BLASTn towards the in-house built database containing all the EcPV available genomes, listed in Table S1, and four contigs (named in Table S3: NODE_183 of 7543 bp in replicate 1; NODE_467 of 7607 bp in replicate 2; NODE_328 of 5539 bp and NODE_2291 of 1856 bp in replicate 3) were found to be distantly related to EcPV9, EcPV2, EcPV1, and EcPV6, with an average percentage of identity of only 67% (the better alignments of the four contigs are highlighted in yellow in Table S3). Thus, to test the hypothesis that this virus belongs to a new PV genotype, the four contigs were aligned to each other (File S1) to retrieve one single consensus sequence of 7607 bp, further validated with short-reads alignment using BWA [23] and resulting in a 26× average coverage (Figure 1). We analyzed the coding sequence with Gatu [28], PuMA [29], and ORF finder (NCBI) to identify the sequences related to the latent L1 and L2 genes, together with the earlier features E1, E2, E4, E6, and E7 (Figure 1), further supporting our hypothesis that this sequence represents a novel EcPV genotype.
Furthermore, in each replicate, several shorter contigs were related to EcPV2, even with more than 99% of identity highlighted in light green in Table S3, suggesting a double infection, in line with the qPCR results described above, performed using the EcPV2 probe [13].

Sanger Sequencing Validation
The genome sequence for this new PV genotype, named EcPV10, was further validated through Sanger sequencing using a genome walking approach. Primers for overlapping PCR amplicons (Table 1) are schematically represented on the genome sequence in Figure 1. Amplicons were sequenced with Sanger and aligned to the assembled genome sequence obtained from the NGS analysis with the BLASTn algorithm. The final genome obtained through NGS and Sanger sequencing was deposited on the NCBI database under the accession number OP870083, together with the annotation of its features. Furthermore, in each replicate, several shorter contigs were related to EcPV2, even with more than 99% of identity highlighted in light green in Table S3, suggesting a double infection, in line with the qPCR results described above, performed using the EcPV2 probe [13].

Sanger Sequencing Validation
The genome sequence for this new PV genotype, named EcPV10, was further validated through Sanger sequencing using a genome walking approach. Primers for overlapping PCR amplicons (Table 1) are schematically represented on the genome sequence in Figure 1. Amplicons were sequenced with Sanger and aligned to the assembled genome sequence obtained from the NGS analysis with the BLASTn algorithm. The final genome obtained through NGS and Sanger sequencing was deposited on the NCBI database under the accession number OP870083, together with the annotation of its features. Concentric circles, from outermost to innermost, show the following: the backbone of the EcPV10 genome; CDS in green arrows; overlapping primers for the genome walking validation method, in forward (red) and reverse orientation (blue); GC content percentage; GC skew, in orange for the forward strand and purple for the reverse strand; Illumina short reads coverage on the assembled sequence.

Genomic Properties and Phylogenetic Relationships of EcPV10
The ML phylogenetic analysis based on the concatenated sequences of E1, E2, L2, and L1 showed clustering of EcPV10 among an equine (including horses and donkeys) PV group and a subgroup of bovine PVs (Figure 2). Within the equine PV group, EcPV10 is closely related to EcPV9 and EcPV2 (Dyoiota 1 genus) (Figure 2). These results are in line with the similarity results obtained with BLASTn analysis (Table S3). The Dyoiota 2 genus, including EcPV4 and EcPV5, is the second closest to EcPV10 (Figure 2).
The ML phylogenetic analysis based on the concatenated sequences of E1, E2, L2, and L1 showed clustering of EcPV10 among an equine (including horses and donkeys) PV group and a subgroup of bovine PVs (Figure 2). Within the equine PV group, EcPV10 is closely related to EcPV9 and EcPV2 (Dyoiota 1 genus) (Figure 2). These results are in line with the similarity results obtained with BLASTn analysis (Table S3). The Dyoiota 2 genus, including EcPV4 and EcPV5, is the second closest to EcPV10 (Figure 2). Background colors separate the different EcPV phylogenetic groups from bovine EcPVs, according to [5]. EcPV groups Dyoiota 1 and 2, Dyorho, Treiskaappa, and Zata are indicated in the figure. Our EcPV10 isolate is indicated in red.

EcPV10 Genoprevalence
To investigate the EcPV10 genoprevalence, a total of 216 horses (206 females, 8 stallions, and 2 geldings) were sampled during clinical examination at OVUD (Perugia) and OVU (Turin) and tested for EcPV10 infection using our designed quantitative Real-Time PCR (see Table 1). The age of the sampled horses ranged from 6 months to 22 years, with a mean of 9.8 (±4.6) years and a median of 10 years. Overall, following four categories of age division, 40 animals were <6 yy (very young), 43 were 6 ≤ 9 yy (young), 65 were 9 ≤ 13 yy (adult), and 68 were ≥13 yy (elderly).

EcPV10 Genoprevalence
To investigate the EcPV10 genoprevalence, a total of 216 horses (206 females, 8 stallions, and 2 geldings) were sampled during clinical examination at OVUD (Perugia) and OVU (Turin) and tested for EcPV10 infection using our designed quantitative Real-Time PCR (see Table 1). The age of the sampled horses ranged from 6 months to 22 years, with a mean of 9.8 (±4.6) years and a median of 10 years. Overall, following four categories of age division, 40 animals were <6 yy (very young), 43 were 6 ≤ 9 yy (young), 65 were 9 ≤ 13 yy (adult), and 68 were ≥13 yy (elderly).
EcPV10 DNA was found in 3.7% (8 out of 216) of the examined horses, including the first one from Sardinia, with Cq from 30 to 38.7. Seven of the positive subjects were mares, whilst one was a stallion. The age of the positive subjects ranged from 7 to 19 years. Most positive subjects were found in Piedmont (six horses), while the other two were from Sardinia and Tuscany, respectively ( Figure S1). Regarding the breed, horses positive for EcPV10 L1 DNA were mostly represented by Thoroughbred (5), followed by Standardbred (1), Quarter horse (1), and one not identified (1) ( Table 3). The logistic regression model did not show any correlation between viral infection risk and age class or breed. Similarly, the presence/absence of the virus did not appear to be related to sex and did not influence fertility in the investigated horses. A larger number of subjects should be examined in future studies in order to confirm our observations.

Discussion
In this study, we report the discovery, full genome sequence, and classification of EcPV10, a novel Equus Caballus Papillomavirus type 10. The complete circular genome of 7607 bp has been deposited on the GenBank database under the accession number OP870083. Total DNA was isolated from a vaginal cytobrush of a mare affected by infertility in triplicate and subjected to different tests, including the screening of EcPV2. Screening for EcPV2 was performed since a recent study [13] suggested a possible negative effect of EcPV2 infection on mares' fertility, highlighting a pivotal role of EcPV2 infection in the management of horse health. However, the positivity to EcPV2 did not explain the complete clinical picture of the mare, nor her infertility. For this reason, total DNA was sequenced in three different replicates through the application of the Illumina short read technology, using paired-end sequencing methods. The de novo assembly of the raw reads filtered throughout the horse genome confirmed co-infection with EcPV2 and the newly identified EcPV10.
Our phylogenetic analysis demonstrates that EcPV10 is a novel species within the genus DyoiotaPV, most closely related to EcPV2 and EcPV9, and classified as DyoiotaPV1. PVs are classified on the basis of L1, the most conserved gene, but diversity between PV types is enhanced in the E6-E7 oncogene regions [14]. In this respect, L1 protein of EcPV10 shared high aa similarity to EcPV2 and EcPV9, 64.36% and 64.27%, respectively. The rather low similarity was indicative of a novel PV. As for other EcPVs, the ORF encoding for E5 oncogene is not found in EcPV10. This is surprising since E5 is very important for the modulation of host factors and for the control of viral replication and persistence. For example, E5 downregulates MHC I and II expression; however, it shows poor activity in classic transformation assays [33]. Regarding E6 and E7 oncogenes, EcPV10 genes showed high dissimilarity at the nucleotide and aa level when compared with the other EcPVs, as to be expected. Given that these genes are involved in the modulation of cell survival, cellular transcription, cell differentiation, responses to DNA damage, cell cycle progression, and cancer development and progression, it is likely that the genetic differences translate into a different pathogenesis for this virus.
Out of the 216 examined horses using the EcPV10-specific Real Time assay, only 7 mares and 1 stallion tested positive. Despite EcPV10 belonging to Dyoiota 1, the genus that includes the most prevalent PVs within the Italian horse population, i.e., EcPV2 and EcPV9 [3,13], our results indicate a limited circulation of EcPV10 in Italy, to date. Indeed, EcPV2 and EcPV9 showed a higher infection risk associated with breed compared to EcPV10, with the Thoroughbred being significantly more infected by EcPV2 and EcPV9 compared to other breeds. This finding may be the consequence of natural mating, making sexual transmission the probable way of EcPV2 and EcPV9 diffusion. Risk of spreading EcPV10 was not associated with breed, likely suggesting a different way of transmission for this PV and suggesting no impact on breed upon infection. In this preliminary study, we tested a moderate number of samples (216 horses), which was, however, considered representative enough to suggest a low occurrence of EcPV10 in Italy. Low prevalence of EcPV10 among the Italian horse population could be attributed to several features related to the viral proteins and the capability of the host's immune response to recognize and eliminate the virus. However, further studies are needed to better understand other possible host-EcPV10 interactions due to the diversity we encountered in its viral proteins. Deeper analyses are also needed in order to shed light into the transmission mechanisms and disease incidence of this new PV isolate.
Supplementary Materials: The following supporting information can be downloaded at https: //www.mdpi.com/article/10.3390/v15030650/s1: Table S1: List of PV and their related accession numbers used for phylogenetic analysis; Table S2: Total amount of raw reads, filtered reads, and number of assembled contigs in the three replicates; Table S3: BLASTn results of the assembled contigs towards the in-house built PV database; File S1: Contigs alignments from the three different replicates; Figure S1: Map of Italy showing the three regions of origin of horses positive for EcPV10. The numbers indicate the number of samples collected in each site.