Genetic Diversity of the Noncoding Control Region of the Novel Human Polyomaviruses

The genomes of polyomaviruses are characterized by their tripartite organization with an early region, a late region and a noncoding control region (NCCR). The early region encodes proteins involved in replication and transcription of the viral genome, while expression of the late region generates the capsid proteins. Transcription regulatory sequences for expression of the early and late genes, as well as the origin of replication, are encompassed in the NCCR. Cell tropism of polyomaviruses not only depends on the appropriate receptors on the host cell, but cell-specific expression of the viral genes is also governed by the NCCR. Thus far, 15 polyomaviruses have been isolated from humans, though it remains to be established whether all of them are genuine human polyomaviruses (HPyVs). The sequences of the NCCR of these HPyVs show high genetic variability and have been best studied in the human polyomaviruses BK and JC. Rearranged NCCRs in BKPyV and JCPyV, the first HPyVs to be discovered approximately 50 years ago, have been associated with the pathogenic properties of these viruses in nephropathy and progressive multifocal leukoencephalopathy, respectively. Since 2007, thirteen novel PyVs have been isolated from humans: KIPyV, WUPyV, MCPyV, HPyV6, HPyV7, TSPyV, HPyV9, HPyV10, STLPyV, HPyV12, NJPyV, LIPyV and QPyV. This review describes all NCCR variants of the new HPyVs that have been reported in the literature and discusses the possible consequences of NCCR diversity in terms of promoter strength, putative transcription factor binding sites and possible association with diseases.

Table 1. The novel human polyomaviruses, their original source of isolation and their association with human diseases.

Virus | Original Source | Associated Disease | Reference
KIPyV | Nasopharyngeal aspirate | None | [7]
WUPyV | Bronchoalveolar lavage | None | [8]
MCPyV | Merkel cell carcinoma | Merkel cell carcinoma | [9]
HPyV6 | Healthy skin | Pruritic skin eruption in immunocompromised patients | [10]
HPyV7 | Healthy skin | Pruritic skin eruption in immunocompromised patients | [10]
TSPyV | Trichodysplasia spinulosa spicules | Trichodysplasia spinulosa | [11]
HPyV9 | Serum from renal transplant recipient | None | [12]
HPyV10 | Condyloma specimens from a patient with WHIM * syndrome | None | [13]
STLPyV | Stool sample from a healthy 15-month-old child | None | [14]
HPyV12 | Liver sample from patient with malignant disease | None | [15]
NJPyV | Muscle biopsy from a pancreatic transplant patient | None | [16]
LIPyV | Skin swab | None | [17]
QPyV | Stool sample from 85-year-old hospital patient | None | [18]

The early region encodes proteins involved in replication and transcription of the viral genome. The major early proteins are large T-antigen (LT) and small t-antigen (sT). The late region codes for the structural proteins VP1, VP2 and VP3 that form the capsid. VP1 is the major capsid protein, while VP2 and VP3 are the minor capsid proteins [1,2]. However, not all PyVs express VP3 [33]. Interspersed between the early and late regions are sequences that do not code for viral proteins; this region is referred to as the NCCR.
Figure 1. (B) Schematic presentation of the NCCR of the novel HPyVs. The NCCR is the region between the start codon of large T antigen (LT) and small t antigen (sT) and the start codon of VP2. The AT-rich region (AT), repeated sequences (black dots), and LT binding motifs (upward pointing triangle = 5′-GRGGC-3′; downward pointing triangle = 5′-GCCYC-3′) are shown. (C) Phylogenetic tree based on NCCR sequences of the different HPyVs.
This is a neighbor-joining tree without distance corrections using Clustal Omega multiple sequence alignment [34].
Studies with simian virus 40 (SV40 or Macaca mulatta polyomavirus 1) and murine polyomaviruses have been pivotal in unveiling the functions of this region. The SV40 NCCR contains the origin of replication (ori), which consists of GRGGC motifs to which LT binds and is flanked by an AT-rich sequence and an easily denatured imperfect palindrome [35,36]. Binding of LT to these motifs is also involved in regulation of viral transcription [37,38]. The NCCR also contains promoter and enhancer elements that control early and late transcription [39,40]. SV40 directly isolated from its natural host, the rhesus monkey, has an NCCR that consists of an AT-rich region, three GC-rich 21 base-pair (bp) repeats, and a single 72 bp element. The 21 bp repeats contain the LT binding motif (GRGGC; [41,42]). This NCCR organization is known as the archetype. SV40 adapted to grow in cell culture has a duplication of this 72 bp element, and this type of NCCR is referred to as the prototype [43,44]. SV40 isolated from human tumors usually contains a single 72 bp element [43]. Rearrangements in the SV40 NCCR affect viral transcription and replication, as well as oncogenic properties of the virus [45,46]. The mouse polyomavirus (Mus musculus polyomavirus 1; MPyV) NCCR encompasses the ori, consisting of an AT-tract and a GC-rich inverted repeat (LT binding motifs), and the transcription regulatory domains A (or α), B (or β), C and D [47][48][49][50]. Alterations in the MPyV NCCR have an effect on viral replication in cell culture and in the host, on the host range, and on in vitro transformation [51][52][53][54][55].
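The LT binding motif described above is written with IUPAC ambiguity codes (R = A or G, Y = C or T), and 5′-GCCYC-3′ is simply GRGGC read on the complementary strand. The following is a minimal sketch of how such motifs could be located in a sequence; the function names and the toy sequence are hypothetical and are not taken from any viral genome or from the tools used for the figures.

```python
import re

# IUPAC ambiguity codes needed for the two motifs: R = A or G, Y = C or T.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}


def motif_to_regex(motif: str) -> str:
    """Translate an IUPAC motif such as GRGGC into a regex fragment."""
    return "".join(IUPAC[base] for base in motif)


def find_lt_motifs(seq: str) -> list:
    """Return sorted (1-based start, strand, matched bases) tuples.

    '+' hits match 5'-GRGGC-3' on the given strand; '-' hits match
    5'-GCCYC-3', i.e. GRGGC on the complementary strand.
    """
    seq = seq.upper()
    hits = []
    for strand, motif in (("+", "GRGGC"), ("-", "GCCYC")):
        # The lookahead reports overlapping matches as well.
        regex = re.compile(f"(?=({motif_to_regex(motif)}))")
        for m in regex.finditer(seq):
            hits.append((m.start() + 1, strand, m.group(1)))
    return sorted(hits)


# Hypothetical toy sequence, used only to illustrate the scan.
toy = "TTGAGGCAAAGCCTCTTGGGGCAT"
for start, strand, bases in find_lt_motifs(toy):
    print(start, strand, bases)
```

On the toy sequence this reports two hits on the given strand and one on the complementary strand. A real analysis would run the same scan over the NCCR sequences listed in the supplementary material.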
The NCCR of the HPyVs ranges from 267 bp (JCPyV CY-strain; accession number AB038249) to 645 bp (WUPyV prototype; accession number NC_009539) (see Figure S1 for the NCCR sequences of the novel HPyVs). Similar to the NCCR of SV40 and MPyV, the NCCR of HPyVs contains the origin of replication, LT binding motifs, and an AT-rich region (Figure 1B). This region of the genome displays little or no sequence identity between the different HPyV species (Figure S2). A neighbor-joining tree without distance corrections shows which NCCRs are most closely related (Figure 1C). The presence of particular rearranged NCCR structures has been used to classify HPyV strains as "archetype" or "prototype". The importance of NCCR rearrangements during HPyV infection became obvious when different strains of JCPyV were examined. The NCCR of the archetype JCPyV strain (CY) is divided into six boxes named A (36 bp), B (23 bp), C (55 bp), D (66 bp), E (18 bp), and F (69 bp) and contains the origin of replication (ORI), the promoter and the enhancer elements [56]. The NCCR harbors binding sites for transcription factors such as nuclear transcription factor-1 (NF1), a cell-specific regulator of JCPyV promoter and enhancer activity [57,58]; activating protein 1 (AP1), which is involved in JCPyV early gene expression [57,59]; and specificity protein-1 (SP1), which can regulate JCPyV transcription [57,60]. The archetype NCCR is considered the transmissible form of the virus in the population, and can be shed in the urine of healthy individuals due to periodic, subclinical reactivation in the kidney [61,62]. In contrast, in the context of immunosuppression, during immunomodulatory therapy, or in AIDS patients, JCPyV can reactivate from latency and cause a fatal disease of the central nervous system (CNS) known as progressive multifocal leukoencephalopathy (PML) [61]. JCPyV variants carrying a rearranged NCCR are usually isolated from PML patients.
The prototype Mad-1 strain is the most studied variant of JCPyV and is characterized by 98-bp tandem repeats in the NCCR late proximal region (arranged as ORI-A-C-E-A-C-E-F). This arrangement increases viral gene expression in human glial cells, indicating that it is involved in controlling gene expression in these cells [63][64][65]. The enhancer repeats found in the Mad-1 strain are lacking in the archetype JCPyV strains isolated from the urine of healthy individuals [64]. Additional NCCR rearrangements are implicated in the development of pathogenic JCPyV strains. In fact, short deletions or duplications were observed in a significant proportion of JCPyV archetype isolates, corroborating that this region is highly unstable [66,67]. It is therefore plausible that successive rearrangements of the archetypal NCCR give rise to PML-associated strains such as Mad-1 [68].
Based on the occurrence of transcriptional enhancer repeat elements in the NCCR, BKPyV isolates can also be classified as archetype and prototype strains. The NCCR of the archetype BKPyV WW strain is characterized by five blocks named O (35 bp), which includes the origin of replication and a TATA-box, P (68 bp), Q (39 bp), R (63 bp), and S (63 bp); these blocks contain TATA-like elements and the regulatory regions for early and late gene expression. The WW strain is considered the infectious strain, shed in the urine of immunocompetent individuals [69][70][71]. Approximately 30 transcription factor binding sites have been predicted in silico. SP1 has been the most extensively studied [72][73][74], although roles for other transcription factors such as NF1, ETS1, NFκB, the glucocorticoid and progesterone receptors, and CREB have been shown in several studies [73,75-77].
Similarly to JCPyV, the plausible instability of the archetype BKPyV NCCR could contribute to the development of prototype strains, which are able to cause polyomavirus-associated nephropathy in kidney transplant recipients and hemorrhagic cystitis in hematopoietic stem cell transplant recipients [78][79][80][81]. The Dunlop strain, the most salient prototype strain, was isolated from a kidney transplant recipient with ureteral stenosis [82]. This strain displays three 68-bp tandem repeats within the NCCR (O-P-P-P-S arrangement), whereas the archetype strain carries a single 68-bp motif. The archetype strain showed less enhancer activity than the prototype strain, thus confirming the significance of the triplicated motifs for transcriptional regulation and viral infectious activity [83]. In fact, BKPyV strains with a rearranged NCCR isolated from kidney transplant recipients showed higher viral gene expression and viral loads, with more extensive pathogenicity [84].
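The block nomenclature used for the JCPyV and BKPyV NCCRs lends itself to a simple length check. The sketch below sums the block lengths quoted in the text for a given arrangement; the helper name `arrangement_length` is hypothetical, and the calculation assumes that every block in an arrangement is a complete copy, which may not hold for actual rearranged isolates.

```python
# Block lengths in bp, as quoted in the text for the archetype NCCRs.
JCPYV_CY_BOXES = {"A": 36, "B": 23, "C": 55, "D": 66, "E": 18, "F": 69}
BKPYV_WW_BLOCKS = {"O": 35, "P": 68, "Q": 39, "R": 63, "S": 63}


def arrangement_length(arrangement: str, block_lengths: dict) -> int:
    """Sum block lengths for a hyphen-separated arrangement such as 'O-P-P-P-S'.

    Simplifying assumption: every listed block is a complete copy.
    """
    return sum(block_lengths[name] for name in arrangement.split("-"))


jc_archetype = arrangement_length("A-B-C-D-E-F", JCPYV_CY_BOXES)
bk_archetype = arrangement_length("O-P-Q-R-S", BKPYV_WW_BLOCKS)
bk_triplicated = arrangement_length("O-P-P-P-S", BKPYV_WW_BLOCKS)

print(jc_archetype)    # 267, matching the CY-strain NCCR length given in the text
print(bk_archetype)    # 268
print(bk_triplicated)  # 302 if all three P copies were complete
```

The six JCPyV boxes add up to the 267 bp given for the CY-strain NCCR, which is a useful consistency check on the quoted block sizes. The O-P-P-P-S value is illustrative only.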
Additional NCCR structures have been described for both viruses [85][86][87][88]. In particular, a common pattern of JCPyV NCCR rearrangement, such as the D-box deletion, can be considered a hallmark of the initial NCCR rearrangements and a critical co-factor for the development of PML in immunosuppressed individuals [88,89]. Besides the triplication of the P region, rearrangements of the BKPyV NCCR involve the adjacent O and Q blocks. By contrast, the S block is always retained, highlighting the importance of these nucleotide sequences [70]. NCCR mutations were also observed during in vitro cultivation of JCPyV and BKPyV, confirming that NCCR variants can arise after prolonged propagation of the viruses in cells [71,85,90,91]. The mechanisms by which both viruses cause human disease are not established, but it is accepted that the regulation of gene expression in HPyVs plays a role in determining viral tropism and in promoting the progression of pathogenesis [92].
Viruses 2020, 12, 1406
Little is known about the genetic diversity of the NCCRs from the novel HPyVs and the biological relevance in terms of viral transcription, replication, and possible pathogenic properties. In this review, we provide an overview of the mutations in the NCCR, which is defined as the sequence between the start codon of the LT/sT gene and the start codon of the VP2 gene, of the novel HPyVs and their known effect on promoter activity. We discuss how NCCR rearrangements may affect the binding of putative transcription factors, and whether specific NCCR configurations are associated with disease.

KI and WU NCCR Variants
KIPyV has mostly been isolated from oral and respiratory specimens from (pediatric) patients with respiratory diseases who suffer from other viral and bacterial infections (reviewed in [93]). Whether KIPyV is a genuine respiratory pathogen or an opportunistic co-infector has not been established [93,94]. Seventy-two full-length NCCR sequences have been deposited in GenBank so far (Table S1). They contain the LT binding motifs, an AT-rich stretch and repeated sequences (Figure 1B and Table 2). Most KIPyV NCCR sequences were obtained from nasopharyngeal swabs or aspirates, but some were obtained from blood of healthy blood donors [96] and from feces of a child with acute gastroenteritis [97]. The NCCR sequence of the Stockholm 60 isolate (GenBank accession number NC_009238; [7]; Figure S1) may represent the archetype NCCR because it is the most common sequence reported, and has been isolated from different biological samples in different parts of the world. Stockholm 60 KIPyV was originally isolated from respiratory tract specimens from a child. We found that 21 out of 48 isolates from nasopharyngeal aspirates of patients with respiratory symptoms or infections and 23 out of 38 isolates from healthy blood donors have the Stockholm 60 NCCR [96]. As described by us and others, the NCCRs of other isolates contain only minor point mutations scattered throughout the entire NCCR (Figure 2 and Table 3). Exceptions are the clinical isolates Brisbane 001, Brisbane 005 and CU-255, which were obtained from the respiratory tract and whose NCCRs carry the 10 bp insertion AGGCGCTGCG (Table S1).
Figure 2. Mutations and their prevalence in variants of Karolinska Institute polyomavirus (KIPyV) noncoding control region (NCCR). The numbering of the NCCR is from early to late, with nucleotide 1 being the most proximal to the ATG start codon of the early genes and the most distal nucleotide being just upstream of the start codon of the VP2 gene. The number of times a particular mutation is found in the different variants is given as frequency (with n the number of times the mutation was described/total NCCR variant sequences available). Putative Large T antigen (LT) binding motifs 5′-GRGGC-3′ (→) or 5′-GCCYC-3′ (←) are shown. The table summarizes the mutations, their location in the NCCR and their frequency. For details, see Table S1. Putative transcription factor binding sites are shown in Supplementary Table S2.
The KIPyV NCCR contains putative binding sites for several transcription factors (Table 4 and Table S2). The effect of the 10 bp AGGCGCTGCG insertion on KIPyV promoter activity or replication is not known, but the sequence contains a putative binding site for transcription factor AP4 [98]. AP4 is ubiquitously expressed, and can both activate and repress transcription [99,100]. Its effect on the KIPyV NCCR has not been investigated. The point mutations remove or create putative binding sites for several transcription factors, including nuclear receptors, STAT proteins, HOXD, POU, and the general transcription factors TBP and TFIID (see Table 3 in [101] for a detailed overview). We examined the effect of NCCR polymorphism in isolates from blood and nasopharyngeal samples on early and late promoter activity in HEK 293 cells [96]. These cells had previously been shown to give the highest promoter activity of 10 different cell lines tested [102]. Eighteen isolates with a single nucleotide substitution were tested, and some of them showed significant differences in early and late promoter activities. One variant (NPA7d) had a mutation that destroyed a putative c-Myb binding motif compared to the Stockholm 60 NCCR. Ectopic expression of c-Myb stimulated the early and late promoter activities of both Stockholm 60 and NPA7d, but there was no significant difference in c-Myb-induced activation of the promoters [96]. Some of the mutations are located in putative LT binding sites and may therefore have an effect on promoter activity and/or viral DNA replication. It remains to be determined whether the NCCR has an effect on the pathogenic properties of KIPyV, because Stockholm 60 and Stockholm 60-like NCCRs have also been isolated from blood and respiratory specimens from healthy individuals, and no direct association between KIPyV and disease has been established. Larger KIPyV NCCR rearrangements, as seen for BKPyV and JCPyV NCCRs, seem to be rare.
Table 3. Frequency of mutations in the noncoding control region of Karolinska Institute polyomavirus.

Mutation Frequency * Mutation Frequency Mutation Frequency
A total of 185 partial or complete WUPyV NCCR sequences are available in GenBank (Table S1). All strains have an NCCR of 645 bp, except variant J1, which has an insertion of one A at position 277 (Table S1). The WUPyV NCCR contains an AT-rich stretch, GRGGC pentamers and repeated sequences of 10 and 16 bp, respectively (Figure 1B and Table 2). Polymorphisms are predominantly found in the part of the NCCR proximal to the early region (Figure 3 and Table 5). The most common point mutations are G54A and T59G, which are often present simultaneously. The substitution C52G is also common, but is always found in combination with the G54A mutation. The variants GD-WU709 and WU/Wuerzburg01/07 have C52T rather than C52G, whereas 12 variants have the triple mutation C52G/G54A/T59G. These three nucleotides are part of a sequence that is flanked by 4 and 5 T residues, respectively; the triple substitution removes the putative binding site for transcription factor c-MYB, and creates motifs for TATA/TBP and retinoic acid receptor-related orphan receptor α [98]. The mutations A94G and C105G are also always present simultaneously; the double mutation generates a remote sequence similarity with the binding motif of transcription factor AP1, though this does not seem to affect the binding of other putative factors (Table S2; [98]). Other common mutations include A284C and C285A, which are also found together except in the WU/Wuerzberg03/07 variant, which lacks the A284C substitution. A284/C285 are part of a putative site for RUNX1 (AML1; [98]), a transcription factor involved in hematopoiesis [103]. While G295A is found in 9 NCCR sequences, one strain (CQ6029/China_CQ/2014) has a G295C replacement. The CU_CHONBURI3 isolate from a patient with respiratory disease had several unique point mutations. Overall, no mutations typical of specific specimen types were detected, nor was there an apparent correlation between genotype and geographic region.
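Substitutions such as C52G, G54A and T59G follow the convention reference base, 1-based position, variant base. The following is a minimal sketch of how such labels could be parsed and applied to a reference sequence while checking that the stated reference base is actually present; the function name and the 8-nt toy reference are hypothetical and stand in for a real NCCR sequence.

```python
import re

# <reference base><1-based position><variant base>, e.g. G54A.
MUTATION = re.compile(r"^([ACGT])(\d+)([ACGT])$")


def apply_substitutions(reference: str, mutations: list) -> str:
    """Apply point substitutions to a reference and return the variant sequence."""
    seq = list(reference.upper())
    for label in mutations:
        match = MUTATION.match(label)
        if match is None:
            raise ValueError(f"not a substitution label: {label}")
        ref_base, new_base = match.group(1), match.group(3)
        pos = int(match.group(2))
        if not 1 <= pos <= len(seq):
            raise ValueError(f"{label}: position outside the sequence")
        if seq[pos - 1] != ref_base:
            raise ValueError(f"{label}: sequence has {seq[pos - 1]} at {pos}")
        seq[pos - 1] = new_base
    return "".join(seq)


# Hypothetical 8-nt reference; positions are chosen only for illustration.
toy_reference = "ACGTCGTA"
variant = apply_substitutions(toy_reference, ["C2G", "G3A", "T7G"])
print(variant)
```

Checking the reference base before substituting guards against off-by-one numbering, which matters here because NCCR positions are counted from the early side.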
To the best of our knowledge, the effect of mutations on WUPyV promoter activity has not been studied, nor have the consequences for viral replication been addressed. Whether the mutations have an effect on putative transcription factor binding sites is also unknown, but because most mutations are single or few point mutations, they may not destroy or create novel binding sites.
Figure 3. Mutations and their prevalence in variants of Washington University polyomavirus (WUPyV) noncoding control region (NCCR). The numbering of the NCCR is from early to late, with nucleotide 1 being the most proximal to the ATG start codon of the early genes and the most distal nucleotide being just upstream of the start codon of the VP2 gene. The number of times a particular mutation is found in the different variants is given as frequency (with n the number of times the mutation was described/total NCCR variant sequences available). Putative Large T antigen (LT) binding motifs 5′-GRGGC-3′ (→) or 5′-GCCYC-3′ (←) are shown. The table summarizes the mutations, their location in the NCCR, and their frequency. For details see Table S1. Putative transcription factor binding sites are shown in Table S2.

Table 5. Frequency of mutations in the noncoding control region of Washington University polyomavirus.

Mutation Frequency * Mutation Frequency Mutation Frequency

MCPyV NCCR Variants
NCCR rearrangements are described as a pivotal event in the onset of HPyV-related pathology, as demonstrated for JCPyV and BKPyV, in which NCCRs not only control gene expression, but also serve as the main determinants in viral replication, containing the origin of DNA replication and transcription factor binding sites [104,105]. MCPyV is a major causative agent of the skin cancer Merkel cell carcinoma (MCC) [9], but whether the NCCR can influence the outcome of the infection remains elusive. More than 100 partial or complete NCCR sequences are available from MCC and non-MCC tissue (Table S1). Mutations have been described throughout the entire NCCR, but especially in the late promoter region (Figure 4 and Table 6). Nucleotides 360-425 of the MCPyV NCCR contain putative binding sites for transcription factors AP1, AP2, C/EBPα and β, EVI1, NFκB, c-Myb, p53, SOX5, TST-1, and SP1 (Table 4; [98]), although their binding has not been proven so far. Some of the mutations affect putative LT binding motifs, and may therefore interfere with transcription and replication of the viral DNA. Indeed, studies by the group of Chang and Moore showed that mutations in nucleotides G143, C145A, A173 and C176 abolished the replication of MCC isolates MCV339 and MCV350 in the presence of full-length LT [106,107]. The NCCRs from MCC isolates FraMerk22 and FraMerk24 both contain the mutations G143T and C176T, whereas MCC isolate MKT-23 has the mutations G143A, C145G, and A173A, and MKT-32 carries the transversion C146G. Since all these isolates are derived from MCC, they are replication deficient due to the expression of truncated LT and integration. None of the mutations identified by the Chang-Moore group as abrogating MCPyV replication have been reported in non-MCC MCPyV isolates (see Table S1).
Figure 4. Mutations in the MCPyV NCCR. For details, see Table S1. Putative transcription factor binding sites are shown in Supplementary Table S2.
Whether an MCPyV variant with a particular NCCR architecture is associated with specific patient groups is not known. MCPyV with different NCCRs have been characterized in Merkel cell carcinoma samples (see Table S1; [109]). Prezioso et al., studying the MCPyV NCCR in urine, plasma and rectal swabs recovered from an immunosuppressed population, observed in plasma and rectal swabs the occurrence of the MCPyV NCCR IIa-2 strain, which contains the 5 bp insertion and represents the predominant strain among white persons of European descent [110]. The deletion of nucleotide G352 is unique to the MCPyV isolates in plasma, urine and rectal swab specimens from HIV-1 patients, and has not been described in MCPyV isolates from other patient groups. In addition to characterizing the NCCR genotypes circulating within an HIV-1-positive population, Prezioso et al. evaluated in the same study whether MCPyV NCCR mutations and/or rearrangements fall within putative binding sites of cellular transcription factors [110]. Analysis of the distal NCCR sequences (nucleotides 302-464) and of the corresponding putative binding sites revealed a high degree of homology with the R17b strain in urine samples, whereas transitions, transversions, and single or double deletions were observed in plasma and rectal swabs (Table S1). In contrast to JCPyV and BKPyV, in which the early proximal side of the NCCR is highly conserved and the late proximal side undergoes rearrangements [111], insertions and deletions occurred in both the early and late proximal sides of the MCPyV NCCR. More specifically, representative TCAAT and AAC insertions (nucleotide positions 5210-5211) were observed in both plasma and rectal swabs. Analysis of the putative binding sites showed that the MCC350 NCCR sequence contains putative NF1, NFκB, TST-1, OCT1, AP-1, and TATA sites, already described within the NCCRs of other HPyVs [98,112].
In several strains obtained from MCPyV-positive plasma and rectal swab samples, deletions, insertions, or single base substitutions fell within these putative binding sites, making it likely that some of these changes abolish putative binding motifs, such as those for SP1 and/or p53, already described in the NCCR of other HPyVs [112]. Further studies are warranted to define the importance of these NCCR binding sites and to understand how their changes (mutations, insertions, or deletions) may influence MCPyV pathogenicity in vivo. In contrast to the NCCRs in rectal swabs from an HIV-1-positive population, which were characterized by transitions, transversions, and single or double deletions [110], the MCPyV NCCR in stool samples from patients with hematological disorders exhibited a high degree of sequence stability, suggesting that sequence rearrangements rarely occur in the gastrointestinal tract [113]. Although MCPyV DNA has been detected in upper and lower respiratory tract specimens of children and adults, both immunocompetent and immunocompromised [114][115][116][117][118], and in respiratory secretions of cystic fibrosis patients [119,120], the organization of the NCCR in respiratory isolates has not yet been investigated.
The relative early and late promoter strength of seven MCPyV NCCR variants was compared in human dermal fibroblasts, and in the non-classical MCPyV-negative MCC cell line MCC13 [121]. All variants that had mutations compared to the consensus strain R17b (GenBank accession number HM011556) had a 10-50% lower basal early and late activity in both cell lines. However, the I strain described by Hashida et al. [108] had an approximately 30% higher early and late promoter activity, and the early promoter of isolate MKL1, an MCC isolate [122], was approximately 40% stronger in the fibroblasts. The promoter activity of other variants has not been compared, nor has the effect of mutations on the viral life cycle and transforming potential of this oncovirus been examined.

HPyV6 and HPyV7 NCCR Variants
HPyV6 and HPyV7 DNA is commonly present in the normal skin of healthy persons [10,123], but both viruses are also associated with rash and pruritic skin eruption [124][125][126][127][128]. HPyV7 DNA was found in 19/35 cholangiocarcinomas [129], while HPyV6 DNA has been detected in a few cases of keratoacanthoma, basal cell carcinoma, squamous cell carcinoma and trichoblastoma [130,131]. HPyV6 DNA was detected in 1/234 cerebrospinal fluid samples and 1/1016 serum samples of healthy blood donors [109,132]. HPyV6 DNA prevalence was much higher than HPyV7 DNA prevalence in tonsil brushing samples from immunocompetent children and adults (113/689 versus 6/689). HPyV6 and HPyV7 DNA prevalence and copy number were significantly higher in skin swabs collected from lesional and non-lesional skin of 86 Japanese patients with inflammatory skin diseases and mycosis fungoides compared with specimens from 149 healthy control individuals [133].
HPyV6 and HPyV7 were detected in 1/55 skin specimens from cutaneous T-cell lymphoma patients [29]. Despite the presence of HPyV6 and HPyV7 DNA in samples of various disorders, it remains to be established whether these viruses play a direct role in causing such skin conditions. Seventeen HPyV6 NCCR sequences are deposited in GenBank. Four of them are sequences obtained from HPyV6 DNA amplified from sewage (H6-cg-A2.f, B159.4, U43.1 and U43.3), six are from healthy skin (606b, 607a, 607b, 609a, 614a, and 627a), two are from bile samples (Bile-72 and Bile-81), and two are from combined nose and throat samples from kidney transplant patients (QLD-49Br and QLD-61Br). One sample was obtained from a pruritic skin lesion (UTSW6.1), one from a lymph node of a patient with an angiolymphoid hyperplasia with Kimura disease (LN1), and one from a nasopharyngeal aspirate of a child with respiratory tract infections (BJ376) (see Table S1 for details and references). Identical HPyV6 NCCRs were found in healthy skin, in bile from patients with malignant biliary obstruction, in combined nose and throat specimens from kidney transplant patients and in a nasopharyngeal sample of a child with respiratory infection (Table S1). Two clinical samples (UTSW6.1 from pruritic skin and LN1 from the lymph node of a patient with Kimura disease) and the DNA amplified from sewage water had mutations compared to the reference strain. The mutation spectrum is shown in Figure 5 and Table 7. The UTSW6.1 isolate had a deletion of nucleotides 183-193 (CAAAGGTCAAA), point mutations at nucleotides 223-229 (except 228), and insertions of GGC and of TGGGCAGGGCCATTT distal of these point mutations. The 11 bp deletion removes binding motifs for AP1 and CREB, while the 15 bp insertion adds a putative SP1 and p53 binding site. Other putative binding sites are shown in Table 4 and Table S2.
The point mutations affect an AT-rich region in which no putative binding motifs are predicted [98]. Because this region is part of the predicted ori [134], they may nevertheless affect viral replication. Based on the limited number of available HPyV6 NCCR sequences, no specific HPyV6 NCCR is associated with disease. The effect of mutations in the NCCR on promoter activity and the viral life cycle has not been tested.

Figure 5.
The numbering of the NCCR is from early to late, with nucleotide 1 being the most proximal to the ATG start codon of the early genes and the most distal nucleotide lying just upstream of the start codon of the VP2 gene. The number of times a particular mutation is found in the different variants is given as frequency (with n the number of times the mutation was described/total NCCR variant sequences available). Putative Large T antigen (LT) binding motifs 5′-GRGGC-3′ (→) or 5′-GCCYC-3′ (←) are shown. The table summarizes the mutations, their location in the NCCR, and their frequency. For details see Table S1. Putative transcription factor binding sites are shown in Supplementary Table S2.

Fifteen HPyV7 NCCR sequences are available (Table S1 and references therein). The length of these NCCRs varies from 371 bp (PITT2 isolate) to 399 bp (PITT1 isolate). DNA of these two variants was isolated from the skin of lung transplant patients with a rash [124]. Five isolates from the same patient had an NCCR of 381 bp (BIO, MUQ, PLA1, PLA2, URI), five had a 383 bp NCCR (707a, 707b, 715b, 727a, UTSW7.1), two had a 385 bp NCCR (713a and 713b), and one had an NCCR of 387 bp (CRC01). No repeated sequences are present (Table 2). The mutations in the different HPyV7 variants are concentrated in the central part of the NCCR (Figure 6 and Table 8). The consensus is the nucleotide sequence that was present in the majority of the 15 available sequences, with the nucleotide numbering based on the HPyV7 reference strain R713a (GenBank accession number NC_014407=713). Most mutations are point mutations, whereas PITT1 also contains the insertion ACAGGATATGAT and PITT2 has a deletion removing nucleotides 150-161 (CTGGGTTACTGG). The insertion contains putative binding sites for the transcription factors ETS1, GATA1/2/3 and EVI1, whereas the deletion removes possible GATA2 and CDP binding motifs [98]. EVI1, CDP and GATA3 are expressed in the skin, while ETS1, GATA1, and GATA2 are not, or only weakly, expressed in skin [100]. Putative binding sites for transcription factors in the HPyV7 NCCR are summarized in Table 4 and Table S2. The early promoter activity of the PITT1 and PITT2 variants was significantly higher than the activity of the reference strain in the colon adenocarcinoma cell line SW480, whereas a tendency toward lower activity was observed in human embryonal kidney HEK293 cells [112].
The promoter activity was not examined in skin cells, although these variants were originally isolated from the skin [124]. Colon and kidney cells may not be authentic host cells because no HPyV7 LT expression was detected in 10 normal and 94 malignant colon samples, or in 10 normal renal samples and 65 renal cancers [135], and so far there are no reports of HPyV7 DNA in these organs. A transversion of A to T in the putative 5′-GAGGC-3′ LT motif was reported (Figure 6), although its effect on viral replication has not been explored.
Interestingly, the NCCR of the recently isolated QPyV DNA shows >80% identity with the HPyV7 NCCR (Figure S3), while the complete genome is 81% identical to that of HPyV7 [18].
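As an illustration of what a figure such as ">80% identity" summarizes, the following minimal sketch computes percent identity from two already aligned sequences. It is not the procedure used for Figure S3: the function name `percent_identity`, the toy sequences, and the convention of counting gap-containing columns as mismatches are assumptions made here for illustration, and other identity definitions (for example, ignoring gapped columns) give different values.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity of two aligned sequences of equal length.

    Assumed convention (hypothetical, not taken from the review):
    columns where both sequences have a gap are skipped, and columns
    where only one sequence has a gap count as mismatches.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" and b == "-":
            continue  # column carries no information for this pair
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns in the alignment")
    return 100.0 * matches / compared


# Hypothetical toy alignment, not real HPyV7 or QPyV sequence.
toy_a = "ACGT-ACGTA"
toy_b = "ACGTTACCTA"
identity = percent_identity(toy_a, toy_b)
```

Because the result depends on how gaps are treated, the convention should be stated whenever identity values for NCCRs of different length are compared.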

Figure 6.
The numbering of the NCCR is from early to late, with nucleotide 1 being the most proximal to the ATG start codon of the early genes and the most distal nucleotide lying just upstream of the start codon of the VP2 gene. The number of times a particular mutation is found in the different variants is given as frequency (with n the number of times the mutation was described/total NCCR variant sequences available). Putative Large T antigen (LT) binding motifs 5′-GRGGC-3′ (→) or 5′-GCCYC-3′ (←) are shown. The table summarizes the mutations, their location in the NCCR, and their frequency. For details see Table S1. Putative transcription factor binding sites are shown in Supplementary Table S2.

TSPyV NCCR Variants
Twenty-four TSPyV NCCR sequences are deposited in GenBank (Table S1). Most samples are derived from skin spicules, but a nasopharyngeal aspirate from a heart transplant patient, the heart of a myocarditis patient, and the CSF and serum of immunosuppressed patients also contained TSPyV DNA. The NCCRs of the non-spicule isolates were identical or nearly identical to those of isolates from skin spicules. Most mutations are point mutations (Figure 7 and Table 9), but two skin spicule isolates (0602 and 1312) had deletions of 54 and 38 bp, respectively [136]. The relative promoter activity of these NCCR variants has not been examined, nor has the effect of mutations on the viral life cycle been investigated. The 39 bp deletion removes putative binding sites for AP1, SOX5, HNF3, OCT1, TATA/TBP, STAT, glucocorticoid receptor, retinoic acid receptor-related orphan receptor α, and CREB, whereas the 54 bp deletion encompasses possible binding sites for ARNT, AP1, AP2, AP4, CREB/ATF, CAAT, E2F, ELK, EVI1, GATA1/2/3, NHLH1, MYB, MYC, MYOD, NFκB, OCT1, PAX5, TST1, and USF [98]. While most of these factors are expressed in the skin, MYOD, PAX5, GATA1, and GATA2 seem to be absent from the skin [100]. However, the binding of these transcription factors and their possible role in regulating TSPyV transcription remain to be proven. The TSPyV NCCR contains several putative LT binding motifs, and mutations in some of them have been reported (Figure 7). Whether they affect viral replication has not been tested.

Figure 7.
The numbering of the NCCR is from early to late, with nucleotide 1 being the most proximal to the ATG start codon of the early genes and the most distal nucleotide lying just upstream of the start codon of the VP2 gene. The number of times a particular mutation is found in the different variants is given as frequency (with n the number of times the mutation was described/total NCCR variant sequences available). Putative Large T antigen (LT) binding motifs 5′-GRGGC-3′ (→) or 5′-GCCYC-3′ (←) are shown. The table summarizes the mutations, their location in the NCCR, and their frequency. For details see Table S1. Putative transcription factor binding sites are shown in Supplementary Table S2.

HPyV9 NCCR Variants
HPyV9 was originally detected in the serum and urine of a renal transplant patient under immunosuppressive treatment [12]. Shortly after, HPyV9 DNA was isolated from the facial surface of a Merkel cell carcinoma patient and tentatively named Institute Pasteur polyomavirus (IPPyV) [137]. The genome of IPPyV differs from that of HPyV9 by only two nucleotides; hence, IPPyV is a variant of HPyV9. Neither of these mutations lies within the NCCR (Table S1; [137]). The HPyV9 isolate M149 from tonsils has an NCCR sequence identical to that of the original HPyV9 isolate (GenBank accession MH844627). An HPyV9 isolate from the blood of an AIDS patient (UF-1) displays an eight base-pair deletion, a 13 base-pair insertion and 24 point mutations in its NCCR [138]. These NCCR rearrangements generate two putative SP1 binding sites in the distal part of the late promoter. We compared the basal early and late promoter activity of the original HPyV9 strain and the UF-1 clinical isolate in the human cell lines BEL7402, C33A, HEK293, HeLa, SK-N-BE, SW480, and U2OS. We found that the UF-1 early promoter was stronger in all cell lines except U2OS, and the UF-1 late promoter was stronger in all cell lines except C33A and HeLa [139]. The effect of LT on early and late promoter activity was monitored in BEL7402, HEK293 and HeLa cells. Whereas the UF-1 late promoter was more potently stimulated by LT than the HPyV9 late promoter in all three cell lines tested, a stronger LT-induced activation of the UF-1 early promoter compared to the HPyV9 early promoter was only observed in HEK293 cells. Mutating the two SP1 sites had no effect on basal early promoter activity, but increased basal late promoter activity 2-fold. Disruption of these SP1 sites also had no effect on LT-induced early promoter activity, but reduced late promoter activity 7-fold compared with that of the non-mutated UF-1 late promoter. Our results showed that the promoter of the clinical isolate UF-1 is stronger and more potently induced by LT than the promoter of the original HPyV9 isolate. A later study confirmed that the UF-1 promoter was stronger than the promoter of the originally isolated HPyV9 in HEK293 cells and in lung carcinoma A549 cells [112]. Whether the rearrangements in the UF-1 NCCR affect the life cycle and possible pathogenic properties of the virus remains to be determined. Additional putative transcription factor binding sites are summarized in Table 4 and Table S2.

HPyV10 NCCR Variants
Twenty HPyV10 NCCR sequences are available in GenBank (Table S1), with lengths ranging from 430 to 442 bp. The original isolates MWPyV (NC_018102) and MA095 (JQ898291), both from feces [140], are identical, but contain an 11 bp deletion compared to the other variants (Table S1, Figure 8 and Table 10). The NCCRs of isolates ww10, TEDDY-01, QLDMW04 and QLDMW10 are identical, although they were derived from different specimens from different patients. The ww10 isolate was detected in a condyloma specimen from a patient with warts, hypogammaglobulinemia, infections, and myelokathexis (WHIM) syndrome [13], QLDMW04 and QLDMW10 are from respiratory samples [141], and TEDDY_01 is from feces (direct submission to GenBank; accession number KC549591). In the other isolates, point mutations are dispersed throughout the NCCR. The deleted 11 bp sequence (ATTGTTGGCAA) contains possible binding sites for CDP and SOX5 [98]. CDP is ubiquitously expressed, but SOX5 is enriched in testis [100]. Other possible transcription factors that may bind the HPyV10 NCCR are given in Table 4 and Table S2. It is not known whether HPyV10 is associated with a disease, and the biological consequences of NCCR mutations remain elusive.

Figure 8.
The numbering of the NCCR is from early to late, with nucleotide 1 being the most proximal to the ATG start codon of the early genes and the most distal nucleotide lying just upstream of the start codon of the VP2 gene. The number of times a particular mutation is found in the different variants is given as frequency (with n the number of times the mutation was described/total NCCR variant sequences available). Putative Large T antigen (LT) binding motifs 5′-GRGGC-3′ (→) or 5′-GCCYC-3′ (←) are shown. The table summarizes the mutations, their location in the NCCR, and their frequency. For details see Table S1. Putative transcription factor binding sites are shown in Supplementary Table S2.

STLPyV NCCR Variants
Seven STLPyV NCCR sequences are available in GenBank (Table S1). These variants were discovered in feces, respiratory swabs or peri-anal warts [14,26,27,142,143]. Point mutations and single-base-pair deletions have been observed (Figure 9 and Table S1), although their biological implications for promoter activity and viral replication have not been examined, nor has their possible role in disease been defined. So far, no variants with mutations in the possible LT binding motifs have been isolated. Putative binding sites for transcription factors in the STLPyV NCCR are shown in Table 4 and Table S2.

Figure 9.
Mutations and their prevalence in variants of Saint Louis polyomavirus (STLPyV) noncoding control region (NCCR). The numbering of the NCCR is from early to late, with nucleotide 1 being the most proximal to the ATG start codon of the early genes and the most distal nucleotide lying just upstream of the start codon of the VP2 gene. The number of times a particular mutation is found in the different variants is given as frequency (with n the number of times the mutation was described/total NCCR variant sequences available). Putative Large T antigen (LT) binding motifs 5′-GRGGC-3′ (→) or 5′-GCCYC-3′ (←) are shown. The table summarizes the mutations, their location in the NCCR, and their frequency. For details see Table S1. Putative transcription factor binding sites are shown in Supplementary Table S2.
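
The putative LT binding motifs named in the figure legends are written with IUPAC ambiguity codes (R = A or G, Y = C or T). The sketch below shows one way such motifs could be located in a sequence with regular expressions; it is an illustration only, not the method used to produce the figures. The function name `find_lt_motifs` and the toy sequence are hypothetical, and positions are reported 1-based in the orientation of the input string.

```python
import re

# IUPAC codes from the legend: R = A or G, Y = C or T.
# Lookahead groups allow overlapping motifs to be reported.
LT_FORWARD = re.compile(r"(?=(G[AG]GGC))")  # 5'-GRGGC-3' (→)
LT_REVERSE = re.compile(r"(?=(GCC[CT]C))")  # 5'-GCCYC-3' (←)


def find_lt_motifs(seq: str) -> list[tuple[int, str, str]]:
    """Return sorted (1-based position, orientation, motif) tuples."""
    seq = seq.upper()
    hits = []
    for match in LT_FORWARD.finditer(seq):
        hits.append((match.start() + 1, "+", match.group(1)))
    for match in LT_REVERSE.finditer(seq):
        hits.append((match.start() + 1, "-", match.group(1)))
    return sorted(hits)


# Hypothetical toy sequence, not a real NCCR.
toy = "TTGAGGCAAGCCTCTTGGGGC"
hits = find_lt_motifs(toy)
# hits == [(3, "+", "GAGGC"), (10, "-", "GCCTC"), (17, "+", "GGGGC")]

# An A-to-T transversion at the second position (GAGGC -> GTGGC)
# no longer matches either pattern.
mutated_hits = find_lt_motifs("GTGGC")
```

A pattern match only identifies a candidate site; whether LT actually binds there has to be shown experimentally.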

HPyV12, NJPyV, LIPyV and QPyV Variants
The NCCR sequences of two HPyV12 isolates are known. One of them carries a 26 bp deletion in the distal part of the late promoter (nucleotides 297-322) (Table S1; [102]). This deletion reduced early and late promoter activity in the 10 human cell lines tested, except for early promoter activity in BEL7402 and HEK293 cells, in which a significantly higher activity was measured compared with the early promoter of the original HPyV12 isolate [102]. The deletion eliminates putative c-MYB, CREB and AP4 binding sites, but it is not known whether these transcription factors actually bind the HPyV12 promoter.
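Deletions in this review are given as 1-based, inclusive nucleotide ranges, so a range from start to end spans end - start + 1 nucleotides (297-322 is 26 bp). The short sketch below makes that arithmetic explicit and shows how such a range maps onto 0-based string slicing; the helper names `deletion_length` and `apply_deletion` and the toy sequence are hypothetical.

```python
def deletion_length(start: int, end: int) -> int:
    """Length of a deletion given as a 1-based, inclusive range."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return end - start + 1


def apply_deletion(seq: str, start: int, end: int) -> str:
    """Remove nucleotides start..end (1-based, inclusive) from seq."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("deletion range lies outside the sequence")
    # Python slices are 0-based and end-exclusive.
    return seq[: start - 1] + seq[end:]


# Ranges and deleted sequences quoted in the text.
hpyv12_len = deletion_length(297, 322)  # 26 bp deletion
hpyv6_len = deletion_length(183, 193)   # CAAAGGTCAAA
hpyv7_len = deletion_length(150, 161)   # CTGGGTTACTGG

# Hypothetical toy sequence: removing positions 3-4 drops "CC".
shortened = apply_deletion("AACCGGTT", 3, 4)
```

The same check confirms that the quoted deleted sequences match their stated coordinates: CAAAGGTCAAA has 11 nucleotides and CTGGGTTACTGG has 12.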
Only one human and one feline LIPyV isolate have been reported, and they differ by four point mutations (Table S1). No NJPyV or QPyV NCCR variants have been described thus far. Putative binding sites for transcription factors in the HPyV12, NJPyV and LIPyV NCCRs are summarized in Table 4 and Table S2.

Conclusions
Similar to BKPyV and JCPyV, novel HPyV isolates with mutations in their NCCR are commonly detected in human samples. However, for the most recently described PyVs isolated from human specimens, no or very few isolates have been reported, and large deletions and/or duplications are lacking. Our knowledge of the effect of NCCR mutations on viral promoter activity and viral replication is incomplete: only a few studies have addressed the effect on promoter activity, and the impact on viral replication has not been examined. Replication studies have been hampered by the lack of permissive cell systems for the novel HPyVs, with the exception of dermal fibroblasts, which support productive MCPyV infection [144]. The HPyV NCCRs contain a plethora of binding motifs for host cell proteins, but binding of these proteins to the NCCR has not been confirmed. Chromatin immunoprecipitation assays at the early and late stages of infection may allow for the identification of transcription factors involved in early and late expression. Another unsolved question for most of the novel HPyVs is whether they are associated with specific diseases. Apart from MCPyV and its etiologic role in MCC, TSPyV as the causative agent of trichodysplasia spinulosa [9,145-147], and the association of HPyV6 and HPyV7 with pruritic skin eruption in immunocompromised patients [124-128], firm evidence for pathogenic properties of the other novel HPyVs is lacking. So far, no specific MCPyV, HPyV6, HPyV7, or TSPyV NCCR variants seem to be associated with disease because virus variants with (quasi) identical NCCRs were also detected in samples from healthy individuals. Studies on different patient groups are required to unveil possible novel HPyV-associated diseases, and more NCCR sequences from larger and more diverse patient cohorts are required to establish a possible connection between the genetic diversity of the NCCR and disease. The biological consequences of NCCR mutations for the viral life cycle warrant further investigation.

Supplementary Materials:
The following are available online at http://www.mdpi.com/1999-4915/12/12/1406/s1, Table S1: mutations in the NCCRs of the novel HPyV. Table S2: Putative transcription factor binding sites in the NCCR of the novel HPyVs. Figure S1: NCCR sequence of the novel HPyVs. Figure S2: alignment of the NCCR from the novel HPyVs. Figure S3: Alignment of the HPyV7 and QPyV NCCRs.
Author Contributions: Conceptualization, V.P., C.P. and U.M.; writing-original draft preparation, V.P., C.P. and U.M.; writing-review and editing, V.P., C.P. and U.M. All authors have read and agreed to the published version of the manuscript.
Funding: The APC was funded by UiT, The Arctic University of Norway. C.P. was supported by the Italian Ministry of Health (starting Grant: SG-2018-12366194).

Conflicts of Interest:
The authors declare no conflict of interest.