Merkel Cell Polyomavirus in Merkel Cell Carcinoma: Integration Sites and Involvement of the KMT2D Tumor Suppressor Gene

Merkel cell carcinoma (MCC) is an uncommon, lethal cancer of the skin caused by either Merkel cell polyomavirus (MCPyV) or UV-linked mutations. MCPyV is found integrated into MCC tumor genomes, accompanied by truncation mutations that render the MCPyV large T antigen replication incompetent. We used the open access HPV Detector/Cancer-virus Detector tool to determine MCPyV integration sites in whole-exome sequencing data from five MCC cases, thereby adding to the limited published MCPyV integration site junction data. We also systematically reviewed published data on integration for MCPyV in the human genome, presenting a collation of 123 MCC cases and their linked chromosomal sites. We confirmed that there were no highly recurrent specific sites of integration. We found that chromosome 5 was most frequently involved in MCPyV integration and that integration sites were significantly enriched for genes with binding sites for oncogenic transcription factors such as LEF1 and ZEB1, suggesting the possibility of increased open chromatin in these gene sets. Additionally, in one case we found, for the first time, integration involving the tumor suppressor gene KMT2D, adding to previous reports of rare MCPyV integration into host tumor suppressor genes in MCC.


Introduction
Merkel cell carcinoma (MCC) is an aggressive skin cancer (five year overall survival rate of about 44% and disease-specific mortality of 33-46% [1,2]), predominantly arising in the elderly and immunocompromised. MCC is associated with Merkel cell polyomavirus (MCPyV) in the majority of cases [2,3]. MCPyV was discovered in 2008 by Feng et al. and found to be integrated into the MCC tumor genome in a mutated, replication-deficient form [3]. MCPyV is a small (5 kb), non-enveloped, circular dsDNA polyomavirus that is commonly found on human skin [2][3][4]. The MCPyV early  Long-read third generation NGS, Nanochannel and Nanopore sequencing Multiple 9 chr3, 4, 5 (4), 9,11,13,20 LT (5), NCCR (2), VP1 (8)  Although most tumors demonstrate a single, unique genomic integration site, rare exceptions have been described. Starrett et al., 2020, found that two independent tumors had overlapping (although non-identical) integration sites on chromosome 1, representing a rare example of a recurrent integration site. An additional case had three distinct integration sites on different chromosomes within a single tumor: one full copy of the viral genome, one smaller segment predicted to retain oncogenic activity (containing NCCR and T antigen coding regions), and one smaller segment predicted to lack oncogenic activity [9].
Patterns by which the circular MCPyV viral genome linearizes and integrates have been defined by several studies (Table 1) [8,9]. For both MCPyV-and HPV-mediated tumors, integration may be followed by the formation of a transiently circular DNA intermediate containing the viral genome and flanking regions of the host genome which can be further amplified through aberrant firing of the viral origin of replication. This piece then reintegrates into the host genome, appearing as amplified regions of the host genome in a tandem head-to-tail conformation interspersed with the viral genome [8,9].
Although the breakpoint in the MCPyV genome is highly variable, most studies describe that junctions are predominantly located in exon 2 of the LT gene after the pRb binding site ( Table 2). This could result in a truncated, replication-deficient LT protein equivalent to tumor-specific LT truncating mutations.
More comprehensive characterization of MCPyV integration locations, and thereby preferences, will advance the understanding of virus-mediated tumorigenesis and may reveal therapeutic vulnerabilities. Our study aimed at reviewing published MCPyV integration site data and expanding this repertoire of known MCPyV integration sites. We adapted a bioinformatics tool designed for HPV integration site detection (HPV Detector/Cancer-virus Detector [22]) to detect MCPyV from previously described MCC whole-exome sequencing datasets from patient tumor samples classified as MCPyV-positive by quantitative PCR [7]. Our results identify additional specific viral and host junction sites, including integration into a tumor suppressor gene frequently mutated in MCPyV-negative MCC (KMT2D). Furthermore, we demonstrate the successful use of HPV Detector/Cancer-virus Detector, a simple open-access analysis tool with potential utility in research and clinical testing, for detection of MCPyV integration in standard whole-exome sequencing datasets.

MCPyV Integration Analysis
MCC whole-exome datasets were taken from a previously published study by Harms et al., Cancer Research, 2015 [7]. The sequencing approach used by this study is briefly summarized here. Exome libraries were generated using an Illumina TruSeq DNA Sample Prep Kit, and paired-end libraries were sequenced with the Illumina HiSeq 2000. Reads that passed the Illumina BaseCall software chastity filter were used for subsequent analysis. Sequence and alignment quality were assessed by FastQC and the Picard package, respectively. More details can be found in the supplement of Harms et al., Cancer Research, 2015 [7].
The HPV Detector used a computational subtraction approach to identify HPV DNA traces from NGS datasets, as published earlier [22]. We modified the HPV Detector to align NGS data to human and MCPyV genomes. In the downstream analysis, we subtracted all NGS reads that were mapped to only the human genome. The remaining reads were either mapped to the MCPyV genome completely or were split reads with parts mapped to the human genome and other parts mapped to the MCPyV genome. We identified all split reads and annotated them using nearby human and MCPyV genome regions through BEDTools [23]. The source code can be accessed via GitHub (https://github.com/pratikchandrani). The sequencing datasets generated previously in Harms et al., 2015 [7], were used for this study. The integration site junctions/genes were further subjected to pathway analysis using GSEA-MSigDB [24] under the default parameters.

PCR
Primers were designed at the flanking regions (one primer on the human side and another on the viral side) to amplify the integration breakpoints. PCR was performed on genomic DNA from the MCC tumor samples using Platinum Taq DNA Polymerase (catalog # 10966018, Invitrogen, MI, USA), DMSO spike, and 20ng of DNA. Water was used as a negative control.

MCPyV Integration Site Meta-Analysis
Based on the closest chromosome band for each integration junction on the human genome, the coordinates were taken from UCSC Browser, GRCh38/hg38. The chromosome and start positions were then plotted by Circo plot to represent the integration sites. COSMIC v91 data were downloaded from the Sanger Institute website (https://cancer.sanger.ac.uk/census) [25], and the Human Fragile Sites dataset was downloaded from the HumCFS database (https://webs.iiitd.edu.in/raghava/humcfs/) [26]. The Circo plot was drawn using Circa Software (OMGenomics http://omgenomics.com/circa/). Of the 123 sites for which the chromosome number was available, 96 (including five from our study) reported information regarding location on the chromosome. Using chromosome band information, a Circo plot was drawn for these 96 sites. Dots on the bars represent the location of genes in which integration sites were found.

Integration Site Detection from Exome Sequencing Data
Of the seven MCC whole-exome sequencing datasets previously published by Harms et al. [7], we predicted viral-host junctions in five (including one primary tumor and four metastases). Similar to previous studies, we found that the integration site was highly variable and at either interchromosomal locations or in introns [8,12,20]. Three of these five sites were found in intergenic regions in chromosomes 4, 7, and 16, whereas for two tumors the sites were found in introns of two genes: KMT2D (intron 34) on chromosome 12 and DCAF7 (intron 3) on chromosome 17 (Table 1, Table 3 and Figure 1A). KMT2D is a lysine methyltransferase and tumor suppressor gene. DCAF7 (DDB1 and CUL4 Associated Factor 7) encodes a scaffold protein for kinase signaling. Unlike findings from a previous report [12], we did not observe a trend toward involvement of sensory genes.
Recent sequencing studies have shown that chromatin remodeling and histone modulators are frequently altered in cancers, with implications for tumor biology. KMT2D (lysine (K)-specific methyltransferase 2D), also known as MLL2 (myeloid/lymphoid or mixed-lineage leukemia 2, also known as ALR/MLL4), is one such histone methyltransferase that plays an important role in regulating gene transcription [27]. In particular, it targets lysine 4 of histone H3 (H3K4) whose methylations serve as a gene activation mark. KMT2D is essential for maintaining the level of global H3K4 monomethylation, with its enzymatic SET domain being directly responsible for this function. Furthermore, most KMT2D binding sites are located in regions of potential enhancer elements. KMT2D has [27] emerged as a frequently mutated gene in about 8.5% of all cancers [28] and has been proposed to play either a tumor suppressor or oncogenic role, depending on cancer type. In MCC, KMT2D predominantly demonstrates inactivating mutations, thus supporting its tumor suppressor role [29].
To determine the integration junction for the viral genome, we mapped the sequences to MCPyV DysKA isolate [30] (GenBank ID # KX781279.1). The sites we found differed from the previously reported pattern in which integration junctions predominantly involved the LT 2nd exon. In our cases we found two integration sites in the NCCR region, two in the VP1 region, and only one in LT 2nd exon ( Table 1, Table 3 and Figure 1B). Intriguingly, the sequence we found for LT at the junction is preceded by a TAA sequence in the MCPyV genome, indicating that this integration could also be associated with the generation of a premature LT stop codon, similar to findings by Schrama et al. [18].
Several cases showed integration sites in the NCCR region (of these, our study reports two of the total eight), VP1 (our study reports two of the total 23), and VP2 regions (total of 10 reported). The integration junctions may be disruptive or non-disruptive in gene expression and function, depending on the frame of fused product, mechanism of the integration, number of viral copies integrated, etc. Based upon the previous description of our cases, six of seven tumors had detectable LT protein expression by immunohistochemistry, and six of six tumors with RNASeq data had detectable LT transcript expression (Harms et al., 2015, Supplemental Material) [7]. Therefore, all integration sites were permissive for LT expression, even those involving the NCCR. Of note, our approach is not designed to evaluate effects of integration on LT (especially relevant for integration sites that truncate the protein).
By PCR and using primers flanking the virus-host junctions, followed by Sanger sequencing of the resulting PCR amplicon, we validated the integration sites predicted for three of these cases ( Figure 1C). The remaining cases lacked adequate remaining DNA for PCR.

Correlation with Previously Reported MCPyV Integration Sites
We evaluated our 5 integration site results in the context of 91 other previously reported MCPyV integration sites into the human genome by using Circo plot [31] overlaid with COSMIC gene datasets and fragile sites [26]. Chromosome 5 stands out with the highest (21) MCC cases having integrated MCPyV. There are no integration sites in chromosomes 21, 22, and X. Only five genes of the many associated with MCPyV integration, namely SF3B1, TLX3, CD74, MYC, and KMT2D, are cancer-linked genes listed in the COSMIC database [25]. Additional genes with potential tumor suppressor function that have been shown to be involved in MCPyV integration include PTPRG, GRWD1, and XRCC4 [3,19,21]. Hence, in a minority of MCPyV-positive MCC tumors, MCPyV integration itself may contribute to oncogenesis in a variable and case-specific manner via disruption of cancer genes. Additionally, although there is some overlay with fragile sites as previously reported, this effect did not appear significant. Furthermore, the integration junctions in the human genome were observed to be enriched 91in genes having motifs/binding sites of known oncogenic

Correlation with Previously Reported MCPyV Integration Sites
We evaluated our 5 integration site results in the context of 91 other previously reported MCPyV integration sites into the human genome by using Circo plot [31] overlaid with COSMIC gene datasets and fragile sites [26]. Chromosome 5 stands out with the highest (21) MCC cases having integrated MCPyV. There are no integration sites in chromosomes 21, 22, and X. Only five genes of the many associated with MCPyV integration, namely SF3B1, TLX3, CD74, MYC, and KMT2D, are cancer-linked genes listed in the COSMIC database [25]. Additional genes with potential tumor suppressor function that have been shown to be involved in MCPyV integration include PTPRG, GRWD1, and XRCC4 [3,19,21]. Hence, in a minority of MCPyV-positive MCC tumors, MCPyV integration itself may contribute to oncogenesis in a variable and case-specific manner via disruption of cancer genes. Additionally, although there is some overlay with fragile sites as previously reported, this effect did not appear significant. Furthermore, the integration junctions in the human genome were observed to be enriched 91in genes having motifs/binding sites of known oncogenic transcription factors such as LEF1 [32], MYB [33], and AREB6 (ZEB1) [34] along with other pathways (Table S2). We speculate this integration pattern might be related to an increased probability of open chromatin for these gene sets (Figure 2).
For the integration junctions in the viral genome (89 in total including this study), we observed the previously reported bias to the T antigen region, with 42 cases displaying junctions in LT, and 4 in sT ( Table 2).
Viruses 2020, 12, x FOR PEER REVIEW 8 of 12 For the integration junctions in the viral genome (89 in total including this study), we observed the previously reported bias to the T antigen region, with 42 cases displaying junctions in LT, and 4 in sT (Table 2). The outermost layers represent the chromosomes (color coded) and their respective chromosome bands. The grey circle marks the human fragile sites from the HumCFS database, and the next concentric circle marks the COSMIC database gene sites. The black bars mark 91 integration locations of MCPYV in MCC from previous reports. Dots on the bars represent the human genes associated with this integration event, but their position on the bar is arbitrary. The red bars and dots are data from five MCC cases analyzed using the HPV Detector in this study. The Circo plot was drawn using Circa Software (OMGenomics http://omgenomics.com/circa/).

Limitations
We were limited by the amount and quality of available MCC tumor DNA for PCR-Sanger validation, as the majority was used for next generation sequencing. Similarly, previously described inverse PCR approaches for demonstrating viral genome concatemerization [18] were not possible due to limited material. In addition, our approach relies upon identification of off-target capture or shoulder reads of viral sequences within human whole-exome sequencing data, thus generating relatively less sequencing depth than a sequencing approach specifically targeting MCPyV. We also cannot exclude the possibility that the lack of detected integration sites in 2/7 tumors may represent a technical limitation of our approach rather than true absence of viral integration. The grey circle marks the human fragile sites from the HumCFS database, and the next concentric circle marks the COSMIC database gene sites. The black bars mark 91 integration locations of MCPYV in MCC from previous reports. Dots on the bars represent the human genes associated with this integration event, but their position on the bar is arbitrary. The red bars and dots are data from five MCC cases analyzed using the HPV Detector in this study. The Circo plot was drawn using Circa Software (OMGenomics http://omgenomics.com/circa/).

Limitations
We were limited by the amount and quality of available MCC tumor DNA for PCR-Sanger validation, as the majority was used for next generation sequencing. Similarly, previously described inverse PCR approaches for demonstrating viral genome concatemerization [18] were not possible due to limited material. In addition, our approach relies upon identification of off-target capture or shoulder reads of viral sequences within human whole-exome sequencing data, thus generating relatively less sequencing depth than a sequencing approach specifically targeting MCPyV. We also cannot exclude the possibility that the lack of detected integration sites in 2/7 tumors may represent a technical limitation of our approach rather than true absence of viral integration.

Conclusions
Our analysis adds to the repertoire of MCPyV integration junction data and systematically collates current information regarding the same. We show unique genetic sites of integration for both the virus and host, confirming the highly variable nature of MCPyV integration. For the first time, we demonstrate integration involving the tumor suppressor gene KMT2D, representing the potential inactivation of a tumor suppressor gene also recurrently inactivated in MCPyV-negative MCC. We also demonstrate that data from tumor whole-exome sequencing data can be effectively interrogated by the simple and open-access Cancervirus Detector tool to demonstrate viral integration sites, thus representing a powerful and accessible new approach for analyzing existing sequencing datasets.