Genome-Wide Tiling Array Analysis of HPV-Induced Warts Reveals Aberrant Methylation of Protein-Coding and Non-Coding Regions

The human papillomaviruses (HPV) are a group of double-stranded DNA viruses that exhibit an exclusive tropism for squamous epithelia. HPV can either be low- or high-risk depending on its ability to cause benign lesions or cancer, respectively. Unsurprisingly, the majority of epigenetic research has focused on the high-risk HPV types, neglecting the low-risk types in the process. Therefore, the main objective of this study is to better understand the epigenetics of wart formation by investigating the differences in methylation between HPV-induced cutaneous warts and normal skin. A number of clear and very significant differences in methylation patterns were found between cutaneous warts and normal skin. Around 55% of the top-ranking 100 differentially methylated genes in warts were protein coding, including the EXOC4, KCNU, RTN1, LGI1, IRF2, and NRG1 genes. Additionally, non-coding RNA genes, such as the AZIN1-AS1, LINC02008, and MGC27382 genes, constituted 11% of the top-ranking 100 differentially methylated genes. Warts exhibited a unique pattern of methylation that is a possible explanation for their transient nature. Since the genetics of cutaneous wart formation are not completely known, the findings of the present study could contribute to a better understanding of how HPV infection modulates host methylation to give rise to warts in the skin.


Introduction
Human papillomaviruses (HPV) are double-stranded DNA viruses that exclusively infect the squamous epithelial layers of the mucosa and skin [1]. HPV can be transmitted between individuals via direct contact as well as through fomites, and HPV infection is considered to be the most common sexually transmitted disease on a global scale [2]. Moreover, a number of communicable as well as non-communicable diseases have been associated with HPV infection, including various types of warts and cancers [3]. Hundreds of HPV types have been identified and classified as either low-risk or high-risk depending on their likelihood upon infection to cause malignant or benign symptoms,

Infinium MethylationEPIC BeadChip
At the AGRF, samples were subject to further quality control measures through assessment on the QuantiFluor ® dsDNA System (Promega, Gorman, NC, USA) and via 0.8% agarose gel electrophoresis. The 24 samples were then individually made up to approximately 500 ng of DNA in 45 µL, after which they were bisulfite converted by the Zymo EZ DNA Methylation kit (Zymo Research, Irvine, CA, USA). Finally, the samples were individually inputted into the Infinium MethylationEPIC BeadChip microarray (Illumina, San Diego, CA, USA) for interrogation of over 850,000 CpG sites.

Data Processing
A computational R package (RnBeads) was adapted to process and analyze the raw intensity data from the methylation chip in the form of IDAT files [29]. Quality control, preprocessing, batch effects adjustment, and normalization were carried out on all probes and samples according to the RnBeads package pipeline.

Differential Methylation and Statistical Analysis
Differential methylation (DM) analysis was carried out based on tiling region, the latter of which is a genomic region of 5000 bp length. At the genome-tiling level, the mean of the mean β values (mean.mean β) of all the tested probes which span specified genomic regions were computed. The distribution of the number of CpG sites per genome tiling region can be seen in Figure 1A. Additionally, Figure 1B shows the distributions of CpG sites across genome-tiling regions. DM for each genome-tiling region was calculated using three measures: the mean.mean β difference between warts (W) and normal skin (NS); the log2 of the mean quotient in β means across all CpG sites in a genome-tiling region; and the adjusted combined p-value of all CpG sites in the genome-tiling region using a limma statistical test [30]. The Benjamini and Hochberg (B-H) 5% false discovery rate (FDR) was used to correct for multiple testing. Additionally, these three measures were used to give each genome-tiling region a combined rank, which was computed as the maximum (=worst) rank among the three ranks. Regions that exhibit more DM will have a smaller combined rank [29]. Genome-tiling regions were sorted from smallest to largest using the combined ranking score, then the best 1000 ranking regions were selected for further analysis. In Panel B, the relative coordinates of 0 and 1 correlate to the start and end coordinates of the genomic tiling region. Those coordinates that are smaller than 0 and larger than 1 indicate flanking regions normalized by region length.

Locus Overlap Analysis (LOLA) for Enrichment of Genomic Ranges
LOLA was carried out for the top 1000 genome-tiling regions, the latter of which were selected based on combined ranking score [31]. The following LOLA reference databases were utilized in the current analysis: cistrome_cistrome, cistrome_epigenome, codex, encode_segmentation, encode_tfbs, sheffield_dnase, ucsc_features. LOLA uses a Fisher's exact test to assess the significance of the overlap.

Signaling Pathway Analysis
A signaling network of the genes located in the top 100 most DM genome-tiling regions was created via the Signaling Network Open Resource 2.0 (Signor) [32]. The type of relation was selected to include 'all' interactions with a relaxed layout and a score of '0.0'.

Differential Methylation of Genome-Tiling Regions
252,698 genome-tiling regions passed the quality control and pre-processing procedures. Notable differences were seen during the assessment of the methylation level (β) distributions for the genome-tiling regions in wart and normal skin samples ( Figure 2). The list of DM genome tiling in warts was limited to the top-ranking 1000 tiling regions using the combined ranking score. Using this scoring method, 772 and 228 tiling regions were found to be hypomethylated and hypermethylated, respectively, in warts compared to normal skin ( Figure 3A). Of the 772 hypomethylated tiling regions, the β difference ranged between −0.192 to −0.543, but it ranged between 0.192 and 0.472 in the 228 hypermethylated regions. The log 2 of the quotient in methylation between warts and normal skin had a maximum value of 2.275 and a minimum value of −2.44 ( Figure 3B). The 100 genome-tiling regions with the lowest combined rank scores are presented in Table 1 alongside their gene names.  In panel A, the mean of mean methylation levels (β) for warts (W) is on the y-axis, while the mean of mean methylation levels (β) for normal skin (NS) is on the x-axis. β values range from unmethylated (0) to methylated (1). In panel B, the volcano plot shows the differential methylation of genomic tiling regions as quantified by log 2 of the mean quotient in means across all sites in a region on the x-axis and the adjusted combined p-value on the y-axis between warts (W) and normal skin (NS). The color scale corresponds with the combined rank of each genomic tiling region.

Clustering of Samples
Samples showed expected clustering based on all methylation values of the top 1000 most variable loci ( Figure 4). Samples sharing similar methylation patterns or phenotypes tended to cluster together. In addition, the dataset was subject to a dimension reduction test using multi-dimensional scaling (MDS) in order to inspect for a strong signal in the methylation values of the samples ( Figure 5). MDS confirmed that our analysis was dominated by differences between the wart and normal skin samples.

Locus Overlap Analysis (LOLA) for Enrichment of Genomic Ranges
The overall observation of enrichment and overlap across the LOLA reference databases for the hypermethylated and hypomethylated tiling regions are shown in Figures 6 and 7, respectively, while the details of the significantly enriched terms are shown in Figures 8 and 9, respectively. The top-ranking 1000 hypermethylated tiling regions show strong association with the Sheffield_dnase and encode_tfbs databases as indicated by the large odds ratio values. In addition to the strong association with the top tiling regions, encode_tfbs also exhibits a higher statistical significance overlap. From the encode_tfbs database, the c-Fos, STAT3, and c-Myc were among the most enriched terms. Using the Sheffield_dnase database, these tiling regions were predicted to be enriched in several cell and tissue types, including fibroblast cells, epithelial cells, muscle tissue, and skin tissue.   On the other hand, the top-ranking 1000 hypomethylated tiling regions show a strong association with Sheffield_dnase, ucsc_features, and codex indicated by the large odds ratio values, while cistrome_epigenome, encode_tfbe, and encode_segmentation exhibit statistical significance overlap with these tiling regions as indicated by the higher q-value. The most enriched terms include the Dnase weak-normal human dermal fibroblast (NHDF) cell line, Dnase-fibrobalsts, Weak Enhancer-human umbilical vein endothelial cells (HUVEC), nuclear receptor subfamily 2 group F member 2 (NR2F2)-Endometrial Stromal Cell, and monomethylation of Histone H3 at lysine 4 (H3K4me1)-lymph node carcinoma of the prostate (LNCaP) cell line.

Pathway Analysis
Signaling network analysis of the genes located in the top 100 DM genome-tiling regions illustrated that two genes, ATF2 and HDAC2, were found to be common regulators of the gene network with a minimum of 12 connectivities each ( Figure 10).

Discussion
Due to their benign nature, low-risk HPV and the cutaneous warts they induce have not been the subject of the same research attention and focus as their high-risk counterparts. In this study, a tiling array was carried out on paired samples of normal skin and warts from 12 Arab males. Methylation profiles were found to significantly differ between the cases and controls, and the top-ranking differentially methylated (DM) genes were mostly protein-coding (55%) and non-coding RNA (ncRNA) (11%) genes.

Aberrant Methylation of Protein-coding Genes
Of the top-ranking 100 DM genes in warts compared to normal skin, 55 were protein-coding genes. The exocyst complex component 4 (EXOC4) gene was the most DM protein-coding gene in warts. Not much is known about EXOC4 except that adjacent polymorphisms were associated with Type 2 diabetes [33]. However, EXOC4 encodes a component of the exocyst complex, the latter of which is posited to be involved in viral protein transfer between cells [34]. As part of its infection process, HPV relies heavily on membrane-bound transport vesicles to deliver the viral material from the extracellular matrix to the host cell's nucleus [35][36][37]. The second most DM protein-coding gene is the potassium calcium-activated channel subfamily U member 1 (KCNU1) gene, which is a sperm-specific potassium channel that is essential for male fertility [38]. KCNU1 might be important to HPV biology due to the fact that potassium channels are involved in cell proliferation and apoptosis, among other cellular processes [39]. The reticulon 1 (RTN1) gene was the third most DM protein-coding gene in warts, with previous reports showing that RTN1 deficiency and isoforms were associated with senile plaque formation and kidney disease progression, respectively [40,41]. In the context of viral infection, deletion of RTN1 in yeast cells led to a significant inhibition of viral replication [42].
Interestingly, several DM protein-coding genes, namely the LGI1, IRF2, and NRG1 genes, have been implicated in squamous cell carcinoma, which involves the same keratinocyte layer that is exclusively hijacked by HPV infection. The leucine-rich glioma inactivated 1 (LGI1) gene is a putative suppressor of metastasis and, when downregulated, was found to stimulate esophageal squamous cell carcinoma metastasis [43]. Similarly, the interferon regulatory factor 2 (IRF2) gene, was reported to increase the tumorigenicity of esophageal squamous cell carcinoma when overexpressed [44]. Lastly, the neuregulin 1 (NRG1) gene, a cell adhesion molecule, was previously reported to be upregulated in oral squamous cell carcinoma cells [45]. It is important to understand the methylation profiles of benign warts so as to improve the understanding of the effects of low-risk HPV infection on host methylation, the latter of which has been extensively explored in the context of high-risk HPV infection.
Furthermore, various DM protein-coding genes in warts have been previously associated with cancers other than squamous cell carcinoma. Encoding an argonaute family protein, the piwi-like protein 4 (PIWIL4) gene was reported to be highly expressed in breast cancer cells, and its knockdown was found to lessen leukemic growth [46,47]. Moreover, increased expression of the protein tyrosine phosphatase (PTPRA) gene and promoter methylation of the calcium-binding protein 39-like (CAB39L) gene were associated with gastric cancer [48,49]. In addition, the abundant expression of the solute carrier family 22 member 16 (SLC22A16) gene, a carnitine transporter, has been reported in ovarian carcinoma cell lines, and its upregulation helped induce melanoma cell death when combined with chemotherapy [50,51]. Likewise, the dimethylarginine dimethylaminohydrolase 1 (DDAH1) gene, which plays an integral role in methylarginine removal, was found to be frequently upregulated in prostate cancer [52], downregulated in gastric cancer [53], and inhibited in attenuated triple negative breast cancer cells [54]. In contrast, high expression of the Kelch-like family member 7 (KLHL7) gene, which encodes for a mediator of ubiquitination, is associated with aggressive breast cancer progression [55]. The fact that some protein-coding genes are DM in both warts and various HPV-associated cancers could potentially suggest that the extent of methylation contributes to whether the phenotype is malignant or benign.

Aberrant Methylation of Non-Coding Genes
Non-coding RNAs (ncRNAs) are non-protein coding RNA molecules that generally do not possess a known biological function, although a small minority have been identified as having important functional roles [56]. Dysregulated ncRNA expression patterns have also been implicated in HPV-associated cancers caused by high-risk HPV infection [57]. 11 of the top-ranking 100 DM genes in warts were non-coding RNAs (ncRNAs), including the most DM gene in warts, AZIN1-AS1. Very little is known about the AZIN1 antisense RNA 1 (AZIN1-AS1) gene both in terms of its function and disease associations. Likewise, the second and third most DM ncRNAs in warts were the long intergenic non-protein coding RNA 2008 (LINC02008) and the uncharacterized MGC27382 (MGC27382) genes. The MGC27382 gene was previously reported to be a part of an endogenous RNA network that could serve as a prognostic biomarker for lung squamous cell carcinoma [58]. Moreover, MGC27382 was found to be significantly upregulated in colorectal cancer but downregulated in lung adenocarcinoma [59,60]. In addition, the long intergenic non-protein coding RNA 2241 (LINC02241) gene was the fourth most DM ncRNA in warts and was previously associated with hepatocellular and colorectal carcinomas [61,62]. The fifth most DM ncRNA was the FER1L6 antisense RNA 1 (FER1L6-AS1) gene, which was found to be dysregulated in esophageal squamous cell carcinoma [63].

Genomic Hypermethylation
Hypermethylation of DNA has been associated with transcriptional silencing of the affected genes. From among ENCODE's transcription factor binding sites, the c-Fos, STAT3, and c-Myc genes were the most hypermethylated in warts (Figure 7). The proto-oncogene c-Fos is involved in cell differentiation and proliferation, and its expression is required for skin tumors to become malignant [64]. Moreover, the induction of c-Fos expression promotes inflammation of the skin that, in turn, mediates the development of preneoplastic lesions [65]. On the other hand, c-Fos expression was found to decrease keratinocyte growth by increasing sensitivity to apoptosis, but this state was reversed upon the addition of c-Jun [66]. In high-risk HPV infection, skin tumorigenesis was found to be critically dependent on E2-induced c-Fos expression [67].
Similarly, the signal transducer and activator of transcription 3 (STAT3) gene is a transcription factor that is essential for cell apoptosis and growth [68]. In the skin, STAT3 has a dual role in maintaining homeostasis and wound-healing as well as promoting carcinogenesis and psoriasis when aberrantly expressed [69,70]. During periods of bacterial and viral infection, STAT3 creates an inflammatory microenvironment that induces carcinogenesis [71]. STAT3 activation is a common mechanism of herpesvirus pathogenesis, especially in the context of the varicella-zoster virus [72]. Additionally, autocrine STAT3 activation is an integral part of HPV-mediated cervical cancer, and loss of STAT3 expression is detrimental to high-risk HPV infection of keratinocytes [73,74].
Lastly, the c-Myc gene is an oncogenic transcription factor that regulates the expression of 15% of the human genome and is dysregulated in the majority of human cancers [75]. In the mammalian epidermis, c-Myc overexpression helps stimulate keratinocyte proliferation, while c-Myc knockdown results in the latter's inhibition [76]. Alongside the SIN3 transcription regulator family member A (SIN3A) gene, c-Myc helps maintain tissue homeostasis in the skin, but epidermal cells deficient in c-Myc were found to be resistant to Ras-mediated tumorigenesis [77,78]. Furthermore, c-Myc gene amplification was highly associated with infection by oncogenic high-risk HPV types in cervical carcinomas [79].

Genomic Hypomethylation
In contrast to hypermethylation, hypomethylation of genomic DNA often leads to the activation of the affected genes. In warts, the AR, NR2F2, and AFF1 genes were among the most significantly hypomethylated in warts compared to normal skin ( Figure 9). The androgen receptor (AR) gene, which is a nuclear receptor activated by androgenic hormones, normally functions as a DNA-binding transcription factor that is important in the development of male sexual characteristics [80]. Moreover, AR upregulation has been reported to suppress wound healing and increase inflammation in a murine model, and its dysregulation is involved in a number of different skin pathologies [81]. Loss of AR expression was reported to be a common event in intraepithelial neoplasia and invasive squamous cell carcinoma associated with high-risk HPV infection of the cervix [82].
Likewise, the nuclear receptor subfamily 2 (NR2F2) and the AF4/FMR2 family member 1 (AFF1) genes encode for nuclear transcription factors that play a critical role in vascular development and osteogenic differentiation, respectively [83,84]. NR2F2 inhibition by miR-302 contributes to somatic cell pluripotency, and its expression plays an important role in the chondrogenesis of mesenchymal stem cells [85,86]. In addition, the AFF1 gene was found to be downregulated in melanoma tissue and dysregulated in acute lymphoblastic leukemia [87,88]. The involvement of the NR2F2 and AFF1 genes in HPV infection is still not clear.

Genes Involved in the Signaling Network Pathway
Two genes, ATF2 and HDAC2, were found to be common regulators of the top-ranking 100 most DM genes in warts ( Figure 10). Also known as cyclic AMP response element binding protein 2 (CREB2), the activating transcription factor 2 (ATF2) gene was found to be a common regulator of the DM gene network in HPV-induced warts. ATF2 encodes a leucine zipper transcription factor that can also act as a histone acetyltransferase, and it has been implicated in various malignant skin diseases [89]. In fact, ATF2 overexpression was necessary for tumor growth and progression in murine skin and for melanoma metastasis in human skin [90,91]. On a similar note, the histone deacetylase 2 (HDAC2) gene, which encodes for an enzyme that deacetylates lysine residues situated within core histone N-terminal regions, plays an essential role in epidermal development [92]. HDAC2 inhibition was found to stabilize tumor suppression and induce apoptosis in human keratinocyte cells infected with high-risk HPV [93].

Anatomical Location of Warts
In the present study, most of the warts were obtained from the hands (n = 20), with a minority taken from the feet (n = 2) and forehead (n = 2). Anatomic site is an important factor to consider in such studies, as it can influence the expression and methylation of genes. The hands and forehead, for example, are exposed to environmental factors that other parts of the body are not, leading to differences in gene expression between exposed and non-exposed skin [94]. Interestingly, in a cancer context, one study reported a novel epigenetic signature of HPV infection that was independent of the anatomic location in HPV-associated head and neck squamous cell carcinomas [95].

Conclusions
Wart formation was found to involve a clear methylation pattern that sets it apart from normal skin. It was demonstrated that most differentially methylated genes were protein coding and included non-coding RNA genes as well. Surprisingly, many of the DM genes found in benign warts were previously reported to be involved in high-risk HPV infection and HPV-associated malignancies. The main limitation of the present study was that the participants were all male, but only males were included in order to minimize any genetic variation that might occur due to differences in sex.