An Evolutionary Cancer Epigenetic Approach Revealed DNA Hypermethylation of Ultra-Conserved Non-Coding Elements in Squamous Cell Carcinoma of Different Mammalian Species

Background: Ultra-conserved non-coding elements (UCNEs) are genomic sequences that exhibit > 95% sequence identity between humans, mammals, birds, reptiles, and fish. Recent findings reported their functional role in cancer. The aim of this study was to evaluate the DNA methylation modifications of UCNEs in squamous cell carcinoma (SCC) from different mammal species. Methods: Fifty SCCs from 26 humans, 17 cats, 3 dogs, 1 horse, 1 bovine, 1 badger, and 1 porcupine were investigated. Fourteen feline stomatitis and normal samples from 36 healthy human donors, 7 cats, 5 dogs, 5 horses, 2 bovines and 1 badger were collected as normal controls. Bisulfite next-generation sequencing was used to evaluate the DNA methylation level of seven UCNEs (uc.160, uc.283, uc.416, uc.339, uc.270, uc.299, and uc.328). Results: 57/59 CpGs were significantly different according to the Kruskal–Wallis test (p < 0.05) comparing normal samples with SCC. A common DNA hypermethylation pattern was observed in SCCs from all the species evaluated in this study, with an increasing trend of hypermethylation starting from normal mucosa, through stomatitis to SCC. Conclusions: Our findings indicate that UCNEs are hypermethylated in human SCC, and this behavior is also conserved among different species of mammals.


Introduction
Comparative studies on whole vertebrate genomes identified highly conserved non-coding sequences with length >200 bp, called ultra-conserved non-coding elements (UCNEs) [1,2]. In the first seminal work, Bejerano et al. [1] compared the genomes of humans, mice and rats, identifying 481 regions with perfect identity. More recently, Dimitrieva et al. [2], using slightly relaxed criteria (95% identity and >200 bp in length), identified a list of 4351 orthologues in 18 vertebrate species, mostly located within intergenic regions (2139) and the rest in non-coding parts of genes (1713 in introns, 499 in untranslated regions (UTRs)). Relatively frequent polymorphisms exist in UCNEs, but their derived alleles are frequently found homozygous (less than 6% occurring at frequency >1%) [3]; additionally, 112 single nucleotide polymorphisms (SNPs) are annotated in the Ensembl genome browser as phenotypes associated with muscular dystrophies, amyotrophic lateral sclerosis, eye-related disorders, or cancers. Recent data indicated that the conservation of at least some UCNEs is of high importance for the normal phenotype, which is in agreement with knockout studies [4].
Most UCNEs were found in clusters, and more often than expected by chance near coding regions for transcription factors and molecules involved in development. These features have suggested the hypothesis that UCNEs may be candidate regulatory elements with a crucial role in early stages of vertebrate development and differentiation. They harbor important sequence features, such as binding sites of developmental transcription factors that coordinate the expression of essential genes, which is why they were conserved over the long course of evolution [5]. Being frequently located at both fragile sites and genomic regions involved in tumors, they were studied as biomarkers in several types of cancers [6,7]. Interestingly, it has been shown that their expression profiles are tissue- and cancer-specific, providing a new tool to successfully distinguish different cancer types and subtypes [8]. Gene expression may be regulated by epigenetic modifications such as DNA methylation at CpG sites, where a cytosine can be methylated (5mC) at the 5th position of the pyrimidine ring. Since SNPs increase from 1% in the genome to 15% at CpG sites due to deamination of methylated C giving a mutation from C to T, selection pressure has preserved these ancient CpGs within some UCNEs, which have escaped the rapid loss of CpG sites typically seen throughout vertebrate genomes. In general, UCNEs have lower CpG density than other regions [9], while some UCNEs mapped on specific clusters revealed CpG islands with a possible role in gene expression of these loci. In this study, we selected a small set of UCNEs that contain a CpG island and were already reported to be differentially expressed in various types of cancer, namely uc.160 [10,11], uc.270 [11], uc.283 [12], uc.328, uc.339 [13], uc.299 [14], and uc.416 [15].
We evaluated the DNA methylation level of all these CpG loci in SCC from different mammalian species, comparing it with related normal samples and with stomatitis when available.

Ethics Statement
All clinical investigations regarding human samples were conducted according to the principles of the Declaration of Helsinki. The study was approved by the local ethics committee (study number 520/2018/Sper/AOUBo, protocol number OB-200). All information regarding the human material used in this study was managed using anonymous numerical codes.
For all non-human samples, as the research did not influence any therapeutic decision, the approval by an ethics committee was not required. All the histological samples were collected for diagnostic purposes in our routine standard care. Owners gave informed consent and consciously agreed to the use of clinical data and stored biological samples for research purposes. For the brushing of normal oral mucosa, samples were collected for research purposes only upon owners' informed consent.

Study Population and Brushing Collection
Oral human brushing specimens were collected from 26 consecutive patients treated surgically for oral squamous cell carcinoma (OSCC). All 26 patients (13 females and 13 males (mean age 71 years, 10 smokers)) were diagnosed and treated at the Department of Biomedical and Neuromotor Sciences, University of Bologna, Section of Oral Sciences and the Maxillofacial Surgery Unit, Sant'Orsola Hospital during the period 2018-2019. Index human SCC locations were the following: in 10 patients the tongue and/or the floor of mouth, in 6 patients the right or left cheek, in 9 patients the hard palate or the gingiva and in 1 patient the inferior lip. Fourteen SCCs were diagnosed at early stage (T1-2N0) whereas 12 SCCs were diagnosed at advanced stage (T3-4N+) according to the p-TNM classification of tumors (AJCC 8th edition) [16]. Surgical resection of OSCCs was always performed in accordance with standard treatment practice [17]. Brushing specimens from 36 healthy donors (17 females, 19 males) were collected in a cluster of age, sex, and smoking-habits-matched patients presenting at the University Unit only for dental care, during the same period. In this group we avoided collecting samples with any type of lesion in the oral cavity (infective, reactive, or benign).
Oral brushing sample collection was performed in the population study as previously described [18,19]; in brief, a cytobrush (manufactured by N.H.M.P. Co., Ltd. PRC EC REP: Shanghai International trading corporation, Hamburg, Germany) was used to collect exfoliated cells from oral mucosa. Each cytobrush sample was placed in a 2-mL tube containing 500 µL of DNA/RNA Shield (Zymo Research, Irvine, CA, USA) for nucleic acid preservation. The whole surface of the lesion was gently brushed with rotation and translation movements. In these patients, oral brushing was always performed before incisional biopsy and samples were enrolled in the population study only after histological confirmation of oral SCC.
For non-human samples, a retrospective-prospective survey was carried out on medical records of the Service of Veterinary Pathology and Veterinary Teaching Hospital at the Department of Veterinary Medical Sciences (University of Bologna, Italy), at the Institute of Veterinary Pathology, Vetsuisse Faculty (University of Bern), and at the Institute of Veterinary Pathology, Vetsuisse Faculty (University of Zurich, Switzerland).
Representative histological specimens with a diagnosis of SCC (24 cases) from various locations were collected from as many mammalian species as possible. Additionally, histological cases of feline chronic lymphoplasmacellular stomatitis (14 cases) were also included, in order to compare this condition with feline neoplastic oral mucosa, as a greater number of SCCs was available for this species. All the histological samples were formalin-fixed and paraffin-embedded (FFPE), sectioned at 4 µm, and stained with hematoxylin and eosin (HE).
To obtain controls for as many species as possible with SCC cases, 20 brushing samples of oral mucosa were collected from animals received for clinical visits or autopsies. Only animals without clinical and macroscopic evidence of oral lesions were sampled. The sampling was performed with the same technique as human brushing. Table 1 summarizes clinical data of non-human samples.
MethPrimer was applied to identify CpGs and the best primers of choice [20] for targeted sequencing. The list of genomic regions, primer sequences and mapping coordinates interrogated in this study are available in Table 2.
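As an illustration of what identifying CpGs in a target region involves, the following minimal Python sketch locates CpG dinucleotides and computes the GC fraction and the observed/expected CpG ratio of a sequence. It is not MethPrimer's implementation; the function names are hypothetical, and the thresholds mentioned in the comments (GC fraction above 0.5, observed/expected ratio above 0.6) are commonly used CpG-island criteria assumed here for illustration only.

```python
def cpg_positions(seq):
    """Return the 0-based positions of the cytosine in every CpG dinucleotide."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


def cpg_stats(seq):
    """Return length, CpG count, GC fraction and observed/expected CpG ratio."""
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = len(cpg_positions(s))
    gc_fraction = (c + g) / n if n else 0.0
    # Observed/expected ratio: (CpG count * length) / (C count * G count).
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return {"length": n, "cpg": cpg, "gc_fraction": gc_fraction, "obs_exp": obs_exp}


# Hypothetical toy sequence, not one of the UCNEs studied here.
stats = cpg_stats("ACGTCGCGAT")
# Assumed island-like criteria: GC fraction > 0.5 and obs/exp > 0.6.
island_like = stats["gc_fraction"] > 0.5 and stats["obs_exp"] > 0.6
```

In practice such criteria are evaluated over sliding windows of a minimum length rather than over a ten-base toy sequence.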

DNA Methylation Analysis
DNA methylation analysis was performed as previously described [18,19]. In brief, DNA from exfoliating brush specimens was purified using the MasterPure™ Complete DNA Purification Kit (Lucigen, Middleton, WI, USA, cod. MC85200). DNA from 10 µm sections of FFPE tissues (five for each sample) was purified using the QuickExtract™ FFPE DNA Extraction Kit (Lucigen, Middleton, WI, USA, cod. QEF81050) following the protocol described by Gabusi et al. [21]. Fifty to five hundred ng of DNA was treated with sodium bisulfite using the EZ DNA Methylation-Lightning™ Kit (Zymo Research, Irvine, CA, USA, cod. D5031) according to the manufacturer's instructions. Quantitative DNA methylation analysis was performed by next-generation sequencing for the following regions: uc.160, uc.283, uc.283 promoter, uc.416, uc.339, uc.270, uc.299, and uc.328. Locus-specific amplicon libraries were generated with tagged primers in two steps: a first PCR amplification for target enrichment, and a second shorter amplification session (eight cycles) to allow the barcoding of the template-specific amplicons obtained from the first amplification step using the Nextera™ Index Kit (Illumina, San Diego, CA, USA) [22-26]. Sequencing was conducted on a MiSeq sequencer (Illumina, San Diego, CA, USA), according to the manufacturer's protocol. Each next-generation sequencing (NGS) experiment was designed to allocate at least 1 k reads/region, in order to have a depth of coverage of 1000×.
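The principle behind bisulfite sequencing can be sketched in a few lines of Python: bisulfite treatment converts unmethylated cytosines so that they are read as T after amplification, while methylated cytosines are still read as C. The sketch below simulates this on a toy sequence and then recovers the per-CpG methylation level from a set of reads. The function names and the toy data are hypothetical, and the code assumes ungapped reads of the same length as the reference, top strand only; real pipelines work on aligned reads.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite conversion of the top strand.

    Unmethylated C is read as T; methylated C (5mC) is still read as C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq.upper())
    )


def methylation_levels(reference, reads):
    """Fraction of reads showing C at each CpG cytosine of the reference."""
    ref = reference.upper()
    levels = {}
    for pos in (i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"):
        # C means the cytosine was protected (methylated); T means converted.
        calls = [r[pos] for r in reads if r[pos] in ("C", "T")]
        if calls:
            levels[pos] = calls.count("C") / len(calls)
    return levels


# Hypothetical reference with CpGs at positions 1 and 4.
reference = "ACGTCGA"
reads = [
    bisulfite_convert(reference, {1, 4}),
    bisulfite_convert(reference, {1}),
    bisulfite_convert(reference, set()),
    bisulfite_convert(reference, {1}),
]
levels = methylation_levels(reference, reads)
```

With the 1000× depth targeted in this study, each per-CpG level would be estimated from roughly a thousand such calls rather than four.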
The methylation ratio of each CpG was calculated in parallel by different tools in a Galaxy Project environment (Europe) [27]: FASTQ files were processed for quality control (>Q30) and for read length (>80 bp) by Filter by Quality and Filter FASTQ reads for quality score and length, respectively. Reads were mapped by BWAmeth, generating a BAM file which was processed by MethylDackel using hg38 for human samples and Felis_catus_9.0 for feline samples as a reference. This tool created an Excel file assigning to each CpG position the exact methylation level; additionally, we used the EPIC-TABSAT web tool in parallel to confirm our data for human samples [28]; it also worked for the other species, as the regions investigated are ultra-conserved relative to hg38. Once the methylation levels for all CpGs had been catalogued in an Excel file, the evaluation of differential DNA methylation with group comparison was performed by Methylation plotter [29], which provides descriptive statistics and basic non-parametric variance analysis (Kruskal-Wallis tests). For each sample and group, a data table summarizing the mean, standard deviation, minimum and maximum was produced (see Supplementary Tables S1-S3). ClustVis [30] was used to create the HeatMap. Receiver operating characteristic (ROC) analysis was performed using the web tool EasyROC (http://www.biosoft.hacettepe.edu.tr/easyROC/). Multiclass linear discriminant analysis was calculated using IBM SPSS Statistics 21 (IBM). The R packages pcaMethods (https://bioconductor.org/packages/release/bioc/html/pcaMethods.html) and OmicCircos (https://bioconductor.org/packages/release/bioc/html/OmicCircos.html), both included in the software project Bioconductor, were used for principal component analysis (PCA) and circular plots, respectively. The original contributions presented in the study are publicly available. These data can be found at the GEO (Gene Expression Omnibus) repository (accession: GSE157436).
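To make the group comparison concrete, the following Python sketch computes the Kruskal-Wallis H statistic (with tie correction) for the methylation levels of one CpG across groups. It is a minimal illustration, not the code used by Methylation plotter; the function name and the toy values are hypothetical. For simplicity, the p-value uses the closed-form chi-square tail, which exists only for one or two degrees of freedom, so the sketch handles two or three groups (for example normal, stomatitis, and SCC).

```python
import math


def kruskal_wallis(groups):
    """Return (H, p) for 2 or 3 groups of values; H is tie-corrected."""
    values = sorted(v for g in groups for v in g)
    n = len(values)
    rank_of = {}
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and values[j] == values[i]:
            j += 1
        # Tied values share the mean of ranks i+1 .. j.
        rank_of[values[i]] = (i + 1 + j) / 2
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sums = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_sums - 3 * (n + 1)
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical")
    h = max(h / correction, 0.0)
    df = len(groups) - 1
    if df == 1:
        p = math.erfc(math.sqrt(h / 2))  # chi-square tail, 1 degree of freedom
    elif df == 2:
        p = math.exp(-h / 2)             # chi-square tail, 2 degrees of freedom
    else:
        raise ValueError("p-value implemented only for 2 or 3 groups")
    return h, p


# Hypothetical methylation levels of a single CpG in three tissue groups.
normal = [0.05, 0.10, 0.08, 0.12]
stomatitis = [0.20, 0.25, 0.22, 0.30]
scc = [0.55, 0.60, 0.48, 0.70]
h, p = kruskal_wallis([normal, stomatitis, scc])
```

The chi-square approximation is coarse for groups this small; it is used here only to keep the example self-contained.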
The same pattern was observed by considering the single species individually. In brief, comparing human SCCs with normal human samples, hypermethylation of uc.270, uc.283, and uc.339 gave the best discriminative power, with most of the CpGs being statistically significant (Figure S1).
In cats, all the investigated UCNEs were significantly different between SCCs and non-neoplastic samples (Figure 3). All methylation data are reported in Table S3 (Supplementary Files).
In dogs, uc.283, uc.328, and uc.416 showed the best discriminatory potential, as shown in Figure S2 (Supplementary File). In this species, uc.339 displayed the lowest level of methylation (close to 0) both for normal samples and SCCs, a condition not observed in the other species. Comparing the equine SCC with five normal equine samples, uc.160, uc.328, and uc.416 showed the most prominent epigenetic alterations (Figure S3). We also evaluated one bovine SCC against two normal bovine samples, detecting hypermethylation in uc.160, uc.270, uc.283, uc.299, and uc.339; the badger SCC, when compared to the corresponding normal sample, showed hypermethylation in uc.160, uc.283, and uc.299; the porcupine SCC, compared to all normal samples of different species, exhibited aberrant methylation in uc.339, uc.270, uc.299, and uc.328.
We also compared DNA methylation levels in SCCs among different species, as shown in Figure S4. Methylation levels differed among species in all the UCNEs investigated. The same inter-species variation was also detected when comparing normal samples among different species (Figure S5).

Principal Component Analysis (PCA)
A plot showing the first three principal components for PCA with tissue type as grouping factor is shown in Figure 4. SCC elements (square) were quite spread, while normal samples (circle) tended to be more clustered in a well-defined and restricted area, similarly to the stomatitis samples (triangle).
Figure 4. PCA results for the three different types of tissue. Only for display purposes, different species are highlighted with different colors. Unit variance scaling was applied to rows and singular value decomposition (SVD) with imputation was used to calculate principal components. X, Y, and Z axes show principal components 1, 2, and 3, which explain 47%, 11%, and 9.7% of the total variance, respectively.
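The idea of unit variance scaling followed by extraction of principal components, and of the fraction of variance each component explains, can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure used to produce Figure 4: it uses power iteration on the covariance matrix of the scaled data instead of SVD, it extracts only the first component, and it assumes complete data (no imputation) with no constant feature. The function names and toy data are hypothetical.

```python
import math


def standardize(samples):
    """Center each feature (column) and scale it to unit variance."""
    n = len(samples)
    columns = list(zip(*samples))
    means = [sum(c) / n for c in columns]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1))
           for c, m in zip(columns, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in samples]


def first_component(samples, iterations=200):
    """Return (loadings, explained variance fraction, sample scores) for PC1."""
    z = standardize(samples)
    n, p = len(z), len(z[0])
    cov = [[sum(z[k][i] * z[k][j] for k in range(n)) / (n - 1)
            for j in range(p)] for i in range(p)]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        # Power iteration: repeatedly apply the covariance matrix and renormalize.
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    explained = eigenvalue / sum(cov[i][i] for i in range(p))
    scores = [sum(a * b for a, b in zip(row, v)) for row in z]
    return v, explained, scores


# Hypothetical data: four samples (rows) measured at two CpGs (columns).
loadings, explained, scores = first_component([[1, 1], [2, 3], [3, 2], [4, 4]])
```

In this toy case the two features have a correlation of 0.8, so the first component explains 90% of the total variance; the percentages in the caption above are the analogous quantities for the study data.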
A HeatMap showing a hierarchical clustering of all the quantitative data and their relationship among all CpGs and all cases is shown in Figure 5. Feline stomatitis samples tended to group together in the same cluster (left side); a second cluster comprised animal SCCs located on the left side and normal animal samples on the right; a third cluster contained only human SCCs (left and right) and normal human oral mucosa samples (middle part); and a fourth cluster included 18 SCCs (15 humans, 3 dogs) and 24 normal samples (16 humans, 5 dogs, 1 horse, 1 bovine). Clinical features (sex, age, and smoking habits) of the human study population did not influence the clustering, in particular for clusters 3 and 4, where human SCCs and normal samples were located.
The circle plot (Figure 6) represents graphically how the methylation level increases from normal donors, through stomatitis to SCC depending on the position of each CpG within the UCNE in humans, cats, and dogs.

Receiver-Operating Characteristic (ROC) Analysis
For each UCNE, the best CpGs to discriminate SCCs from control samples were identified using the EasyROC web tool (Table 3).
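Selecting the most discriminative CpG within a UCNE amounts to computing an area under the ROC curve (AUC) for every CpG and keeping the highest one. The sketch below uses the standard rank-based estimate of the AUC (the probability that a randomly chosen SCC has a higher methylation level than a randomly chosen control, ties counting one half). It is not EasyROC's implementation; the function names, CpG labels, and values are hypothetical, and it assumes that higher methylation points to SCC.

```python
def auc(positive, negative):
    """Rank-based AUC: P(positive value > negative value), ties count 0.5."""
    wins = sum((p > q) + 0.5 * (p == q) for p in positive for q in negative)
    return wins / (len(positive) * len(negative))


def best_cpg(scc, control):
    """Return (best CpG, its AUC, best-to-worst AUC gap) for one UCNE.

    scc and control map a CpG label to a list of methylation levels.
    """
    scores = {cpg: auc(scc[cpg], control[cpg]) for cpg in scc}
    best = max(scores, key=scores.get)
    gap = max(scores.values()) - min(scores.values())
    return best, scores[best], gap


# Hypothetical methylation levels for two CpGs of one UCNE.
scc = {"cpg1": [0.6, 0.7, 0.5], "cpg2": [0.3, 0.4, 0.2]}
control = {"cpg1": [0.1, 0.2, 0.55], "cpg2": [0.25, 0.35, 0.3]}
best, best_auc, gap = best_cpg(scc, control)
```

The gap returned by the hypothetical `best_cpg` helper is the difference between the highest and lowest AUC among the CpGs of a UCNE, a measure of how much the choice of CpG position matters.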
Data coming from these calculations were used to evaluate sensitivity (0.875), specificity (0.750) and the AUC (0.887, Figure 7), with a threshold of −0.232. We were able to correctly identify 20 human and 22 animal SCCs and 13 human and 26 animal normal cases, while we reported 4 human and 8 animal false positive results (of which 7 were stomatitis), and 5 human and 1 animal false negative cases.
Figure 6. Circular visualization of methylation levels for each of the seven UCNEs in the different tissues for cat, dog, and human samples. UCNE 283 is highlighted as it showed the best discriminative power between SCCs and normal tissues for cats, dogs, and humans. For each UCNE, the best CpG to discriminate SCCs from control samples is also highlighted.
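Sensitivity and specificity follow directly from the classification counts reported in this section. The short Python sketch below applies the standard definitions to those counts, pooling human and animal cases; the function names are hypothetical. With these pooled counts the sensitivity reproduces the reported 0.875, while the specificity comes out at about 0.765, close to but not identical with the reported 0.750, which may reflect a different denominator in the original analysis.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of SCCs that are correctly classified as SCC."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of non-SCC samples that are correctly classified as non-SCC."""
    return true_neg / (true_neg + false_pos)


# Counts as reported in the text (human + animal).
true_pos = 20 + 22   # correctly identified SCCs
true_neg = 13 + 26   # correctly identified normal cases
false_pos = 4 + 8    # false positives (7 of the animal ones were stomatitis)
false_neg = 5 + 1    # false negatives

sens = sensitivity(true_pos, false_neg)
spec = specificity(true_neg, false_pos)
```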


Discussion
In this study, the DNA methylation levels of seven CpG-rich UCNEs were investigated in SCC from different mammalian species, including human, cat, dog, horse, bovine, porcupine, and badger. We compared those data with normal samples from the same species and feline stomatitis. Multilevel comparisons highlighted the presence of epigenetic alterations in several CpGs, reaching statistically significant values using the Kruskal-Wallis test in the Methylation plotter tool. These results were obtained by evaluating SCCs altogether vs. all the normal samples and feline stomatitis (Figure 1), but also by considering a species-specific comparison between SCCs and normal tissue (Figure 2, Figure 3, Figures S1 and S2). In this regard, we think that the latter is the most reliable method to detect a clear epigenetic aberration, as we identified a variable range of methylation levels among different species both for SCCs and normal samples. These fluctuations are shown in Figure S3 for SCCs and in Figure S4 for normal cases, highlighting the aberrant pattern of DNA hypermethylation in all the investigated UCNEs. This pattern was also confirmed by the PCA (Figure 4) and the HeatMap (Figure 5), which showed a first cluster (cluster 1) comprising all the feline stomatitis samples, which were close to each other, and four feline SCCs; a second cluster (cluster 2) characterized by feline and other non-human SCCs located on the left side and highly hypermethylated in almost all the UCNEs, and normal feline and other non-human normal samples on the right; cluster 3, specific for human SCCs with two groups, one on the left and one on the right side, while normal human oral mucosa samples were located in the middle part; finally, a more heterogeneous cluster (cluster 4) including 18 SCCs (15 humans, 3 dogs) and 24 normal samples (16 humans, 5 dogs, 1 horse, 1 bovine), characterized by hypermethylation of uc.270.
Evaluating the relationship between clinical parameters (sex, age, and smoking habits) and hierarchical clustering among groups, we found no correlation.
Analyzing data from single species, in human SCCs uc.270, uc.283, and uc.339 were identified as the best discriminative biomarkers, with most of the CpGs being statistically significant (6/7, 10/11, and 4/9, respectively). In cats, we were able to retrieve data from a supplementary group of stomatitis cases to be compared with normal tissue and SCC, since these inflammatory lesions were recently reported to be altered from an epigenetic point of view [31]. This may be due to an epigenetically regulated expression of proinflammatory cytokines and other inflammation-related genes. Moreover, chronic inflammation triggered by various factors can induce aberrant methylation, which in some cases has a preneoplastic effect in epithelial cells [32,33]. In our cohort, we reported a trend of increased methylation starting from normal tissue, through stomatitis to SCC, where the latter exhibited the highest level of methylation. Interestingly, we found the same trend in all seven UCNEs investigated, all showing most of the CpGs to be statistically significant. In the dog, uc.283, uc.328, and uc.416 revealed the best discriminatory potential; in contrast to other species, uc.339 was completely unmethylated both in normal tissue and in SCC. Unfortunately, we were able to retrieve from our archives only one equine SCC to be compared with five normal samples from the same species, showing uc.160, uc.328, and uc.416 as the most relevant epigenetic biomarkers. Although only one bovine SCC was compared with two normal samples from the same species, we found hypermethylation in uc.160, uc.270, uc.283, uc.299, and uc.339. One case of badger SCC vs. one case of normal tissue showed hypermethylation in uc.160, uc.283, and uc.299. Finally, one case of porcupine SCC could not be compared with normal tissue of the same species; however, when compared with all the normal samples available from different species, it exhibited aberrant methylation in uc.339, uc.270, uc.299, and uc.328.
Taken together, these data indicated a clear hypermethylation status of UCNEs in all the investigated species, with uc.416, uc.339, and uc.283 showing the best AUC values. We also estimated the AUC range across the CpGs within each UCNE, since the location of DNA methylation has been reported to have a crucial role in regulating gene expression [18,34]. The best-to-worst gap varied only slightly, from 0.058 to 0.084, in uc.160, uc.299, uc.339, and uc.328, while we observed larger variations in uc.270, uc.283, and uc.416, ranging from 0.164 to 0.225. Selecting the best CpGs from each UCNE, we were able to derive an algorithm to better discriminate SCCs from normal samples using a linear discriminant analysis (LDA) approach. The use of a single CpG may be inadequate for this purpose, since the best single-CpG AUC was detected for uc.416 (AUC: 0.843, coordinate 46670879). The new algorithm slightly increased the performance, since we were able to reach a sensitivity of 0.875, a specificity of 0.750, and an AUC of 0.887, with a threshold of −0.232 for the score. By calculating a score using the developed algorithm, we were able to assign a correct diagnosis in 20 human and 22 animal SCCs, as well as in 13 and 26 normal cases from humans and animals, respectively. However, we reported four human and eight animal false positive results (of which seven were stomatitis) affecting specificity, and five human and one animal false negative cases. Further tests in an independent larger cohort of cases will be necessary to provide strong evidence that these biomarkers can accurately identify SCC. If confirmed, this new approach, which relies on the same target regions being present in humans and several other mammalian species, could be applied to human and veterinary diagnostics with the same protocol.
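How a discriminant score combining several CpGs can be built is sketched below with a two-class Fisher linear discriminant on two features. This is a deliberately reduced illustration and not the SPSS multiclass model used in this study: the function names and the methylation values are hypothetical, only two CpGs are combined, equal priors are assumed, and the default cut-off is 0. A different cut-off, such as the −0.232 reported above, can be chosen afterwards by ROC analysis of the scores.

```python
def mean_vector(rows):
    """Mean of each of the two features."""
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(2)]


def scatter_matrix(rows, mu):
    """2x2 within-class sum of squares and cross-products."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mu[0], r[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fit_lda(scc, normal):
    """Return weights w and intercept b of the score w.x + b (SCC if > 0)."""
    mu1, mu0 = mean_vector(scc), mean_vector(normal)
    s1, s0 = scatter_matrix(scc, mu1), scatter_matrix(normal, mu0)
    dof = len(scc) + len(normal) - 2
    sw = [[(s1[i][j] + s0[i][j]) / dof for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu1[0] - mu0[0], mu1[1] - mu0[1]]
    # Fisher direction: inverse pooled covariance times the mean difference.
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Intercept puts score 0 midway between the projected class means.
    b = -0.5 * (w[0] * (mu1[0] + mu0[0]) + w[1] * (mu1[1] + mu0[1]))
    return w, b


def score(w, b, x):
    """Discriminant score of one sample x = [level at CpG A, level at CpG B]."""
    return w[0] * x[0] + w[1] * x[1] + b


# Hypothetical methylation levels at two CpGs.
scc = [[0.6, 0.5], [0.7, 0.65], [0.55, 0.7], [0.8, 0.6]]
normal = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25], [0.05, 0.15]]
w, b = fit_lda(scc, normal)
```

Combining CpGs in this way lets a sample with a borderline value at one position still be classified correctly when the other position is clearly altered, which is the motivation for preferring a multi-CpG score over a single CpG.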
One limitation of this study is the low number of cases for some species, which were difficult to retrieve due to their rarity in anatomic pathology archives. Since methylation levels also vary within a single species, further studies are needed to confirm our preliminary data on cows, horses, badgers, porcupines, and eventually other mammals, in order to obtain more comprehensive and statistically powerful results. Most of our data are based on SCCs from oral tissue specimens of humans, cats, and dogs, so our study outlines only a partial epigenetic landscape of UCNEs in mammals. A few of the SCCs investigated here had different sites of origin (skin, vulva, mandible, or lung); however, recent findings reported that SCCs from various body sites have common epigenetic and genetic determinants, pointing to a unified perspective of the disease and potential new avenues for prevention and treatment [35]. Moreover, DNA methylation in UCNEs should be further compared with their expression levels, but this will be feasible only using fresh/frozen tissues, which are difficult to retrieve, especially for those species that are rarely admitted to veterinary clinics.

Conclusions
Our findings indicate that UCNEs are hypermethylated in human SCCs, and this behavior is also conserved among different species of mammals. Further comparative studies are needed to investigate the molecular similarities between human and animal SCCs and their potential usefulness to understand and combat this neoplasm.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4409/9/9/2092/s1: Figure S1: Methylation plot (left) and boxplot (right) comparing canine SCC and normal oral epithelium in healthy dogs; Figure S2: Methylation plot (left) and boxplot (right) comparing equine SCC and normal oral epithelium in healthy horses; Figure S3: Comparison among SCCs from different species: methylation plot (left) and boxplot (right); Figure S4: Comparison among normal samples from different species: methylation plot (left) and boxplot (right); Table S1: Summary Table with mean, standard deviation, minimum, maximum, and p value of the Kruskal-Wallis test for each position and group of samples comparing all SCCs from all the species together vs. normal samples and feline stomatitis; Table S2: Summary Table comparing human normal donors and SCCs;  Table S3: Summary Table with