Human Endogenous Retroviruses in Glioblastoma Multiforme

Glioblastoma multiforme (GBM) is the most aggressive and deadly brain tumor. It is primarily diagnosed in the elderly and has a 5-year survival rate of less than 6% even with the most aggressive therapies. The lack of biomarkers has made the development of immunotherapy for GBM challenging. Human endogenous retroviruses (HERVs) are a group of viruses with long terminal repeat (LTR) elements, which are believed to be relics from ancient viral infections. Recent studies have found that those repetitive elements play important roles in regulating various biological processes. The differentially expressed LTR elements from HERVs are potential biomarkers for immunotherapy to treat GBM. However, the understanding of the LTR element expression in GBM is greatly lacking. Methods: We obtained 1077.4 GB of sequencing data from public databases. These data were generated from 111 GBM tissue studies, 30 GBM cell lines studies, and 45 normal brain tissues studies. We analyzed repetitive elements that were differentially expressed in GBM and normal brain samples. Results: We found that 48 LTR elements were differentially expressed (p-value < 0.05) between GBM and normal brain tissues, of which 46 were HERV elements. Among these 46 elements, 34 significantly changed HERVs belong to the ERV1 superfamily. Furthermore, 43 out of the 46 differentially expressed HERV elements were upregulated. Conclusion: Our results indicate significant differential expression of many HERV LTR elements in GBM and normal brain tissues. Expression levels of these elements could be developed as biomarkers for GBM treatments.


Introduction
Glioblastoma multiforme (GBM) is the most common and most aggressive type of primary brain tumor, accounting for 47.7% of primary malignant brain tumors [1,2]. GBM is more common in males, and it appears to be sporadic without any genetic predisposition [3]. In the U.S. alone, 12,120 people in 2016, 13,010 in 2018, and 13,010 in 2019 were diagnosed with GBM. People diagnosed with GBM have a 5-year survival rate of less than 6% [1,4]. Risk factors for GBM include exposure to ionizing radiation [5], certain use of electronics [6][7][8], and infection by viruses such as human cytomegalovirus (HCMV) [9][10][11]. The biological mechanism of GBM remains a topic of controversy [12,13]. Therapy for GBM remains challenging due to the lack of efficient biomarkers and drug targets [14]. Searching for new biomarkers and drug targets to treat such a devastating disease is imperative and could have a significant impact on patient survival.
One potential source of biomarkers and drug targets is human endogenous retroviruses (HERVs) and their long terminal repeat sequences (LTRs). HERVs are believed to be relics of exogenous retroviruses integrated into the human genome throughout evolution [15]. They are major contributors to repetitive elements in the human genome. HERVs can be systematically classified into at least five groups (ERV1, ERV2, ERV3, ERV4, and endogenous lentivirus). Among them, only ERV1, ERV2, and ERV3 can be traced in the human genome [16]. HERVs and related retrotransposons account for about 8% of human genomic DNA [17,18]. HERVs are typically composed of GAG, POL, and ENV regions sandwiched between two LTRs [15,19,20], which are 330-1328 bp long [21][22][23]. Throughout the lengthy co-evolution, HERVs can affect host biological processes. Their effects are mediated by various genomic elements, such as alternative splicing sites, enhancers, poly-A signals, promoters, and repressors [24][25][26]. HERVs have been shown to synthesize and express unique proteins or even virus-like particles [27][28][29][30]. In some studies, HERV proteins have been shown to perform important biological functions, such as triggering or regulating host immune responses [31][32][33][34]. The transcripts from HERV-K HML-2 have been found to be associated with a number of cancers, such as melanoma [35], leukemia and lymphoma [36], and tumors of the breast [37,38], testis [37], and ovary [38]. Expression of the HERV-E family of retrotransposable elements has been found to be correlated with prostate, kidney, ovarian, and uterine cancers [39,40]. HERV-H sequences were found to be overexpressed in colorectal carcinogenesis [41]. In addition, the results of more recent studies indicated that HERVs are associated with neurological disorders such as multiple sclerosis [42], amyotrophic lateral sclerosis [43], and schizophrenia [44]. These observations triggered the reevaluation of the importance of HERVs in disease [29,38,45] and their potential role as biomarkers and drug targets [46].
However, despite sporadic reports on the roles of some HERVs in brain tumors [47][48][49][50], a comprehensive understanding of the correlation between HERVs and glioblastoma is lacking. Although HERV LTRs can drive the expression of retroviral proteins that may be involved in various biological processes and can serve as biomarkers unique to GBM, no study has been carried out to focus on understanding the roles of LTRs, especially HERV LTRs, in GBM. The traditional study of repetitive elements via data-driven approaches remains challenging due to the ambiguity of mapping short reads to repetitive genomic sequences. The availability of the vast amount of next-generation sequencing data enabled us to characterize the landscape of the expression of HERVs and repetitive elements at an unprecedentedly comprehensive level [51], and breakthroughs in algorithms have made the understanding of the repetitive element expression landscape possible [52][53][54]. In this work, we analyzed and identified the differential expression of LTRs, including those of HERVs in GBM and normal brain tissues. The upregulated protein-coding HERVs in GBM may generate protein markers that are unique to GBM, suggesting their potential as therapeutic targets for GBM treatment.

Materials and Methods
RNA-Seq data from 111 GBM tissue studies, 30 GBM cell line studies, and 45 normal brain tissues studies were obtained from public databases such as the NCBI Sequence Read Archive (SRA: https://www.ncbi.nlm.nih.gov/sra, accessed on 5 April 2021) and the Gene Expression Omnibus (GEO: https://www.ncbi.nlm.nih.gov/geo/, accessed on 5 April 2021). Sample accessions are listed in Supplementary Table S1.
The reads were first trimmed using Trimmomatic to remove ambiguous nucleotides (N's), extremely short reads (<30 nt), and low-quality bases with a sliding window size of 4 [55]. For quality control, the trimmed reads were mapped to the human genome (hg38) via Bowtie with the following parameters: -chunkmbs 500 −m 1 −S [56]. The resulting Sequence Alignment Map (SAM) files were converted, sorted, and indexed via SAMtools [57]. Indexed SAM files were processed by RepEnrich with default parameters [52], a software specifically designed to identify and quantify the expression of repetitive elements. Next, the EdgeR package [58] was used to analyze the expression of repetitive elements in GBM and normal brain tissues to identify significant differential expression. Significant differential expression was defined as differences with p-value < 0.05 and |fold change| ≥ 2 [59].
HERVs were filtered and analyzed based on the differential expression of their repetitive elements. The corresponding heatmap was plotted with Morpheus [60].
Phylogenetic analyses were conducted using differentially expressed LTR sequences. Sequence alignments were conducted using ClustalW [61] with default parameters. The phylogenetic tree was constructed using MEGA7 software [62] and the maximum likelihood method. The best model GTR + I + G was selected by Prottest 3.2.1 based on alignment results [63]. The phylogenetic tree was evaluated with 1000 bootstrap replicates.
The expressed transcripts within 1 kilobase pairs (kbp) of the differentially expressed LTRs were identified in the UCSC Genome Browser [64]. Among these, those that were differentially expressed were identified and plotted in GEPIA from TCGA [65]. The positions of the differentially expressed LTR elements on the chromosome were plotted using ChromoMap [66]. Genes interacting with differentially expressed repetitive elements via long-range chromatin interactions were identified using the 3D Interaction Viewer and Database (3DIV) [67].
A disease pathway enrichment analysis was performed on the identified neighboring genes using the WEB-based GEne SeT AnaLysis Toolkit (WebGestalt) [68] based on the gene-disease association database Disgenet [69]. The results of the analysis were plotted using the R script sp_enrichmentPlot.sh (https://github.com/Tong-Chen/s-plot/blob/ master/sp_enrichmentPlot.sh, accessed on 5 April 2021). In addition, DrugBank (https: //go.drugbank.com, accessed on 5 April 2021) was used to search approved drugs that target identified differentially expressed genes in proximity to the HERVs that show differential expression in GBM.

Results
By sifting through RNA-Seq data that are mapped to the repetitive "junk DNA" regions of the human genome, which are ignored in a typical RNA-Seq data analysis, we identified 137 repetitive elements differentially expressed in GBM and normal brain tissues. Of these, 48 LTRs were significantly differentially expressed (|fold change| ≥ 2, p-value < 0.05). Furthermore, 46 of the differentially expressed LTRs can be classified as belonging to HERVs, 34 of which are members of the ERV1 superfamily (Figures 1 and 2A,B).
Among the differentially expressed LTRs belonging to HERVs, 44 were upregulated and 4 were downregulated (Table 1, Figure 2C,D). Among these upregulated LTRs, 32 can be classified as ERV1s ( Figure 2D), suggesting that this superfamily could be highly relevant to GBM. The most upregulated ERV1s were MLT1M-int_LTR_ERVL-MaLR, LTR21A_LTR_ERV1, and LTR06_LTR_ERV1. The next most upregulated were elements belonging to the subfamilies of ERV3 (6 of 44) and ERV2 (5 of 44). The large number of ERV1s upregulated in GBM suggests that ERV1s may serve as novel targets in future GBM therapies, especially those with an adequate amount of expression in CPM (counts per million), such as LTR21A_LTR_ERV1 (normal brain (NB) CPM, 48.57; GBM CPM, 249.19; GBM vs. NB fold changes 5.13; p-value < 0.05) or LTR06_LTR_ERV1 (normal brain CPM, 9.29; GBM CPM, 36.37; GBM vs NB fold changes 3.91; p-value < 0.05).
In contrast, a much smaller proportion of LTR elements were expressed at lower levels in GBM than in normal brain tissues. Only four LTR elements, including two ERV1s (LTR1E_LTR_ERV1 and HERV-Fc1_LTR2_LTR_ERV1), were downregulated in GBM tissues. The phylogenetic analysis showed that those downregulated ERV1s are closely related, belonging to the same phylogenetic clade ( Figure 2B). This result indicates a common origin or biological function of those elements. In summary, our results show that the expression levels of LTRs, especially ERV1s, are ubiquitously higher in GBM than in normal brain tissues. These results suggest a potential functional association between the expression of HERVs and GBM. The proteins encoded by these elements could be uniquely expressed and present in GBM, suggesting a future application of those elements in drug and therapy development.   For the differentially expressed LTRs, there are some nearby transcripts whose expression levels also differ in GBM and normal brain tissue. The correlation could cast some light on the impact of LTR elements on the transcription profile of GBM. For LTR elements, LTR21A_LTR_ERV1 was observed to be upregulated ( Figure 3A), and there were seven genes upregulated in the adjacent regions ( Figure 3B). For the downregulated MER11B_LTR_ERVK elements (Figure 3C), 21 adjacent genes were upregulated and 28 were downregulated ( Figure 3D). In addition, the search against the 3VD database showed that several genes or genomic elements potentially interact with those differentially expressed LTR elements through long-distance interaction and may provide some insight into gene expression regulation in GBM (Figure 4). Detailed information about the differentially expressed genes or affected genomic regions adjacent to the target LTR elements are presented in Supplementary Materials S2 and S3.  (E) potential long-range interaction predicted by 3DIV; and (F) the expression level of CNTNAP2, identified through long-range interaction, in GBM (red, n = 163) and normal brain tissue (blue, n = 207). In (C,F), the y-axis represents the normalized gene expression level (log 2 (TPM + 1)). * means the significant differentially expressed element between GBM and normal brain.
We further analyzed the differentially expressed genes by performing a disease pathway enrichment analysis. For the genes adjacent to differentially expressed HERVs such as MER11B_LTR_ERVK or LTR21A_LTR_ERV1, a number of them are related to brain diseases such as amphetamine-related, status epilepticus, or other nerve system disorders ( Figure 5). This observation raised an interesting possibility that those differentially expressed HERVs may be located in genomic regions with potentially important roles in brain dysfunctions.
We also performed a search in DrugBank to identify drugs that target these neighboring genes (Table 2). Interestingly, several of these genes identified from Hi-C data are targeted for brain functions. For example, the solute carrier family 22 member 4 (SLC22A4) is a transmembrane protein targeted by several drugs for brain functions. In addition, carbonic anhydrase 10 (CA10) encodes a carbonic anhydrase and is considered to play a role in the central nervous system, especially in brain development. Furthermore, glutamate ionotropic AMPA type subunit 2 (GRIA2) belongs to a family of glutamate receptors that are considered to be the predominant excitatory neurotransmitter receptors in the mammalian brain and are also targeted for brain function. All these observations point to possible future research directions to decipher whether there is any biological relevance between differentially expressed HERV elements and these neighboring genes that not only share similar gene expression patterns but are also involved in brain functions and therapies in the central nervous system.

Discussion
HERVs have been integrated into the human genome for millions of years. They can encode proteins that are differentially expressed under different disease conditions, making them potential biomarkers or therapeutic targets. Through a comparison of data from 45 non-GBM tissues, 111 GBM tissues, and 30 GBM cell culture samples, we observed that 48 LTR elements of HERVs were expressed at significantly different levels in GBM than in normal brain cells or tissues. Of these, 44 were upregulated and 4 were downregulated. Most of the upregulated HERVs in GBM (34 out of 44) belong to the ERV1 superfamily (Table 1, Figure 2D). This result is in agreement with previous research that demonstrates that ERV1 is vastly upregulated in diseases and associated cells, including psoriatic skin [70], prostate cancer cell lines [52], and osteosarcoma [71]. Thus, ERV1 could be a potential biomarker for future GBM treatments. Interestingly, the most downregulated HERV is HERV-K, which is consistent with a previous study showing that almost no HERV-Ks can be amplified from human astrocytic tumors [50]. In addition, we observed that MER4-int LTR_ERV1 and LTR39_LTR_ERV1 were expressed at different levels in both GBM and a subgroup of healthy patients. While there could be many possible reasons, further analysis would be helpful to understand this phenomenon.
We also found that the genes neighboring LTR elements were also differentially expressed in GBM and control brain samples. For example, the MLT1M-int_LTR_ERVL-MaLR elements on chromosome 7 were downregulated in GBM, along with the downregulation of the adjacent gene, CNTNAP2 (contactin-associated protein 2). CNTNAP2 is an important neurogenesis gene that some studies suggest is a tumor suppressor gene in glioma [72,73]. This observation could provide some clues to the mechanisms and functional implication of differentially expressed elements.
Previous research indicated that brain disorders such as disabilities in cognitive functions are frequently observed and strongly correlated in brain tumor patients [74,75]. Therefore, interfering with the expression of nervous system-related genes may indicate a potential correlation with the risk of brain tumors such as GBM. Furthermore, retroviruses can be inherited just like a regular gene, and their mutations can accumulate throughout evolution. Therefore, our observations of HERVs could cast some light on the etiology of GBM [76]. In our study, we found that a number of differentially expressed genes adjacent to differentially expressed HERVs are related to mental or central nervous system disorders. The observation indicates that differentially expressed HERVs in humans may be located in genomic regions that are important for brain functions. Furthermore, the correlated differential expression between these HERV elements and the genes involved in mental and central nervous system disorders provide a possible direction to further investigate the roles of HERVs in brain dysfunctions and even brain tumors.
Our analysis of differentially expressed genes could also provide insight into their relationship with GBM and other brain diseases such as Alzheimer's disease (AD). Indeed, several differentially expressed proteins have been associated with AD. For example, amyloid-beta precursor protein binding family B member 1 interacting protein (APBB1) and beta secretase 2 have been involved in the modulation of APP processing and activity [77,78]. Neuroligin 1 has been shown to mediate the synaptic and memory deficits associated with AD [79]. Our study comprehensively characterized the landscape of the expression of repetitive elements, particularly HERVs, in GBM, as it differs from that of a normal brain. Our results suggest the potential application of such elements in the development of cancer therapies.

Conclusions
In our study, we comprehensively characterized the expression of repetitive elements differentially expressed in GBM and normal brain tissues. We identified 46 HERV elements, among which 43 were upregulated in GBM. The differentially expressed HERVs were also correlated with other differentially expressed genes or genomic elements nearby or through long-range genome interactions, indicating the potential functional role of HERVs in GBM. Furthermore, upregulated LTR elements of HERVs could express proteins that are unique to GBM. These could be used as future biomarkers or immunotherapy targets for GBM treatment.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10 .3390/microorganisms9040764/s1, Supplementary Materials S1: Supplementary Table S1. The list of the samples, experiments, and their accessions (SRA or GEO) used for this study. Supplementary Materials S2: The elements that are differentially regulated and related to the differentially expressed LTR21A_LTR_ERV1 in a long-range chromatin interaction. For each interaction map, the heatmap indicates the level of normalized interaction frequencies, and each triangle indicates a topological association domain. The axis indicates the interaction frequency (blue bar graph) and distancenormalized interaction frequency (magenta dots), respectively. The green line indicates the cut-off for the distance-normalized interaction frequency. The significant interactions are linked via blue arcs. Supplementary Materials S3: The elements that are differentially regulated and related to the differentially expressed MER11B_LTR_ERVK in a long-range chromatin interaction. For each interaction map, the heatmap indicates the level of normalized interaction frequencies, and each triangle indicates a topological association domain. The axis indicates the interaction frequency (blue bar graph) and distance-normalized interaction frequency (magenta dots), respectively. The green line indicates the cut-off for the distance-normalized interaction frequency. The significant interactions are linked via blue arcs.
Author Contributions: Z.Y., W.J.Z. and Z.A. conceived the study, designed the experiments, and prepared the manuscript. Z.Y. and Y.Y. performed all data analysis. C.S. provided insight on AD connection. X.J., N.Z., W.J.Z. and Z.A. provided guidance on data selection and analysis. All authors have read and agreed to the published version of the manuscript.
Funding: This work is partly supported by the Cancer Prevention and Research Institute of Texas grant RP170668 (Zheng) and RP150551 and RP190561 (An), the National Institutes of Health (NIH) through grants 1UL1TR003167 and R01AG066749 (Jiang and Zheng), and the Welch Foundation AU-0042-20030616 (An). Data Availability Statement: All data and materials are publicly available as described in the Supplementary Materials S1.