Human Endogenous Retrovirus K Rec Forms a Regulatory Loop with MITF that Opposes the Progression of Melanoma to an Invasive Stage

The HML2 subfamily of HERV-K (henceforth HERV-K) represents the most recently endogenized retrovirus in the human genome. While the products of certain HERV-K genomic copies are expressed in normal tissues, they are upregulated in several pathological conditions, including various tumors. It remains unclear whether HERV-K(HML2)-encoded products overexpressed in cancer contribute to disease progression or are merely by-products of tumorigenesis. Here, we focus on the regulatory activities of the Long Terminal Repeats (LTR5_Hs) of HERV-K and the potential role of the HERV-K-encoded Rec in melanoma. Our regulatory genomics analysis of LTR5_Hs loci indicates that Melanocyte Inducing Transcription Factor (MITF) (also known as binds to a canonical E-box motif (CA(C/T)GTG) within these elements in proliferative type of melanoma, and that depletion of MITF results in reduced HERV-K expression. In turn, experimentally depleting Rec in a proliferative melanoma cell line leads to lower mRNA levels of MITF and its predicted target genes. Furthermore, Rec knockdown leads to an upregulation of epithelial-to-mesenchymal associated genes and an enhanced invasion phenotype of proliferative melanoma cells. Together these results suggest the existence of a regulatory loop between MITF and Rec that may modulate the transition from proliferative to invasive stages of melanoma. Because HERV-K(HML2) elements are restricted to hominoid primates, these findings might explain certain species-specific features of melanoma progression and point to some limitations of animal models in melanoma studies.


Introduction
Endogenous retroviruses (ERVs) are remnants of retroviruses that once infected the germline and became vertically inherited as part of the host genome. Sequences derived from various ERVs account for 8% of the human genome [1], reflecting multiple waves of retroviral invasion in the human lineage. In the human genome, HERV-K(HML2) is the most recently endogenized HERV. Although some HERV-K(HML-2) insertions are polymorphic in the human population [2][3][4][5], suggesting relatively recent integration events, no fully-active provirus has been identified. Due to recombination events and other post-insertional mutations, the vast majority of HERV-K (HML2) members (~950-1200) are either solitary long terminal repeats (solo LTRs) or variously truncated copies [6,7]. A subset of proviral copies (~90) is still capable of expressing retroviral proteins (e.g., Gag, Pro-Protease, Pol-Polymerase, Similar to Env, the expression of Rec and Np9 is significantly increased in several pathological conditions [55], including germ cell tumors and melanoma [12,21,37,42,43,[56][57][58][59][60], but again it remains unclear whether and how these factors could contribute to tumorigenesis and other disease states. To begin answering this question, we investigate the regulatory genomic aspects of LTR5_Hs sequences, as well as the oncogenic properties of HERV-K Rec (Rec thereafter) in melanoma. We used in vitro models as well as high throughput data mining, including single-cell transcriptome analyses of patient samples, to evaluate the impact of Rec expression in the spatiotemporal progression of melanoma. We found that Rec marks the proliferative state of melanoma, and similarly to Env [47] modulates the EMT-like process of cell transformation. However, in contrast to Env [47], Rec appears to inhibit the transition from proliferative to invasive state and as such may represent a protective factor in melanoma.

Quantitative Real-Time PCR and Semi-Quantitative PCR
Total RNA was extracted using the Direct-zol RNA MiniPrep kit (Zymo Research, Irvine, CA, USA) according to the protocol of the manufacturer. Reverse transcription was performed from 1 µg total RNA with the High Capacity RNA-to-cDNA kit (Applied Biosystems , Waltham, MS, USA). The isolated RNA would be treated with additional DNaseI (Invitrogen, Carlsbad, CA, USA), to get rid of potential DNA contamination. Quantitative real-time PCR (qRT-PCR) was carried out on ABI7900 with PowerUp SYBR Green Master Mix kit (Applied Biosystems, Waltham, MS, USA), according to the recommendations of the manufacturer. For quantification of mRNA expression levels, real-time PCR was run in triplicates for each cDNA sample, using GAPDH and 18 s RNA as the internal controls for SKMel-28 and A375 cells, respectively. For primers and annealing temperatures, see Table 1. Data were analyzed using the comparative CT (2 −∆∆CT ) method, which describes relative gene expression. For RNA-seq validation, the qPCR control was transfected with pT2-CAG-GFP instead of the KD constructs.

RNA Sequencing and Data Analysis
The concentration of RNA was measured on NanoDrop Spectrophotometer ND-1000, and the quality of RNA was analyzed using Agilent RNA 6000 Nano Kit on Agilent 2100 Bioanalyzer machine. The RNA sequencing library was prepared from 550 ng of RNA, using Illumina (San Diego, CA, USA) TruSeq Stranded mRNA LT Set A kit (Cat. no. RS-122-2101), according to TruSeq Stranded mRNA Sample Prep LS Protocol. Sequencing was performed on an Illumina HiSeq 2000 platform as 100 bp first strand-specific paired-end reads.
Sample-specific barcoded sequencing reads were de-multiplexed from multiplexed flow cells and by using CASAVA. 1.8.2 BCL files were converted to FASTQ format files. The quality of the raw sequence reads was determined by using the FastQC. Reads with a quality score below 30 were removed. We removed the highly variable two nt from the ends of the remaining sequencing reads and mapped them over the reference genome (Human hg19/GRCh37) and transcriptome model (hg19.refseq.gtf). hg19/GRCh37 and hg19.refseq.gtf that were downloaded from USCS tables (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/). For mapping the reads, we used our defined settings (i.e., -alignIntronMin 20 -alignIntronMax 1,000,000 -chimSegmentMin 15 -chimJunctionOverhangMin 15 -outFilterMultimapNmax 20) to STAR splice mapper. This approach generated STAR genome/transcriptome indices of the reads and provided their respective RefSeq gtf (genes and rmsk) annotations. Final read alignments having more than two mismatches per 100 bp were discarded. We obtained uniquely mapped read counts over individual gene bodies using featureCounts function from the subread package, at gene and transcript level using RefSeq annotations. Scalable gene expression levels were calculated as transcript per million (TPM) counts over the entire gene (defined as any transcript located between transcription start and end sites). In the datasets generated in different batches, we normalized the batch effect using the 'RUVSeq' package from R to Viruses 2020, 12, 1303 5 of 25 obtain differentially expressed genes (DEGs). Firstly, we removed the extremely low expressing genes (genes with less than five mapped reads) to avoid the unnecessary variation in transcriptomes. We then used DESeq2, an R package, to determine the differentially expressed genes from replicated data. Data were normalized using the default functions of DESeq2. The package "DESeq2" provides methods to test for differential expression by use of generalized linear models; the estimates of dispersion and logarithmic fold changes incorporate data-driven prior distributions. Genes with an adjusted p-value less than 5% (according to the FDR method from Benjamini-Hochberg), Log 2 Foldchange |1| were assigned as differentially expressed. This strategy provided both quantification and statistical inference of systematic changes between conditions (with at least three replicates). The following samples were RNA sequenced: KD-Scr_A375, KD1-Rec_A375, KD2-Rec_A375, KD3-Rec_A375, KD-Scr_SKMel-28, KD1-Rec_SKMel-28, KD2-Rec_SKMel-28, and KD3-Rec_SKMel-28 in two batches. Canonical pathways and biological function of the differentially expressed genes (DEGs) were further subjected to KEGG and Gene Set Enrichment Analyses (GSEA) using the webgestalt tool.
To estimate expression levels for repetitive elements, we calculated the CPM (counts normalized per million of total reads mappable on the human genome) for each Class II transposable element (TE) locus given in the repeat-masked annotations, downloaded from the UCSC data portal. We had removed those repetitive element's locus that overlays with the genic exons/UTRs or in the vicinity of 1 Kb from the Transcription Start Site (TSS) of annotated genes before mapping the sequencing reads over the individual locus. To calculate the family-wise expression of repetitive elements, we considered multi mapping reads only if they were mapping exclusively within a repetitive family. Then counted one alignment per read to calculate the CPM. The expression level of repeat families is calculated as Log 2 (CPM + 1) prior to comparison. The locus-specific information was only calculated for those HML2-HERV-K loci that encode for Rec transcripts. For this, we used "Salmon" that implements the expectation-maximization algorithm to re-distribute and assign the multi-mapping reads to the provided coordinates, based on evidence from uniquely mapped reads.

Single-Cell RNA-Seq Data Processing
We used the single-cell count matrix from GSE72056. We calculated the activity of genes in every cell at TPM expression levels. We considered samples expressing over 1000 genes with expression levels exceeding the defined threshold (Log 2 TPM > 1). We used Seurat_3.1.1, a package from R to normalize the datasets at the logarithmic scale using "scale.factor = 10,000". After normalization, we calculated scaled expression (z-scores for each gene) for downstream dimension reduction. We used the original annotations of datasets to classify the Malignant, Non-malignant, and Heterogenous cell-types. To define cell population clusters, we employed the FindClusters function of "Seurat" using "PCA" as a reduction method. The specific markers for each cluster identified by "Seurat" were determined by the "FindAllMarkers" function, using "roc" as a test for significance. This provided us two lists of gene sets. (1) Malignant vs. Non-Malignant differentially expressed genes and (2) genes differentially expressed between Melanocyte Inducing Transcription Factor (MITF)-High and MITF-Low tumors, which we applied for comparison with Rec-KD RNA-seq DEGs. Feature plots, violin plots, and heatmaps were constructed using default functions, except for the color scale that was set manually. The annotated cells were re-clustered using the methodologies described above and visualized on the UMAP coordinates.

ATAC-Seq and ChIP-Seq Data Analyses
ATAC-seq and ChIP-seq raw datasets in the sra format were downloaded from the listed studies and converted to fastq format using sratools function fastq-dump-split-3. Fastq reads were mapped against the hg19 reference genome with the modified bowtie2 parameters: -very-sensitive-local. The modified settings are the following: -D (number of matches) was set to 30, N (number of mismatches in seed) was set to 1, and -R (number of re-seed) was set to 5. All unmapped reads with MAPQ < 10 and PCR duplicates were removed using Picard and samtools. MACS2 called all the ATAC-seq peaks with the parameters-nomodel -q 0.01 -B. Blacklisted regions were excluded from called peaks (https://www.encodeproject.org/annotations/ENCSR636HFF/). To generate a set of unique peaks, we merged ATAC-seq peaks using the mergeBed function from bedtools, where the distance between peaks was less than 50 base pairs. We then intersected these peak sets with repeat elements from hg19 repeat-masked coordinates using bedtools intersectBed with a 50% overlap. To calculate the enrichment over the given repeat elements, we first extended 5 Kb upstream and 5 Kb downstream coordinates from the left boundary (TSS) of respective elements in a strand-specific manner. These 10 Kb windows were further divided into 100 bps bins and tags (bedGraph output from MACS2) were counted in each bin. Tag counts in each bin were normalized by the total number of tags per million in given samples and presented as CPM per 100 bps. We averaged CPM values for each bin between replicates before plotting the figures. To find the transcription factor (TF) binding motifs, 25 bps sequences were extended from either side of MITF ChIP-seq peak summits. The extended sequences were analyzed using the RSAT tool (http://rsat.sb-roscoff.fr/). The TF binding motifs were calculated from JASPAR libraries of human TF motifs. Note that publicly available datasets from primary melanoma cultures contain short reads that are not suitable to identify locus-specific expression of the proviruses. Thus, we used ChIP-seq datasets to find out which loci were regulated through active histone marks and transcription factors.

Cell Invasion Assay
The cell invasion assays were performed using transwell chambers (8 µm pore size; Corning Costar, New York, USA) according to the vendor's instructions. Briefly, the insert of the wells was first coated with 50 µL BD matrigel. A375 cells were resuspended with RPMI1640 medium containing 0.5% FCS to reach a concentration of 5 × 10 5 cells/mL. Then, 100 µL cells from each group (stable KD-REC clones, pT2-CAG-GFP control) were seeded into the upper chamber of the insert, with adding DMEM containing 10% FCS to the lower chamber. After incubation at 37 • C in a humidified atmosphere of 5% CO 2 for 24 h, non-invaded cells were removed with gentle swabbing. In contrast, the invaded ones were stained, and then images were taken with microscopy. Cell numbers were calculated with ImageJ. The control was transfected with pT2-CAG-GFP instead of the KD constructs.

LTR5_Hs Loci form Active Chromatin in Cancer and Pluripotent Cell Lines
Cis-regulatory elements driving the transcription of the youngest subfamily of HERV-K are embedded within the human-specific LTR5 (LTR5_Hs) [64,65]. Various transcription factors (TFs) have been reported to bind LTR5_Hs, including the pluripotency factors such as OCT4 [66] or the Melanocyte Inducing (also known as MIcrophthalmia-associated) TF (MITF) [65,67]. To examine more systematically the regulation of HERV-K transcription in different cell types, we mined publicly available data produced by the ENCODE consortium [68]. Briefly, we mapped the peaks from 142 ChIP-seq TF datasets from 7 cell lines (ENCODE) over 615 genomic sequences annotated as LTR5_Hs in the human genome (see Methods). Our integrative analysis of various TF occupancy revealed that multiple TFs bound a group of LTR5_Hs loci in a cell-type-specific fashion. Notably, in H1 embryonic stem cells (H1ESCs) and leukemia K562 cells, LTR5_Hs elements tend to be bound by more TFs than in the other cell lines examined (Figure 3.1A). In the leukemia cell line K562, LTR5_Hs elements are enriched for binding of multiple TFs that control cell cycle and proliferation (e.g., cMYC, MAX, JUND, PU1, and P300, see Figure 3.1A), which might reflect the transformed phenotype of these cells. Curiously, NANOG binding appeared particularly prominent when H1 cells were cultured in naïve ("3iL") conditions [69] (Figure 3.1B).    [35]. Note that in the following analyses we used the data of MITF-High and MITF-Low as proliferative (n = 3) and invasive (n = 2). TPM-transcript per million.

MITF-Regulated LTR5_Hs/HERV-K Expression Is a Hallmark of the 'Proliferative' Type of Melanoma
It has been previously reported that in melanoma, the melanocyte-specific isoform MITF(M) is expressed and promotes the expression of HERV-K transcripts likely through direct binding to LTR5_Hs [67]. To further substantiate that MITF drives HERV-K expression in melanoma, we examined the expression levels of various transposable elements (TEs) in MITF-depleted melanocytes and melanoma cells using previously published RNA-seq datasets [70]. Knocking down MITF affected the expression of small subsets of TE families that were differentially expressed in both melanocytes and melanoma, but there was only a 5% overlap in differential TE expression between the two cell types (Figure 3.1C,D). Notably, KD MITF was associated with a significant decrease in HERV-K expression (adjusted p-value < 0.01, Benjamini-Hochberg (BH) correction) in melanoma, but not in melanocytes ( Figure 3.1C), suggesting that MITF may regulate HERV-K expression specifically in melanomas.
MITF(M) is a critical transcription factor associated with melanoma progression [71,72], and is a sensitive marker of melanoma invasiveness: it is highly expressed (e.g., MITF-High) in the melanocytes and the proliferative state of melanoma, but lowly (e.g., MITF-Low) expressed in the invasive state [73]. To investigate how tightly HERV-K is controlled in melanoma by MITF, we mined ChIP-seq and RNA-seq data from melanocytes, various melanoma cell lines, and primary melanoma cultures (Methods). Characterizing primary melanoma cultures according to their MITF levels ( Figure 3.1E) revealed an inverse correlation between LTR5_Hs/HERV-K RNA levels and invasiveness ( Figure 2A). To see if additional TE families respond to melanoma proliferative/invasive status, we performed a more systematic analysis including other TE families. This analysis revealed that the expression of certain ERVs, including LTR5_Hs/HERV-K, LTR2C, and LTR13, was correlated with that of genes characterizing the proliferative state of melanoma ( Figure S1A). When compared to the overall expression of other TE families, LTR5_Hs/HERV-K was the most abundantly expressed TE family and showed the strongest positive correlation with MITF levels (rho = 0.43, corrected p-value < 0.2 × 10 −9 ) in the melanoma primary cultures (N = 10) ( Figure 2B). These data suggested that similarly to MITF, LTR5_Hs/HERV-K expression correlated negatively with melanoma invasiveness. Next, we characterized the regulation of LTR5_Hs activation in the proliferative type of melanoma cells by layering ChIP-seq occupancy signals of MITF and H3K4Me3, H3K27Ac ( Figure 2C and Figure S1B-D). Relative to the invasive state, the proliferative state showed enrichment of LTR5_Hs for these "active" histone marks, coupled with MITF binding ( Figure 2C and Figure S1C,D). We found that MITF binding at LTR5_Hs loci coincided significantly with the presence of the canonical E-box motif (CA(C/T)GTG) (p-value = 0.026, chi-square test of Observed vs. Expected ratio), which is recognized by MITF as well as c-MYC and MAX TFs [74] ( Figure 2C). Overall, our analysis suggests that the upregulation of HERV-K expression in proliferative melanoma is likely driven at least in part by direct binding of MITF over the LTR5_Hs through the E-box motif.

Inhibition of BRAF V600E Mutant Leads to the MITF Binding over LTR5_Hs/HERV-K in Melanoma
We also investigated the transcriptional regulation of HERV-K in the background of the BRAF V600E mutation, resulting in a constitutive activation of the mitogen-activated protein kinase (MAPK) pathway [75]. PLX4032, a commonly used small molecule compound in the clinic for the

Inhibition of BRAF V600E Mutant Leads to the MITF Binding over LTR5_Hs/HERV-K in Melanoma
We also investigated the transcriptional regulation of HERV-K in the background of the BRAF V600E mutation, resulting in a constitutive activation of the mitogen-activated protein kinase (MAPK) pathway [75]. PLX4032, a commonly used small molecule compound in the clinic for the treatment of melanoma, blocks constitutive MAPK signaling via BRAF V600E [76,77]. PLX4032 has been also reported to modify MITF activity and expression in certain melanomas [78], including the highly invasive BRAF V600E mutant COLO829 melanoma line [79], resulting in a MITF-High/AXL-Low phenotype. In COLO829 cells, the MITF global binding profile has a reciprocal pattern at a subset of MITF-binding sites in melanocytes vs. melanoma, while PLX4032 treatment results in re-occupancy along with a switch between invasion types [79]. To examine how PLX4032 treatment may affect HERV-K regulation, we reanalyzed publicly available MITF ChIP-seq datasets of primary melanocytes and COLO829 with and without PLX4032 treatment (GSE50681). We observed that PLX4032 treatment results in a substantial increase of MITF ChIP-seq peaks over LTR5_Hs elements (123 out of 615 loci were bound under PLX4032 treatment but only 19 were bound in the control cells) (Figure 2D), suggesting that MITF binds LTR5_Hs loci, preferentially in BRAF inhibited melanoma cells. Among all TE families in the human genome, the gain of MITF binding under PLX4032 treatment was unique to the LTR5_Hs family (p-value < 0.01, chi-square test of observed vs. expected ratio) ( Figure 2E). While MITF is also expressed and binds to a subset of LTR5_Hs elements in healthy pigmented melanocytes (n = 33), there was no elevated MITF ChIP-seq signal over LTR5_Hs elements upon PLX4032 treatment of those cells ( Figure 2D). Thus, the proliferative state of melanoma is associated with the robust and widespread binding of LTR5_Hs by MITF.
Could MITF-bound LTR5_Hs loci function as enhancers for neighbor genes? To answer, we reanalyzed the set of genes that were (i) reciprocally responding to the transition of melanoma invasion types; (ii) upregulated after the PLX4032 treatment along with MITF; (iii) had an LTR5_Hs in their neighborhood (10 Kb from Transcription Start Site (TSS)) that are bound by MITF. Among them, we identified seven genes (e.g., PRODH, BFAR, PTAR1, NRBP2, NCOA7, SLC35A5, and NDUFAB1) ( Figure 2F,G and Figure S3A-F). Notably, the role of PRODH has been implicated in various cancers [80,81]. Furthermore, the upstream LTR5_Hs is a validated enhancer of the PRODH gene in embryonic carcinoma cells [82]. Our analysis suggested that PRODH was regulated by the neighboring LTR5_Hs enhancer during the reversion of the invasive melanoma phenotype ( Figure 2F,G).
Next, we seek to gain insight into the transcriptional activity of individual HERV-K loci in different melanoma states. Because available RNA-seq datasets were all using short reads, which precludes locus-specific analysis of recent TEs like HERV-K, we turned to chromatin ChIP-seq data (which is more mappable) as a proxy to predict transcriptionally active HERV-K loci [83]. Namely, we used the published dataset for H3K27ac and MITF ChiP-seq (GSE60663) for low passage melanoma cells and also ATAC-seq and H3K27Ac data for the A375 melanoma cell line (GSE82334). Using this approach, we were able to identify twelve loci (nine HERV-K proviruses and 3 solo LTR5_Hs) bound by MITF and/or with active chromatin marks (H3K27ac, ATAC peak) in MITF-High primary cells (but not in MITF-Low cells) (Table S2). For example, Figure 3A presents an illustration of a region on chromosome 7 that includes the ERVK6 locus, which is structured as two nearly-identical tandemly arranged proviruses [84], duplicated around a centrally located third LTR5_Hs, encoding for apparently intact ORFs [28]. The 5 -flanking LTR5_Hs of the tandem proviruses showed a higher H3K27Ac signal in both proliferative MITF-High and A375 cells relative to MITF-Low cells, and also significant ATAC-seq and H3K27ac signals in A375 ( Figure 3A), indicating that these loci are preferentially active in the proliferative type of melanoma. Moreover, genome-wide analyses of ATAC-seq and active histone marks (H3K27ac and H3K4Me1) in A375 cells further support the notion that multiple LTR5_Hs elements (>50) may function as promoters or enhancers in this melanoma cell line ( Figure 3B,C). Collectively, these regulatory genomics data suggest that MITF binding contributes to the activation of multiple LTR5_Hs loci copies (both solo and proviral) in the proliferative type of melanoma. 1, ** p-value < 0.01, and *** p-value < 0.001, t-test). Note that the given mapping strategies of RNA-seq datasets over repetitive sequences the expression level upon KD Rec might not be directly compared with qPCR results (see Figure 3D).

Depleting HERV-K-Rec May Induce an EMT-Like Process in A375 Melanoma Cells
From the nine identified actively transcribed proviruses, five were reported to express Rec, from which three were also detected in various healthy tissues [28]. To gain insight into a potential role of Rec in melanoma, we first used an RNAi approach to deplete Rec expression in two melanoma cell LTR5_Hs-HERV-K sequences at the ERVK6 locus on chr7, encoding for Gag, Env, and known to produce Rec (UniProt). Note (i) that ERVK6 is a duplicated, tandemly arranged provirus, flanked by two LTR5_Hs sequences, duplicated around a centrally located third LTR5_Hs (see also Figure S2 (* p-value < 0.1, ** p-value < 0.01, and *** p-value < 0.001, t-test). Note that the given mapping strategies of RNA-seq datasets over repetitive sequences the expression level upon KD Rec might not be directly compared with qPCR results (see Figure 3D).

Depleting HERV-K-Rec May Induce an EMT-Like Process in A375 Melanoma Cells
From the nine identified actively transcribed proviruses, five were reported to express Rec, from which three were also detected in various healthy tissues [28]. To gain insight into a potential role of Rec in melanoma, we first used an RNAi approach to deplete Rec expression in two melanoma cell lines (A375 and SKMel-28). A375 and SKMEL-28 both carry the BRAF V600E mutation, but A375 has a higher proliferation rate, whereas SKMEL-28, similar to COLO829, is more invasive [85], indicating their distinct invasion phenotypes. SKMel-28 also differs from A375 as it does not express MITF at significant levels [78,86]. To knockdown (KD) Rec, we designed three RNAi constructs ( Figure S2) to target the sequence of the ERVK6 locus, for which we observed marks of high transcriptional activity in MITF-High and A375 melanoma cells ( Figure 3A). Nevertheless, due to the high level of Rec sequence similarity among several HERV-K proviruses [15], we predicted our KD constructs would target Rec transcripts produced by most transcriptionally-competent HERV-K proviruses [15,28]. In addition, we generated control cell lines expressing scrambled sequences. In total, we generated 10 stable cell lines (6 KD and 4 controls) using Sleeping Beauty as a stable vector [62,63]. Real-time qRT-PCR experiments showed that Rec transcript levels were depleted in each of the six independent KD cell lines isolated for A375 and SKMel-28 ( Figure 3D). We also observed that knocking down Rec also led to an increased level of alternative HERV-K transcripts encoding Gag and Env in SKMel-28 (all three KD lines) and in A375 (in the A375-KD1 line only) ( Figure 3E).
To detect transcriptome changes upon Rec KD, we used RNA-seq to compare the stable KD-Rec lines to their respective control lines. While SKMel-28 expresses Rec from three genomic loci [43], we were able to unambiguously identify five Rec-expressing loci in A375 ( Figure 3F). Next, we computed the differentially expressed genes (DEGs, |2|-fold and FDR < 0.05) between the Rec KD and their control lines (Methods). We obtained 171 and 773 DEGs upon Rec KD in SKMel-28 and A375 respectively ( Figure 4A,B, Figure S1F, Table S1, see Methods). In SKMel-28, the 171 DEGs were not significantly enriched for any Gene Ontology (GO) categories. In contrast, the 773 DEGs in A375 showed significant enrichment for several GO terms including pigmentation, cell migration, IL1 production and cell proliferation (FDR < 0.05) ( Figure 4A).
Notably, we identified a set of genes involved in epithelial-mesenchymal transition (EMT)-like process, which plays a critical role in phenotype switching from the proliferative to invasive state [87][88][89]. Intriguingly, this set of DEGs included matrix metallopeptidase 2 (MMP2), a classical marker of cell invasion along with additional genes previously reported to drive the canonical EMT process [90], such as a zinc finger protein (SNAI2), forkhead box C2 (FOXC2), cadherin 2 (CDH2) and goosecoid Homeobox (GSC) ( Figure 4B). Further, the genes upregulated upon Rec KD in A375 cells included factors such as WNT5B and GSK3B, which are known to induce the EMT-like process and are activated when melanoma gets relapsed to metastasis after failed treatment ( Figure 4B) [91,92]. During melanoma metastasis, a TF regulatory network characterized by SNAI2 low /ZEB2 low /ZEB1 high is established [93][94][95]. The same TF expression signature is observed upon KD of Rec in A375 cells ( Figure 4B). Beside canonical EMT markers, we also observe a strong upregulation of several members of the SPANX (sperm protein associated with the nucleus in the X chromosome) family of genes upon Rec KD in A375 cells ( Figure 4B). The expression of SPANX genes is normally confined primarily to the germline, but upregulation of SPANXB1/2 has been previously reported in invasive melanoma [96]. Furthermore, we detected a robust decrease in the transcript level of MITF(M), the master regulator of the melanocyte differentiation program, upon Rec KD in A375 cells ( Figure 4B). In addition to MITF, we also observed a decreased expression of other melanocyte differentiation markers (e.g., TYR, PMEL) [97] (Figure 4C). Together, the transcriptome changes we observed upon Rec KD in A375 cells suggests that Rec expression is part of a regulatory cascade inhibiting the EMT-like process. down ( Figure 4F), indicating that MITF and Rec expression levels are tightly associated in melanoma cells. Following the lead on potential MITF and Rec co-regulation, we also asked whether the MITFtarget genes (ChIP-seq peaks around the TSS) were among the 773 DEGs in our Rec-KD dataset. This approach identified a set of 167 MITF target genes, whose expression was significantly depleted upon KD Rec (p-value = 2.654 × 10 −14 , chi-square test) (Figure 4G), suggesting a reciprocal coregulation of MITF and Rec expression.  To corroborate these results, we chose two of the most DEGs in our transcriptomic analysis (SPANXB1/2 and MITF(M)) and performed RT-qPCR to compare the mRNA levels of these two genes in the three independent Rec KD A375 cell lines and their control lines. The results were consistent with the RNA-seq analysis: SPANXB1/2 was strongly upregulated ( Figure 4D) and MITF(M) was robustly downregulated upon Rec KD ( Figure 4E).
To substantiate the connection between MITF and Rec, we also determined the level of Rec expression in MITF-depleted cells (GSE61966). Importantly, the expression of Rec was downregulated upon knocking down MITF in the melanoma cell lines ( Figure 4F). Our analysis revealed that, upon MITF depletion, the overall expression of Rec-coding loci, including ERVK6, was down ( Figure 4F), indicating that MITF and Rec expression levels are tightly associated in melanoma cells. Following the lead on potential MITF and Rec co-regulation, we also asked whether the MITF-target genes (ChIP-seq peaks around the TSS) were among the 773 DEGs in our Rec-KD dataset. This approach identified a set of 167 MITF target genes, whose expression was significantly depleted upon KD Rec (p-value = 2.654 × 10 −14 , chi-square test) (Figure 4G), suggesting a reciprocal coregulation of MITF and Rec expression.

Depletion of Rec Results an Enhanced Cell Invasion in A375 Melanoma
To further explore the idea that Rec KD in A375 cells recapitulates part of the expression profile of invasive melanoma, we examined the correlation between DEG upon Rec KD and those differentially expressed between invasive and proliferative primary melanoma cultures. We found a significant positive correlation pattern between the two sets of DEGs (Spearman's rho = 0.20, p-value < 0.05, asymptotic t approximation) ( Figure 5A). Overall, these analyses suggest that depletion of Rec in A375 cells may result in an increased invasion phenotype. . *** denote the significance of differential expression (p-value < 0.05, t-test). (G) Box plot showing the differential expression of genes that were bound or not by MITF in PLX4032 treated melanoma within the set of 773 significant DEGs upon Rec-KD (adjusted p-value < 0.05).

Depletion of Rec Results an Enhanced Cell Invasion in A375 Melanoma
To further explore the idea that Rec KD in A375 cells recapitulates part of the expression profile of invasive melanoma, we examined the correlation between DEG upon Rec KD and those differentially expressed between invasive and proliferative primary melanoma cultures. We found a significant positive correlation pattern between the two sets of DEGs (Spearman's rho = 0.20, p-value < 0.05, asymptotic t approximation) ( Figure 5A). Overall, these analyses suggest that depletion of Rec in A375 cells may result in an increased invasion phenotype.
To test whether Rec expression modulates the invasiveness of melanoma cells, we performed a trans-well invasion assay to compare the invasiveness of Rec KD A375 cells to their control lines. The results showed that all three Rec KD lines exhibited significantly elevated invasiveness ( Figure 5B,C). These data support the hypothesis suggested by the transcriptome analyses that Rec may function as a suppressor of the EMT-like transition process, which modulates the invasiveness of melanoma.  To test whether Rec expression modulates the invasiveness of melanoma cells, we performed a trans-well invasion assay to compare the invasiveness of Rec KD A375 cells to their control lines. The results showed that all three Rec KD lines exhibited significantly elevated invasiveness ( Figure 5B,C). These data support the hypothesis suggested by the transcriptome analyses that Rec may function as a suppressor of the EMT-like transition process, which modulates the invasiveness of melanoma.

Comparison of the Rec KD A375 with the Malignant Cells in Melanoma Patients
Next, we sought to examine whether a set of dysregulated genes in Rec KD A375 cells were also differentially expressed in malignant cell types. To assess this, we first performed a single-cell (sc) RNA-seq analysis (n = 4645 cells) derived from melanoma patients (n = 19), which includes both malignant and non-malignant cells [98]. Consistent with the original study, our analysis distinguishes the malignant from the non-malignant cells based on the expression of the most variables genes ( Figure 6A) and identifies four major malignant cell types, marked by MAGEC2, MITF, APOE (Apolipoprotein E), and SPP1 (Osteopontin) expression, respectively ( Figure 6A-C). It is known that high levels of MITF, APOE, and SPP1 mark different aspects of melanoma progression, i.e., melanocyte differentiation, angiogenesis [99], and inflammation [100], respectively. Interestingly, those three genes were all significantly downregulated in the Rec KD A375 cells ( Figures 4B and 7A), suggesting that the KD of Rec in those cells recapitulates multiple phases of melanoma.

Discussion
HERV-K resembles "complex" retroviruses such as HIV-1 by its ability to encode multiple proteins, including Rec. While Env has the typical structure of a transmembrane protein (e.g., extracellular, transmembrane, and cytoplasmic domains), Rec lacks the domains required for a membrane protein function ( Figure 8A). Still, it carries nuclear localization/export signals [8,19]. Although Env and Rec are different in sequence and function, our results suggest that Rec, like Env [47], exerts a modulatory effect on the EMT-like process in cancer progression. However, the two proteins appear to act in opposite direction. While Env overexpression (both RNA and protein) may promote tumor progression through ERK1/2 activation [47], our KD experiments indicate that Rec may inhibit the EMT process and oppose melanoma metastasis. Interestingly, similar to Env, expression of the Gag protein encoded by HERV-K, also increase the tumorigenic potential of Having characterized the expression profile of malignant and non-malignant cells from melanoma patients, we next compared the~700 DEGs ( Figure 7B, see Methods) between non-malignant and malignant cell types to those detected in our Rec KD A375 experiments (n = 773). There were 94 overlapping genes, showing a significant enrichment over a random expectation. Genes showing malignant-specific expression tend to be upregulated in KD-Rec ( Figure 7C).
To substantiate that Rec regulated genes are overrepresented in the obtained cell-types within malignant or non-malignant clusters, we performed an additional analysis, and use only the top 50 marker genes of each cluster (both malignant and non-malignant) that overlaps with differentially expressed genes upon Rec-KD ( Figure 7D). This analysis detects the markers of the MITF overexpressing cell clusters at the highest frequency among the down-regulated genes upon Rec KD in A375 ( Figure 7E).
Lastly, we sought to assess whether genes predicted as downstream targets of MITF from the same scRNA-seq analysis [98], were also affected in our Rec KD experiments. We found that 41 out of the top 100 MITF predicted downstream targets were included in our list of 773 DEGs, a significant enrichment over the random expectation [p-value, test]. Strikingly, 39 out of the 41 overlapping genes were downregulated by Rec KD (p-value < 2.2 × 10 −16 , chi-square test of observed vs. expected). These observations are consistent with the finding that MITF is downregulated in our Rec KD lines, thereby affecting MITF downstream targets (n = 41 out of 100, Wilcoxon-test, p-value < 1.2 × 10 −11 ) ( Figure 7F). This analysis suggests that the downstream targets of MITF might be depleted as the consequence of MITF downregulation upon KD Rec. Thus, our KD Rec A375 model recapitulates MITF-associated aspects of phenotype switching in melanoma patients to some degree.

Discussion
HERV-K resembles "complex" retroviruses such as HIV-1 by its ability to encode multiple proteins, including Rec. While Env has the typical structure of a transmembrane protein (e.g., extracellular, transmembrane, and cytoplasmic domains), Rec lacks the domains required for a membrane protein function ( Figure 8A). Still, it carries nuclear localization/export signals [8,19]. Although Env and Rec are different in sequence and function, our results suggest that Rec, like Env [47], exerts a modulatory effect on the EMT-like process in cancer progression. However, the two proteins appear to act in opposite direction. While Env overexpression (both RNA and protein) may promote tumor progression through ERK1/2 activation [47], our KD experiments indicate that Rec may inhibit the EMT process and oppose melanoma metastasis. Interestingly, similar to Env, expression of the Gag protein encoded by HERV-K, also increase the tumorigenic potential of melanoma cells [46]. Because we observed that KD Rec leads to the upregulation of Gag and Env in one of the three KD lines we established using three different RNAi constructs, we cannot exclude that some of the transcriptomic and phenotypic (invasiveness) changes we observed are due to indirect effects on Gag and Env levels. Importantly, however, we found that all three KD lines behaved similarly in terms of some of the key differences in gene expression (e.g., MITF(M), compare KD lines in Figure 4E) or invasiveness ( Figure 5). Thus, it appears that Rec's oncogenic potential may oppose that of Env or Gag proteins encoded by HERV-K, and when HERV-K is overexpressed, the balance between these different products may be key to maintain cellular homeostasis.
In about 60% of melanomas, the protein kinase BRAF is mutated (BRAF V600E ) [75]. The BRAF V600E mutation happens at an early stage of the oncogenic process, and is already reported from benign naevi [102]. MITF is considered a crucial factor in melanoma invasiveness [103,104] and plays a critical role in controlling BRAF V600E melanoma [105,106]. Accordingly, the expression of MITF has been widely used as a marker to distinguish between the proliferative and invasive types of melanoma cells [86,103,105,107]. Reduced MITF expression generates invasive melanomas, with metastasis-promoting properties [105].
Our data suggest that MITF activates LTR5_Hs in melanoma (not in melanocytes); thus, Rec expression specifically marks the proliferative type of melanoma cells. If Rec expression is involved in maintaining a delicate balance between cell proliferation and invasion, as this study suggests, then Rec activation might be used as a sensitive marker to distinguish between the proliferative vs. invasive phenotypes of melanoma.
Of note, the application of single-cell analysis and advanced genomic technologies has refined our understanding of melanoma progression, and, instead of the invasive and proliferative states, suggested a melanocyte differentiation gradient [104]. It was also observed that the chromatin landscape differs between the phenotypes, and for MITF to drive the transcription of proliferation and differentiation genes, the transition might require additional (e.g., epigenetic) changes [73,108].
In addition to the Rec-expressing loci, MITF targets several solo LTR5_Hs that could also function as cis-regulatory sequences (enhancers, promoters) for nearby genes [64]. Notably, some of these LTRs harbor the E-box CA(C/T)GTG motif, which is also recognized by cMYC and MAX oncoproteins [72], suggesting a role of HERV-K-derived regulatory sequences in other oncogenic processes as well. Because several LTR5_Hs elements are insertionally polymorphic in the human population [2][3][4][5], this type of variation could also have implications in melanoma progression and deserves future work. While the in vitro melanoma cultures are not able the fully reproduce melanoma progression, the various melanoma lines might mimic different aspects of it. The COLO829 line represents an invasive type of melanoma that is still convertible (upon PLX4032 treatment) to a proliferative phenotype, whereas the A375 line is a relatively good model of the MITF expression-associated features of a proliferative type of melanoma.
It has been widely discussed how precisely melanoma can be recapitulated in animal models (reviewed in [110]). Interestingly, the analysis of single-cell transcriptome data from multiple human patients suggests a distinct separation of proliferative and invasive types of cells within a patient [5]. However, these two phases do not co-occur in mice [98,111] As Rec (and HERV-K(HML2)) is Under healthy physiological conditions, apparently, a limited number (n = 18, or often less) of HERV-K loci is active [28], whereas, in melanoma patients and/or cell lines, the number of transcribed HERV-K loci is higher (n~24) [43]. These studies indicate that there are specific HERV-K loci transcribed under pathological conditions. Although forced expression of MITF(M) in non-malignant cells has been shown to enhance the levels of various HERV-K-derived transcripts (e.g., Gag, Env, and Rec) [67], the activated HERV-K loci might differ in both the level of expression and composition of their HERV-K-derived transcripts. Furthermore, a variable set of HERV-K loci seems to be expressed in different melanoma patients [43,109], suggesting that variability in the amount of HERV-K products may contribute to melanoma heterogeneity [37,42], and possibly the progression and outcome of the disease.
Our study indicates that the depletion of Rec in BRAF V600E mutant melanoma cells (A375) results in reduced MITF levels. Thus, our data suggest a model whereby a MITF-dependent regulatory loop is disrupted upon Rec KD through (i) decreased MITF binding to LTR5_Hs, which leads to (ii) reduced Rec levels and (iii) upregulation of EMT markers, contributing to (iv) escalated cell invasion ( Figure 8B). Future research is required to decipher the mechanism by which MITF is downregulated upon Rec KD. It would also be interesting to find out whether Rec overexpression can lead to enhanced MITF expression in invasive melanoma, and revert the invasive phenotype to a proliferative one. This would answer whether the impact of Rec on melanoma proliferation status is correlative or (directly or indirectly) causal.
While the in vitro melanoma cultures are not able the fully reproduce melanoma progression, the various melanoma lines might mimic different aspects of it. The COLO829 line represents an invasive type of melanoma that is still convertible (upon PLX4032 treatment) to a proliferative phenotype, whereas the A375 line is a relatively good model of the MITF expression-associated features of a proliferative type of melanoma.
It has been widely discussed how precisely melanoma can be recapitulated in animal models (reviewed in [110]). Interestingly, the analysis of single-cell transcriptome data from multiple human patients suggests a distinct separation of proliferative and invasive types of cells within a patient [5]. However, these two phases do not co-occur in mice [98,111] As Rec (and HERV-K(HML2)) is hominid-specific, our study might explain certain species-specific features of melanoma progression in humans, and stress the potential limitations of using animal models in melanoma studies.