The Tumor-Specific Expression of L1 Retrotransposons Independently Correlates with Time to Relapse in Hormone-Negative Breast Cancer Patients

Background: Long-Interspersed Nuclear Element (L1) retrotransposons are silenced in healthy tissues but derepressed in cancer. Although L1 reactivation has been associated with reduced overall survival in breast cancer (BC) patients, a comprehensive correlation with clinicopathological features is still missing. Methods: Using quantitative reverse-transcription PCR, we assessed L1 mRNA expression in 12 BC cell lines, 210 BC patients and 47 normal mammary tissues. L1 expression was then correlated with molecular and clinicopathological data. Results: We identified a tumor-exclusive expression of L1s, absent in normal mammary cells and tissues. A positive correlation between L1 expression and tumor dedifferentiation, lymph-node involvement and increased immune infiltration was detected. Molecular subtyping highlighted an enrichment of L1s in basal-like cells and cancers. By exploring disease-free survival, we identified L1 overexpression as an independent biomarker for patients with a high risk of recurrence in hormone-receptor-negative BCs. Conclusions: Overall, L1 reactivation identified BCs with aggressive features and patients with a worse clinical fate.


Introduction
The "Junk DNA" definition was coined to describe the large fraction of the noncoding, redundant and repeated human genome [1,2]. The Long-Interspersed Nuclear Element (LINE-1 or L1) sequences are randomly widespread and belong to this portion of human DNA, representing about 18% of the total amount. These repeated sequences (about 500,000 repetitions) belong to the family of transposable elements, with a unique self-induced "copy and paste" mechanism through the transcription and translation of two open reading frames (ORFs) (ORF1 and ORF2) [3][4][5]. In particular, ORF2 exhibits a retro-transcriptase activity, thus leading to the label of "retrotransposons". During their life cycle, L1s are transcribed and translated through the standard cellular apparatus, and the two ORF-encoding proteins lead to the production of a cDNA molecule that is subsequently reintegrated into a genomic site different from the original one. Nevertheless, this cycle rarely closes because of the presence of frequent truncating mutations scattered along most of the L1s or the involvement of different L1-silencing mechanisms, among which promoter methylation is the best known [6].
In healthy adult human cells, this process is tightly repressed, and retrotranspositions are rarely identified. However, under cellular stress conditions such as cancer, the deregulation of global DNA methylation favors L1 reactivation [7,8]. Alongside the "historical" carcinogenic insertions of L1 reported in genes such as APC (in a colon cancer patient) [9] and MYC (in a breast tumor patient) [10], L1 sequences have been widely exploited as a surrogate prognostic biomarker of global methylation in different cancer types [11]. L1 reactivation can be monitored by evaluating mRNA/protein expression of the ORFs [12] or the L1 de novo insertions [13]. Confirming previous evidence [11], L1 de novo insertion was reported in colon, lung and breast cancers (BC) [14]. Accordingly, preclinical studies have also described an increase in L1 expression in BC cells [15,16] and tumors [17], with potential clinical implications. In this regard, the pioneering work by Chen and colleagues clearly showed a correlation between L1 protein expression and aggressive features in a cohort of 95 BCs, together with a lack of expression in tumor-free mammary glands [17]. Despite this, many questions about the clinicopathological characteristics of BC patients with L1 sequence reactivation are still open.
To assess the poorly known correlation between L1 reactivation and the clinical and molecular characteristics of breast carcinomas, we evaluated the expression of these retrotransposons in pre-clinical and clinical settings, investigating BC and non-transformed mammary epithelial cells along with a cohort of 210 BCs and 47 normal mammary glands. To this purpose, we assessed the expression of L1 transcript (mRNA) by using two independent qRT-PCR assays encompassing the ORF1 and ORF2 sequences. The mRNA expression results were then compared with several molecular and clinical data, including promoter methylation, molecular subtyping and disease-free survival (DFS).
The genetic identity of the cell lines was confirmed by short tandem repeat profiling (PowerPlex® 16 HS System, Promega, Madison, WI, USA), performed in September 2021.

Breast Cancer Patient Cohort
We analyzed a total of 210 retrospective BCs collected at the Candiolo Cancer Institute. All patients signed a written informed consent (Ethical committee of FPO-IRCCS Candiolo Cancer Institute, protocol code: PROFILING #001-IRCC-00IIs-10, approved on 2 November 2011), as reported [18]. Patient clinical and pathological features are reported in Supplementary Table S2. Data about hormone receptor (HR) status (expression of estrogen and progesterone receptors, ER and PgR, respectively), HER2 expression and tumor proliferation levels (Ki67 labelling index), as well as details related to histologic type, stage (pT and pN status) and tumor grade were retrieved from pathology reports. Tumor-infiltrating lymphocytes (TILs) were assessed following the International Recommendations of TILs scoring [19]. Information about relapse was obtained for all the patients, allowing the assessment of disease-free survival (DFS). As controls for L1 expression, we analyzed 47 normal mammary gland tissues, derived from prophylactic mastectomy.

Nucleic Acids Extraction
DNA was extracted from cells using the Blood & Cell Cultured Mini Kit (Qiagen, Hilden, Germany), while the RNeasy Mini Kit (Qiagen, Hilden, Germany) was used for RNA extraction. TURBO™ DNase (ThermoFisher Scientific, Waltham, MA, USA) treatment was additionally performed, according to the manufacturer's protocol, to avoid genomic DNA contamination in qRT-PCR.
After review with hematoxylin and eosin (H&E), FFPE tumoral sections were dissected to obtain nucleic acids from tissue areas containing at least 100 cells with >50% tumoral cells. The Maxwell RSC FFPE RNA Kit (Promega) was used to obtain RNA from the 210 FFPE BC samples and the 47 FFPE normal mammary glands. Genomic DNA was purified from a subset of 41/210 FFPE tissues using the GeneRead DNA FFPE Kit (Qiagen, Hilden, Germany). Nucleic acids were quantified with the Qubit® 2.0 fluorometer (ThermoFisher Scientific, Waltham, MA, USA).

qRT-PCR of L1-ORF1 and ORF2 Expression
One µg of purified RNA was reverse-transcribed in a mix containing both oligo(dT) and random hexamers using SuperScript™ IV VILO™ reverse transcriptase (ThermoFisher Scientific, Waltham, MA, USA).
The expression of L1-ORF1 and ORF2 transcripts was evaluated in triplicate by qRT-PCR using two independent primer pairs and specific thermal profiles, reported in Supplementary Table S3. Primers were designed using the L1 reference sequence (GenBank accession number: AH005269.2). Relative expression quantification (RQ) was calculated using GAPDH and 18S as endogenous controls (ECs) and the following formula:
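The formula itself is not reproduced in this text. As a hedged illustration only, the sketch below uses the widely adopted 2^(-ΔCt) convention, taking ΔCt as the mean Ct of the target minus the mean Ct of the endogenous controls. The function name, the averaging of the two controls and all Ct values are assumptions made for the example, and the formula actually applied in the study may differ.

```python
from statistics import mean


def relative_quantity(ct_target, ct_controls):
    """Return 2^(-dCt), with dCt = mean target Ct - mean control Ct.

    This is an assumed convention, not the formula from the source.
    """
    delta_ct = mean(ct_target) - mean(ct_controls)
    return 2.0 ** (-delta_ct)


# Hypothetical triplicate Ct values for ORF1 in one sample
orf1_ct = [24.0, 24.0, 24.0]
# Hypothetical mean Ct values of the two endogenous controls (GAPDH, 18S)
control_ct = [22.0, 20.0]

rq = relative_quantity(orf1_ct, control_ct)
print(rq)
```

With these invented values, ΔCt is 3 cycles, so the target is estimated at one eighth of the control level.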

PAM50 Subtyping
Molecular subtyping was performed using the Prosigna® assay (NanoString Technologies Inc., Seattle, WA, USA) based on the classification algorithm PAM50 [20][21][22][23][24] and the analysis system Dx nCounter on BC cells and on the 210 BC FFPE tissues. Briefly, 250 ng of RNA was used for the assay. For each sample, 10 µL of the Hybridization Buffer, 5 µL of the Reporter CodeSet and 5 µL of the ProbeSet were used in the hybridization reaction, which took place at 65 °C overnight. RCC data were shared with NanoString to apply the proprietary PAM50 algorithm and assign the specific BC molecular subtype.

Mutational Analyses
The 41 BCs with available DNA were mutationally characterized with a 147-hotspot breast cancer gene panel using the MassARRAY® system (Agena Bioscience, San Diego, CA, USA) based on the MALDI-TOF (matrix-assisted laser desorption ionization time-of-flight) method, as previously reported [25].

L1 Promoter Methylation Analysis
L1 methylation was studied for both cancer cells and the 41 BCs with available DNA by performing quantitative bisulfite pyrosequencing in triplicate using the PyroMark Q96 pyrosequencing system (Qiagen, Hilden, Germany), as previously reported [26]. Briefly, for each sample, 300 ng of DNA was converted by the MethylEdge Bisulfite Conversion System (Promega, Madison, WI, USA). Five µL of each converted DNA sample was amplified with the thermal profile, primers (designed using the L1 reference sequence GenBank accession number: X58075) and amplification mix reported in Supplementary Table S3. The primed single-stranded DNA templates were subjected to pyrosequencing by using the following primer: L1 seq 5′-GGTGTGGGATATAGTT-3′, using the PyroMark Q96 software (v.2.5), which analyzed the following sequence: TT/CGTGGTGT/CGTT/CTTTT/CTTAAGTT/CGGTTTGAAAAGT/C through the following dispensation order: ATCAGTGTGTCAGTCAGTCTCAGTCAGTGAGTC. Methylation levels were obtained considering C/T in positions 2, 3, 5 and 6 of the pyrograms. An overall L1 methylation level was calculated as the average of the proportion of C (%) at the 4 CpG sites. The fourth position was adopted as a site of control.
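The averaging step described above can be sketched as follows. This is a minimal illustration, assuming that the methylation percentage at each CpG site is the C signal divided by the sum of the C and T signals; the function names and peak heights are hypothetical and are not taken from the PyroMark software.

```python
def cpg_methylation_percent(c_signal, t_signal):
    """Percent methylation at one CpG site: C / (C + T) * 100."""
    total = c_signal + t_signal
    if total == 0:
        raise ValueError("no signal at this CpG site")
    return 100.0 * c_signal / total


def overall_l1_methylation(sites):
    """Average percent C over CpG sites given as (C, T) signal pairs."""
    values = [cpg_methylation_percent(c, t) for c, t in sites]
    return sum(values) / len(values)


# Hypothetical (C, T) peak heights at the four CpG sites of one sample
sample = [(60, 40), (70, 30), (50, 50), (80, 20)]
overall = overall_l1_methylation(sample)
print(overall)
```

In this invented sample the four sites are 60%, 70%, 50% and 80% methylated, giving an overall level of 65%.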

Statistical Analysis
Data analysis was carried out with the SPSS version 20.0 software (IBM, Armonk, NY, USA) by three independent authors (E.B., C.D., S.E.B.).
Student's t-test was used to compare L1-ORF1 and L1-ORF2 expression between groups, and Fisher's exact test was used to evaluate contingency tables for nominal variables. Disease-free survival (DFS) was analyzed with the Kaplan-Meier method, and groups were compared with the log-rank test. DFS was calculated as the interval from the beginning of therapy to the time of progression. A Cox proportional-hazards regression model (backward selection) was used to perform the multivariate analysis for tumor relapse. All the available p-values are reported within the text and/or in the tables. p < 0.05 was considered statistically significant. ROC curves were produced with the pROC package (R v4.0.2).
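For readers unfamiliar with the Kaplan-Meier method, the following pure-Python sketch shows how the product-limit estimate and the median DFS are obtained from follow-up times and relapse indicators. It is not the SPSS procedure used in the study; the function names and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times  -- follow-up in months
    events -- 1 if relapse was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        relapses = 0
        removed = 0
        # Group all patients sharing the same follow-up time
        while i < len(data) and data[i][0] == t:
            relapses += data[i][1]
            removed += 1
            i += 1
        if relapses:
            survival *= 1.0 - relapses / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def median_survival(curve):
    """First time at which estimated survival is 0.5 or lower, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical follow-up (months) and relapse indicators for five patients
times = [10, 20, 30, 40, 50]
events = [1, 0, 1, 1, 0]
curve = kaplan_meier(times, events)
print(curve, median_survival(curve))
```

Censored patients leave the risk set without lowering the curve, which is why the drop at 30 months is computed over three patients rather than four.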

L1 Expression in Breast Cancer Cell Lines
We first assessed L1 expression by evaluating the relative quantification RQ (∆Ct) of ORF1 and ORF2 in twelve BC and two non-transformed mammary gland cell lines (Figure 1A). ORF1 and ORF2 expression levels were highly correlated (p < 0.01, ρ Spearman = 0.91, Figure 1B). Basal-like cells displayed the highest RQ values; in particular, the HCC1187, MDA-MB468 and HS-578T cells had the highest content of L1-ORFs. Conversely, Luminal cells homogeneously showed small amounts of L1 transcripts. Of note, the two HER2-enriched BC cell lines showed opposite levels of L1-ORF expression (MDA-MB453 cells expressed L1, whereas SKBR3 cells did not). By dichotomizing cells as basal-like versus non-basal, L1-ORF levels were significantly higher in the basal-like group (p = 0.019, Figure 1C). Interestingly, we identified a trend of L1 enrichment in BC cells with the highest metastatic potential (defined from [27][28][29]) (p = 0.09, Supplementary Table S1).
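The Spearman coefficient reported above is the Pearson correlation computed on ranks. A minimal sketch of that calculation is given below; the helper names and the RQ values are hypothetical and do not reproduce the study data.

```python
def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical ORF1 and ORF2 RQ values for five cell lines
orf1 = [0.1, 0.5, 2.0, 3.0, 8.0]
orf2 = [0.2, 0.4, 5.0, 3.5, 12.0]
rho = spearman_rho(orf1, orf2)
print(rho)
```

Because only the ordering matters, the coefficient is unaffected by the skewed scale of RQ values.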

L1 Expression in Breast Cancer Tissues
To confirm the BC-exclusive expression of ORF1 and ORF2, we compared the RQs of the BC samples with those of the normal mammary tissues. None of the tumor samples showed an ORF amount lower than the most expressive normal tissue (ORF1 (RQ): 0.55, range: 0.11-3.47; ORF2 (RQ): 1.32, range: 0.09-11.75), supporting the tumor-specific expression of both transcripts (ORF1 and ORF2 both p < 0.01, Figure 2B). In line with BC cells, we detected a significant heterogeneity of L1 mRNA expression within the BC cohort (Figure 2A). To decipher this complexity, we assessed whether L1 expression was enriched in tumors with specific clinical and pathological features. To do this, the RQ values were calculated by normalizing the level of ORF1 and ORF2 obtained in BC samples with the baseline amount of the mRNA detected in normal tissues (ORF1 (RQ) mean: 17.79, range: 0.37-108.13; ORF2 (RQ) mean: 27.69, range: 1.24-235.24).
Subsequently, the cohort of 210 BCs was stratified according to PAM50 intrinsic molecular subtype and histopathological features. Clinicopathological data and statistics are reported in Table 1. Luminal BCs (both A and B) showed a reduced expression of L1s, with 82% of Luminal tumors presenting ORF1 and ORF2 values below the mean level of expression (97/118) (Figure 2C). The HER2-enriched subgroup showed the highest variability, with 55% of tumors characterized by L1 expression levels close to the Luminal group (29/52) and 45% with levels comparable to those of basal-like BCs (23/52). The mean L1-ORF (RQ) for each subgroup is reported in Table 1. L1-ORF expression was increased in poorly differentiated (G3) BCs compared to G1/2 tumors (ORF1 and ORF2 both p < 0.01) (Figure 2D). BCs characterized by a high Ki67 level showed an increased expression of both ORFs compared with those tumors with Ki67 < 20% (ORF1 (RQ) p < 0.01 and ORF2 (RQ) p < 0.01) (Table 1). Following the stratification of BCs into three subsets according to their tumor-infiltrating lymphocytes (TILs) level (low: TILs < 10%, intermediate: 10% < TILs < 30% and high: TILs > 30%), we found significant variability in the L1 mRNA distribution among the subgroups (ORF1 (RQ) p = 0.02, ORF2 (RQ) p = 0.01), with a higher L1 expression in tumors with higher TIL scores (Figure 2E). The mean L1-ORF (RQ) for each group is reported in Table 1.
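The normalization of tumor values to the normal-tissue baseline can be illustrated with a short sketch. It assumes that the baseline is the mean expression across the normal samples, which the text does not state explicitly; the function name and all values are hypothetical.

```python
from statistics import mean


def normalize_to_baseline(tumor_values, normal_values):
    """Express each tumor value as a fold change over the normal baseline.

    The baseline is assumed here to be the mean of the normal samples.
    """
    baseline = mean(normal_values)
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return [value / baseline for value in tumor_values]


# Hypothetical expression levels in normal and tumor samples
normal = [0.5, 1.0, 1.5]
tumor = [2.0, 10.0, 50.0]
fold = normalize_to_baseline(tumor, normal)
print(fold)
```

Under this assumption, a normalized RQ of 10 means ten times the average normal-tissue level.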

L1 mRNA Enrichment Was Associated with Relapse and Shorter Disease-Free Survival
We collected tumor relapse data for 208/210 BC patients, including the time from the primary treatment to the recurrence, defined as the disease-free survival (DFS). A heatmap for both ORF1 (RQ) and ORF2 (RQ) showed a trending polarization of the patients with recurrence according to high L1 expression (Supplementary Figure S2A). Patients with tumor relapse showed higher expression of both L1-ORFs (p < 0.01, Table 1). Similarly, ROC curve analysis for both ORFs defined a good level of accuracy in identifying patients with relapse (ORF1 AUC = 0.79, Youden index = 11.3; ORF2 AUC = 0.82, Youden index = 15.2) (Supplementary Figure S2B).
We then divided the cohort into ORF1/ORF2 "high" and "low" expressors (considering both median and average RQs as cut-offs) to assess whether L1 expression was associated with differential DFS. As shown in Figure 3A, a significantly shorter DFS was detected considering the ORF1 (RQ) median level as the cut-off. The BCs highly expressing ORF1 were characterized by a median of 32 months of DFS compared to 49 months for the low expressors (p < 0.01), with a two-fold higher risk (hazard ratio, HR) of relapse. Similarly, high ORF2 values identified patients with an increased risk of relapse (p < 0.01, HR: 1.9, median low: 48 months vs. median high: 36 months, Figure 3B). Data concerning the stratification with the mean levels of ORF1 and ORF2 expression are reported in Table 2. Subsequently, we questioned whether merging the L1-ORF1 and ORF2 results would allow us to simplify the stratification of patients with relapse. The group of "L1 high-expressors" comprised patients with both ORF1/ORF2 (RQ) values over the median level, whereas "L1 low-expressors" included patients with at least one ORF (RQ) below the median level. The derived Kaplan-Meier curve identified a strong level of significance (p < 0.01), with an HR for L1 high expressors of 1.9 (median survival high: 36 months, low: 48 months) for both cut-offs (Figure 3C and Table 2).
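As an illustration of the ROC quantities mentioned above, the sketch below computes the AUC as the probability that a relapsed patient has a higher RQ than a non-relapsed one, and the threshold that maximizes Youden's J (sensitivity + specificity - 1). It is a pure-Python stand-in rather than the pROC workflow used in the study; the function names and data are hypothetical. The Youden values quoted in the text exceed 1 and therefore presumably denote the RQ cut-off selected by the index, although this reading is an assumption.

```python
def roc_auc(scores, labels):
    """AUC via pairwise comparison of positives and negatives (ties = 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_youden(scores, labels):
    """Return (J, threshold) maximizing J, calling relapse when score >= threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_j, best_thr = -1.0, None
    for thr in sorted(set(scores)):
        tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= thr)
        tn = sum(1 for s, l in zip(scores, labels) if l == 0 and s < thr)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_j, best_thr = j, thr
    return best_j, best_thr


# Hypothetical RQ values and relapse status (1 = relapse, 0 = no relapse)
scores = [2.0, 5.0, 8.0, 12.0, 20.0, 30.0]
labels = [0, 0, 1, 0, 1, 1]
auc = roc_auc(scores, labels)
j, threshold = best_youden(scores, labels)
print(auc, j, threshold)
```

When two thresholds give the same J, this sketch keeps the lower one; other tie-breaking rules are equally defensible.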
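The combined stratification rule can be written compactly. The sketch below labels a patient as a high expressor only when both ORF values exceed their cohort medians. Values exactly equal to the median are treated as low, which is an assumption because the text does not specify how ties were handled; the function name and data are hypothetical.

```python
from statistics import median


def classify_l1(orf1_rq, orf2_rq):
    """Label each patient 'high' if both ORFs exceed the cohort medians, else 'low'."""
    m1 = median(orf1_rq)
    m2 = median(orf2_rq)
    return [
        "high" if a > m1 and b > m2 else "low"
        for a, b in zip(orf1_rq, orf2_rq)
    ]


# Hypothetical ORF1 and ORF2 RQ values for five patients
orf1 = [1.0, 5.0, 10.0, 20.0, 40.0]
orf2 = [30.0, 2.0, 8.0, 25.0, 60.0]
groups = classify_l1(orf1, orf2)
print(groups)
```

Requiring both ORFs to be above the cut-off makes the high group more conservative than either single-ORF split.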
To assess whether L1 can be an independent prognostic factor, we applied the Cox regression model for multivariate analysis. Covariates in this analysis included L1 expression, tumor grade (G1-2 vs. G3), node involvement (0 vs. 1-3 nodes vs. >3 nodes), Ki67 expression (high vs. low), PAM50 subtypes, TILs (low, medium and high) and tumor size. Of note, only L1 expression (both stratified for average and median) and lymph-node involvement retained an independent level of significance (Table 2).
Since we described increased L1 mRNA expression as a negative prognostic factor, we focused on the DFS of L1 high expressors (cut-off: median L1 (RQ)). In this subgroup, patients with a basal-like (median: 46 months) or a HER2-enriched subtype (median: 52 months) had a worse prognosis, whereas Luminal patients whose tumors showed high L1 expression were characterized by a longer DFS (Luminal A: 92 months, Luminal B: 75 months, p-values, HR and CI95% in Table 3, Figure 4A). Both Luminal A and Luminal B BCs displayed no differences between L1 high and L1 low expressors in terms of DFS (Figure 4B). Conversely, basal-like or HER2-enriched patients showed a clear DFS stratification considering L1 expression, in which lower ORF expressors displayed a better outcome (Figure 4C).
Finally, by evaluating the correlation between L1 expression and the type of therapy, we identified a significantly higher L1 amount in patients treated with chemotherapy who experienced tumor relapse (29/50), compared with those without recurrence (21/50) (p = 0.03 for both ORFs). No other correlations were identified between L1 expression and recurrence in other therapeutic approaches (hormonal therapy + chemotherapy, hormonal therapy alone, anti-HER2 drugs, Supplementary Table S5).

Discussion
LINE-1 (L1) sequences are usually silenced in human tissues through several mechanisms, including hypermethylation of their promoter [32]. Deregulation of these mobile elements has been widely investigated [33][34][35] in several neoplasms [10], in particular, their promoter methylation level as a surrogate for global methylation [30,31], the L1 de novo insertion for gene disruption [14] and L1 expression.
Here, we report for the first time a detailed characterization of RNA-based L1-ORF expression in a large cohort of BC patients in the attempt to contextualize L1 transcription among BC patients and to correlate L1 reactivation with clinical outcome. We demonstrated a high level of tumor-specific L1 transcripts in 12 BC cell lines and in a retrospective BC cohort, enhanced in HR-negative BCs. Moreover, the association between ORF expression and a high risk of tumor recurrence, confirmed by multivariate analysis, proposes L1 as a biomarker for BC prognostic stratification.
The tumor-specific activation of L1 mRNA can be considered crucial to define L1 as a biomarker [36]. The absence of L1 expression in the two breast normal cell lines and in the 47 mammary glands analyzed was in line with previously reported higher methylation levels in normal breast samples [37] and the absence of ORF1p/ORF2p IHC staining in both normal and peritumoral tissues [17]. We pursued the assessment of L1 mRNA instead of L1 methylation and protein expression for both technical and biological reasons. The identification of an L1 expression threshold value to distinguish tumor from normal tissue is easier and biologically more robust compared to L1-promoter methylation assays. A 60% methylation is often used as a cut-off between promoter methylation and demethylation [30,31]. However, (i) some expressive tumors may have high methylation levels [38], as confirmed in our study; (ii) there is no clear-cut correlation between demethylation and transcription in tumor tissues [39]; and (iii) the deamination of cytosine into uracil, the central part of the sodium-bisulfite conversion protocol [40,41], represents one of the main artifacts of FFPE tissues [40], leading to potential errors. L1-ORF expression using IHC on FFPE tissue is complex. A pioneering work about L1 expression in BC was based on IHC staining, allowing us to also evaluate the localization of the expression as a marker [17]; however, it was observed that not all L1 transcripts are translated into ORFs [41], and L1 sequences may play alternative roles as non-coding RNAs [42], leading to an underestimation of true L1 levels.
The application of an RNA-based test was not free from issues, mainly associated with the overlapping of L1 DNA-RNA sequences [43] and with the fragmentation of nucleic acids in FFPE tissues. Nevertheless, we applied a further DNase step to reduce the amount of residual DNA to a negligible level. To counteract the impact of fixation artifacts on the RNA and to simplify the L1 biomarker for statistical analysis, we considered as "L1 high-expressors" only patients with both ORFs over the median level of expression. This approach reduced the complexity, retaining the same value of clinical significance.
L1 reactivation in BC has historical support: in 1988, the insertion of L1 into the MYC gene showed the correlation between the hypomethylation of these sequences and a retrotransposon-linked carcinogenic effect [10]. Over the years, different L1 analysis approaches (methylation, RNA, alternative transcripts, protein expression and de novo insertions) confirmed the occurrence of L1 reactivation in BC [10,15,42,44].
Our results in the BC cell lines are in line with previous reports [16,45], thus confirming a statistically significant enrichment of L1 in basal-like cells. The association between L1 and BC aggressive features was first reported by Chen and colleagues, with a significant increase in protein expression in triple-negative breast cancers (TNBCs), associated with shorter OS [15]. This association between basal-like tumors and L1 overexpression was partially confirmed by McKerrow and colleagues [46], who reported an increased L1 expression in TP53 mutated tumors, recently confirmed in breast, ovarian and colon cancers [46]. The present data further corroborate the correlation of L1 expression with highly proliferative and aggressive lesions [47].
Among all BC pathological factors, we highlighted a possible cross-link between L1 expression and immune-cell infiltration, since L1-high BCs showed high levels of TILs in the stroma. Histologic evaluation of TILs has reached level IB evidence as a prognostic marker in TNBC and level 2A with respect to prediction of response to chemotherapy. Some authors have shown a favorable outcome in TNBCs with TILs > 30% treated by surgery alone [48], thus suggesting a possible role in the de-escalation of treatment in this subset of patients. Nevertheless, at present, TIL levels are not used to withhold chemotherapy in BC patients. In addition, we should acknowledge that the current biomarker used in predicting responsiveness to immunotherapy in TNBC is PD-L1 [49,50], which is almost exclusively expressed in immune cells [51,52]. The role of L1 retrotransposon in the modulation of lymphocyte activation has been recently demonstrated by Marasca and colleagues [53], thus suggesting an implication of L1 expression in the context of cancer-immunology.
We also explored the possible clinical impact of L1 expression levels. High L1 levels significantly correlated with a higher risk of BC recurrence in terms of DFS. In the multivariate analysis, lymph-node involvement and L1 expression were the only two independent/significant variables. Node involvement is recognized as one of the main markers for the risk of relapse in BC [54][55][56]. By confirming the independence of L1 with respect to node involvement as a prognostic marker, retrotransposon expression could represent a high risk of relapse in BC patients.
By stratifying the risk according to the BC subtype, the relationship between HR-negative BCs and L1 expression showed stronger results, whereas HR-positive/L1-high lesions did not show a reduced DFS, suggesting poor biological and prognostic significance of L1 retrotransposons in Luminal tumors. HER2-enriched BCs showed great heterogeneity for L1 mRNA level, both in cell models and in the retrospective cohort. HER2-addicted tumors were also characterized by a significant, albeit less robust, L1 stratification of the risk of relapse.
Interestingly, L1 has also been considered from a therapeutic perspective. The use of reverse-transcriptase inhibitors (efavirenz) has been shown as particularly effective in TNBC preclinical models and is associated with the metabolism of fatty acids [45]. Conversely, resistance to paclitaxel in TNBC cell lines mediated by paclitaxel-induced L1 RNA stabilization was reversed by the use of efavirenz [57]. In our cohort, patients treated with chemotherapy and experiencing tumor relapse showed significantly higher L1 expression levels, and most of these were basal-like BCs.
Our work presents two main issues, both associated with the retrospective design. First, given the heterogeneity of the population, it is impossible to recover data about the overall survival. Second, the small amount of archival material precluded high-throughput DNA analysis for all tumors in the study cohort.

Conclusions
L1 expression could be interpreted as a strong negative prognostic marker, especially for HR-negative BCs. When considering the impact on treatment, our data are hypothesis-generating and may prompt further studies. Indeed, whether L1 expression may be exploited to modulate the treatment of TNBCs or to predict response to chemotherapy or to the combination of chemotherapy and immune checkpoint inhibitors (ICIs) warrants further investigation.
Figure S2: (A) Heatmap of ORF1 (RQ) and ORF2 (RQ) within BC, with recurrence annotation (N = light blue, Y = dark blue), clustered by the median level (above or below). The green-to-red scale for the log2 expression revealed a markedly increased expression in patients with tumor relapse. (B) Patient relapse ROC curves for both ORF1 and ORF2. Plots also report the area under the curve (AUC) with range. Table S1: BC cell lines analyzed in the study; Table S2: BC cohort features; Table S3: Primers, thermal profiles and protocols for L1 analysis; Table S4: Mutations identified in the 41 BC patients; Table S5: