TCGA Pan-Cancer Genomic Analysis of Alternative Lengthening of Telomeres (ALT) Related Genes

Cancer cells rely on telomere maintenance mechanisms (TMM) to avoid apoptosis: 85–90% reactivate telomerase, while 10–15% use the alternative lengthening of telomeres (ALT). Under anti-telomerase-based treatments, some tumors switch from a telomerase-dependent mechanism to ALT; in fact, the co-existence of both mechanisms has been observed in some cancers. Although several elements of the ALT pathway have been uncovered, some of its molecular mechanisms are still poorly understood. Therefore, with the aim of identifying potential molecular markers for the study of ALT, we combined in silico approaches on a set of 411 telomere maintenance genes. We conducted a genomic analysis of these genes in 31 Pan-Cancer Atlas studies from The Cancer Genome Atlas and found 325,936 genomic alterations, from which we identified 20 genes highly altered across the cancer studies. Finally, we built a protein-protein interaction network and performed an enrichment analysis to observe the main pathways of these genes and discuss their role in ALT-related processes such as homologous recombination and homology-directed repair. Overall, given the limited understanding of the molecular mechanisms of ALT cancers, we propose a group of genes which, after ex vivo validation, could represent new potential therapeutic markers in the study of ALT.


Introduction
Telomeres are nucleoprotein complexes that consist of a tandem 5´-TTAGGG-3´ sequence and protect the ends of eukaryotic chromosomes, preventing the DNA damage response (DDR), end-to-end fusions and genomic instability 1. Telomeric DNA ranges from 3 to 15 kb in humans and ends in a 3´ single-stranded overhang, usually called the G-overhang 1,2. To keep the chromosome end from being recognized as DNA damage, telomeres are protected by a protein complex called shelterin, which is essential for the formation of the t-loop and hides the G-overhang 2. However, with each somatic cell division, telomeres lose about 200 nucleotides; eventually, this shortening leads to senescence or apoptosis 3. To avoid apoptosis or senescence due to telomere shortening, cancer cells use a set of mechanisms known as Telomere Maintenance Mechanisms (TMM), which include telomerase reactivation and the Alternative Lengthening of Telomeres (ALT) 1,3. A high proportion of tumors reactivate the expression of telomerase to maintain their chromosomal ends; however, 10-15% of human cancers use the ALT pathway 4. ALT-positive (ALT+) cells display a characteristic phenotype. For instance, telomeric DNA in ALT+ cells is constantly elongating, indicating that these telomeres rely on recombination-based mechanisms such as the homologous recombination (HR) and homology-directed repair (HDR) pathways to be extended 5,6. ALT+ cells also show heterogeneous telomere length, abundant extrachromosomal telomeric repeats (ECTRs), telomere sister chromatid exchange (T-SCE) 7 and high levels of extrachromosomal telomeric single-stranded DNA known as C-circles, which are markers for the detection of ALT+ cells 8. The main characteristic of ALT+ telomeres is their association with promyelocytic leukemia (PML) proteins in structures known as ALT-associated PML bodies (APBs), which are believed to function as platforms for telomere recombination 7,9. In fact, it has been shown that disruption of APBs blocks the ALT mechanism 7.
Although many factors have been implicated in the ALT mechanism, such as the loss of the alpha thalassemia/mental retardation syndrome X-linked chromatin remodeler (ATRX), the loss of the histone chaperone death domain-associated protein (DAXX) and histone H3.3 (H3F3A), and the well-known expression of the RAD51/52 complex, the molecular basis through which ALT occurs remains elusive and poorly understood 1,5,10. Many anti-telomerase-based cancer therapies are believed to cause some tumors to switch from telomerase to ALT 3,11. Indeed, the co-existence of telomerase and the ALT mechanism has been reported in some cancer types 3,12. This switching has made ALT a potential therapeutic target in recent years, given the poor prognosis associated with these cells 1,7,13. Different drugs have been tested in recent years to target ALT+ cells; however, most of them have been unsuccessful or are in phase 1/2 clinical trials 1,7.
Although several ALT-associated factors have been uncovered, the activation, regulation and functioning of the pathway require further investigation 7. Therefore, the identification of new potential molecular markers would be of great importance for the future design of strategies for the detection and treatment of ALT+ cancers. To address this need, we evaluated the genomic alterations of 411 telomere maintenance (TM) genes in 9,282 samples from 31 Pan-Cancer Atlas (PCA) studies and applied different in silico approaches with the aim of identifying and proposing additional genes which, after validation, could be used as potential molecular markers to improve the understanding of the ALT pathway.

Genomic alterations
To assess the genomic alterations of TM genes, a total of 411 genes from the TelNet database 14 (Table S1) were analyzed in cBioPortal 15,16 by selecting 9,282 samples from 31 Pan-Cancer Atlas studies from The Cancer Genome Atlas (TCGA) 17-26 (Table 1). A total of 325,936 genomic alterations were identified (Table S2), and a pie chart of the most frequent alterations was constructed after all values were normalized by the number of samples. As Figure 1A shows, mRNA upregulation was the most frequent alteration (65.8%), followed by mRNA downregulation (10.7%), copy number alteration (CNA) amplifications (9.6%), missense mutations (putative passenger; 7.2%), deep deletions (3.1%), and truncating mutations and fusion genes, each with less than 2%.
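The normalization step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the exact normalization used for Figure 1A is not detailed here, so the sketch assumes that each alteration count is divided by the number of samples in its study before the shares are computed. The function name, study labels and toy counts are hypothetical and are not data from this work.

```python
from collections import defaultdict


def alteration_shares(counts, samples_per_study):
    """Return the percentage share of each alteration type.

    counts: {(study, alteration_type): number_of_alterations}
    samples_per_study: {study: number_of_samples}
    Assumption: counts are normalized per study by sample number
    before being summed across studies.
    """
    per_type = defaultdict(float)
    for (study, alt_type), n in counts.items():
        per_type[alt_type] += n / samples_per_study[study]
    total = sum(per_type.values())
    return {t: 100.0 * v / total for t, v in per_type.items()}


# Hypothetical toy input (not values from Table S2).
toy_counts = {
    ("study_A", "mRNA upregulation"): 600,
    ("study_A", "amplification"): 200,
    ("study_B", "mRNA upregulation"): 100,
    ("study_B", "amplification"): 100,
}
toy_samples = {"study_A": 200, "study_B": 50}

shares = alteration_shares(toy_counts, toy_samples)
for alt_type, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{alt_type}: {pct:.1f}%")
```

Dividing by the sample number first keeps large studies from dominating the pie chart simply because they contribute more samples.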
Finally, to better understand the implication of these genes in cancer progression, from primary tumors to metastasis, we analyzed the alterations by tumor stage (Figure 1B; Table S3). However, no significant difference was observed after Bonferroni correction (p > 0.01). This suggests that alterations in TM genes and the ALT pathway across different tumor types do not depend on cancer stage.
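For readers unfamiliar with the Bonferroni correction mentioned above, a minimal sketch follows. It assumes a list of raw p-values from stage comparisons; the statistical test that produced them is not specified in this section, and the toy p-values below are hypothetical. The 0.01 threshold is the one quoted in the text.

```python
def bonferroni(p_values, alpha=0.01):
    """Bonferroni-adjust p-values and flag those below alpha.

    Each raw p-value is multiplied by the number of comparisons
    and capped at 1.0.
    """
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    significant = [p_adj < alpha for p_adj in adjusted]
    return adjusted, significant


# Hypothetical raw p-values for four stage comparisons.
raw_p = [0.004, 0.03, 0.2, 0.5]
adjusted, significant = bonferroni(raw_p, alpha=0.01)
print(adjusted)
print(significant)
```

In this toy case the smallest raw p-value (0.004) would pass an uncorrected 0.01 threshold but not the corrected one, which is the kind of outcome reported for the staging analysis.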

TM genes validation and TCGA Pan-Cancer studies frequencies
The gene set and each TCGA study were ordered from the highest to the lowest mean frequency of genomic alterations (Table S4). To narrow down ... To further identify which TM genes were the most altered in the 31 PCA studies overall, a boxplot of the mean frequencies of genomic alterations was constructed for the 411 genes (Figure 2B). After the boxplot analysis, the following 20 TM genes were selected and proposed as candidate genes to be studied in the ALT pathway: TP53 ... Finally, to elucidate the predominant genomic alteration in the first quartile of the most altered TM genes across the 31 PCA studies, an oncoprint with the percentage of each alteration is shown in Figure 2D. As expected, mRNA upregulation is the most common alteration in the gene set, with certain exceptions in which amplifications, mRNA downregulation and truncating mutations appear most frequently.
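The ranking by mean alteration frequency can be illustrated as follows. This is a sketch under assumptions, not the selection procedure of this study: the rule used to pick the 20 candidates from the boxplot is not stated in this section, so the upper-quartile cutoff below is an assumed stand-in. Gene names and frequencies are placeholders.

```python
from statistics import mean, quantiles


def rank_genes(freq_by_gene):
    """Sort genes by mean alteration frequency, highest first.

    freq_by_gene: {gene: [alteration frequency (%) in each study]}
    """
    means = {gene: mean(values) for gene, values in freq_by_gene.items()}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)


def above_upper_quartile(ranked):
    """Keep genes whose mean frequency exceeds the third quartile.

    Assumption: Q3 is used as the cutoff; the study's own rule may differ.
    """
    q3 = quantiles([m for _, m in ranked], n=4)[2]
    return [gene for gene, m in ranked if m > q3]


# Placeholder genes with hypothetical frequencies in two studies.
toy = {
    "GENE_A": [70, 90], "GENE_B": [60, 80], "GENE_C": [50, 70],
    "GENE_D": [40, 60], "GENE_E": [30, 50], "GENE_F": [20, 40],
    "GENE_G": [10, 30], "GENE_H": [0, 20],
}
ranked = rank_genes(toy)
top = above_upper_quartile(ranked)
print(ranked[:3])
print(top)
```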

Protein-protein interaction (PPi) network and enrichment analysis
PPi networks are useful resources for understanding how proteins interact with one another in the cell 27; hence, the STRING database 28 was used to examine the interactions of 103 TM proteins. A network was constructed using the highest-confidence interaction score (0.9) 29, and the most significant pathways (p < 0.001) were selected and marked with different colors in the nodes (Figure 3A). Subsequently, an enrichment analysis was performed using g:Profiler 30 (Table S5). Figure 3B shows ... Overall, the agreement among the most significant pathways observed in the different analyses in which the 20 TM candidate proteins are involved suggests that alterations of these proteins have a substantial impact on the activation and progression of the ALT mechanism, as discussed below.

Discussion
TM is crucial for the indefinite replicative potential that is one of the hallmarks of cancer. Between 10 and 15% of cancers do not depend on telomerase to maintain or extend their telomeres; instead, they use the ALT pathway 1. In recent years, efforts have been made to identify the genes responsible for ALT progression; so far, the loss of ATRX and DAXX 1,10,32-34 and the expression of RAD51 and RAD52 1,3,4 have been widely described. Nevertheless, the ALT mechanism and the molecular basis underlying its progression, switching, detection and treatment remain elusive. Therefore, we used simple OncoOmics and in silico approaches to identify potential molecular markers to improve the study of the ALT mechanism.
In 2018, the TelNet database was introduced, offering more than 2000 human genes associated with TM. Genes are annotated according to TM mechanism, function, significance score and validation in the literature 14. We manually curated the database ... Nonetheless, due to the difficulty of diagnosing ALT+ tumors and the lack of sensitive diagnostic techniques, those numbers may increase 7,44,45. In addition, we identified which TM genes were the most altered among the 31 PCA studies (Figure 2B) and in the most altered cancer studies (Figure 2C). This analysis, together with the oncoprint shown in Figure 2D, links each altered gene to its main genomic alteration. TP53 is the most altered gene in the majority of cases, which is not surprising given the evidence of its function as a tumor suppressor. However, it has been reported to be co-mutated with ATRX and histone H3.3 46 and, when truncated, is related to high expression of TERT (a well-known factor for ALT progression) 47. Along with TP53, we identified and proposed in this study TM genes as possible ... Some of the proposed genes have been shown to play an important role in other TM mechanisms that can be linked to ALT+ tumors. For instance, NSMCE2 recruitment is essential for APB function 48; it is also part of the SMC5/SMC6 complex, whose inhibition is known to disrupt APB formation 52. NBN is part of the MRN complex (MRE11/RAD50/NBN), which generates the G-overhang in the leading telomeric strand 53; additionally, it promotes the ALT mechanism by recruiting ATM to the telomeres, allowing the invasion of adjacent telomeric DNA to be used as a template for telomere extension 54.
Moreover, MCPH1 and ARID1A bind to the telomerase reverse transcriptase (hTERT) and regulate its expression 55. The oncoprint in Figure 2D shows these genes to be downregulated or fused across the different cancer types, which may give insight into their role in the switch from telomerase to ALT. Another gene, LRATD2, is known to be upregulated in cells with short telomeres 56. ATR is important in the assembly of the telomerase complex 57. TERF1 is overexpressed in cells with long telomeres 58 and, last but not least, RECQL4 is associated with TERF1 in the formation of the shelterin complex 59.
According to the TelNet database, TM genes are classified as enhancers, repressors or ambiguous based on their role in the ALT mechanism 14. Only NSMCE2, RFC4, NBN and ATR are recognized as enhancers, while SENP5, UBR5, RAD21, MCPH1, RECQL4, ARID1A, CCT5, LRATD2, RAD1, FBH1, RAD54B, SMG7 and TERF1 are classified as ambiguous. This highlights the importance of an expression analysis of the ALT-associated genes identified in this research, in order to study their role in the switching, progression or maintenance of ALT.
To understand how TM proteins interact and behave in ALT+ cells, we built a protein-protein interaction network using STRING (Figure 3A). We selected the most significant pathways (p < 0.001) associated with the selected TM proteins. As expected, 35% of the 103 proteins analyzed were involved in telomere organization, maintenance and telomeric DNA binding, while 22% are crucial for HDR and homologous recombination repair (HRR), which, together with non-homologous end joining (NHEJ), have an important role in the mechanism by which ALT+ cells extend their telomeres 60,61.
Moreover, we performed an enrichment analysis for the 103 TM proteins using g:Profiler, which maps a list of proteins to pathways, networks, gene ontology (GO) terms and cancer phenotypes 62. The GO terms for molecular function were DNA binding and telomeric DNA binding, and the GO terms for biological process were telomere maintenance, DNA repair, and telomere and chromosome organization. The most significant KEGG signaling pathway was HR, and the most significant REACTOME pathways were DDSB repair and HRR. Finally, the human phenotype related to the proteins analyzed was abnormality of chromosome stability. As expected, the GO terms and pathways observed were highly related to ALT progression and maintenance mechanisms.
Furthermore, we selected the TM proteins proposed in our study and the most significant pathways from the PPi and enrichment analyses and constructed a Circos plot (Figure 3C) to observe the main pathways in which our candidate proteins are directly involved.

Gene sets
The TelNet database (http://www.cancertelsys.org/telnet/) was downloaded and filtered manually to scan for genes related to the ALT mechanism. The criteria used for filtering were TM function, significance, phenotype and annotation. The ATRX and DAXX genes, already known to be involved in ALT progression, were excluded from the study, resulting in 411 TM genes selected for further analysis.

Protein-protein interaction network
To predict the interactions among the TM proteins, we used the STRING database with an interaction score of 0.9 (highest confidence). The most significant signaling pathways (p < 0.001) related to TMM and the ALT pathway were selected and differentiated by colors in the network.
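As an illustration of how such a network could be retrieved programmatically, the sketch below builds a request for the public STRING REST API. It is an assumption that the API could be used in this way; the analysis above refers only to the STRING database, not to a specific client. The helper names and the caller_identity value are hypothetical, and the fetch function is defined but not called, since it needs network access. STRING expresses confidence on a 0-1000 scale, so 0.9 corresponds to required_score=900.

```python
import csv
import io
from urllib.parse import urlencode
from urllib.request import urlopen

STRING_NETWORK_ENDPOINT = "https://string-db.org/api/tsv/network"


def string_network_url(genes, confidence=0.9, species=9606):
    """Build a STRING network query URL for a list of gene symbols.

    species=9606 is the NCBI taxonomy identifier for Homo sapiens.
    """
    params = {
        # STRING expects identifiers separated by carriage returns.
        "identifiers": "\r".join(genes),
        "species": species,
        "required_score": int(round(confidence * 1000)),
        "caller_identity": "alt_tm_sketch",  # hypothetical label
    }
    return STRING_NETWORK_ENDPOINT + "?" + urlencode(params)


def fetch_edges(url):
    """Download the network and return one dict per interaction row."""
    with urlopen(url) as response:
        text = response.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


# Three of the candidate genes named in this study, for illustration.
url = string_network_url(["NBN", "ATR", "TERF1"])
print(url)
```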

Gene set enrichment analysis
The gene set was analyzed with g:Profiler (https://biit.cs.ut.ee/gprofiler/gost). The significance threshold was the Benjamini-Hochberg FDR (p < 0.001), and the data sources consulted were GO: Molecular function, GO: Biological process, KEGG, Reactome and Human Phenotype Ontology.
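g:Profiler applies the multiple-testing correction internally; the following sketch only illustrates what the Benjamini-Hochberg adjustment does to a list of raw enrichment p-values and how the 0.001 threshold would then be applied. The term labels and p-values are hypothetical placeholders, not results from Table S5.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in input order.

    Each p-value is scaled by m / rank (rank 1 = smallest p-value),
    then monotonicity is enforced from the largest rank downwards.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical enrichment terms with raw p-values.
terms = ["term_1", "term_2", "term_3", "term_4"]
raw_p = [0.02, 0.0001, 0.5, 0.0004]

adj_p = benjamini_hochberg(raw_p)
kept = [t for t, p in zip(terms, adj_p) if p < 0.001]
print(kept)
```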