Comprehensive Analysis of Prognostic and Genetic Signatures for General Transcription Factor III (GTF3) in Clinical Colorectal Cancer Patients Using Bioinformatics Approaches

Colorectal cancer (CRC) has the fourth-highest incidence of all cancer types, and its incidence has steadily increased in the last decade. The general transcription factor III (GTF3) family, comprising GTF3A, GTF3B, GTF3C1, and GTF3C2, has been linked to the progression of different types of cancers; however, the messenger (m)RNA expressions and prognostic values of its members in colorectal cancer need to be further investigated. To study the transcriptomic expression levels of GTF3 gene members in colorectal cancer in both cancerous tissues and cell lines, we first performed high-throughput screening using the Oncomine, GEPIA, and CCLE databases. We then applied the PrognoScan database to query correlations of their mRNA expressions with the disease-specific survival (DSS), overall survival (OS), and disease-free survival (DFS) status of colorectal cancer patients. Furthermore, protein expressions of GTF3 family members in clinical colorectal cancer specimens were examined using the Human Protein Atlas. Finally, genomic alterations of GTF3 family genes in colorectal cancer and their signal transduction pathways were studied using the cBioPortal, ClueGO, CluePedia, and MetaCore platforms. Our findings revealed that GTF3 family members’ expressions were significantly correlated with the cell cycle, oxidative stress, WNT/β-catenin signaling, Rho GTPases, and G-protein-coupled receptors (GPCRs). Clinically, high GTF3A and GTF3B expressions were significantly correlated with poor prognoses in colorectal cancer patients. Collectively, our study indicates that GTF3A was overexpressed in cancer tissues and cell lines, particularly in colorectal cancer, and that it could serve as a potential prognostic biomarker.


Introduction
According to global cancer statistics, colorectal cancer (CRC) causes more than 700,000 deaths every year, and there will be an estimated 53,200 CRC deaths in the United States in 2020 [1,2]. This evidence makes CRC one of the deadliest cancer types, along with lung cancer, liver cancer, and stomach cancer. The incidence has slightly increased year

Oncomine Analysis and GEPIA Datasets
Oncomine is an online database that provides information on microarray cancer data. The Oncomine (https://www.oncomine.org/, accessed on 25 February 2021) and GEPIA (http://gepia.cancer-pku.cn/, accessed on 25 February 2021) cancer databases provide details on gene expression in cancer and normal samples [19,25]. The Oncomine and GEPIA analyses in this study were used to determine the expression levels of individual members of the GTF3 family in CRC. Statistical comparisons were performed using Student's t-test, and the p-value was used to decide whether the gene expressions of GTF3 family members differed between normal controls and CRC samples. This study used a threshold p-value of <0.05, a fold change of 2, and a top 10% gene ranking as we previously described [26][27][28][29][30].
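The filtering criteria above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure Oncomine or GEPIA runs internally: the function names are hypothetical, expression values are assumed to be positive and on a linear scale, the p-value is obtained by numerically integrating the t density, and the top 10% gene-ranking criterion is not modeled.

```python
import math
from statistics import mean, variance


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value from Simpson integration of the t density on [0, |t|]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    h = upper / steps  # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)


def student_t_test(a, b):
    """Two-sample Student's t-test with pooled variance; returns (t, p)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, two_sided_p(t, df)


def is_differentially_expressed(tumor, normal, p_cut=0.05, fc_cut=2.0):
    """Apply the p < 0.05 and fold change >= 2 (either direction) thresholds."""
    _, p = student_t_test(tumor, normal)
    fold = mean(tumor) / mean(normal)  # assumes linear-scale, positive values
    return p < p_cut and (fold >= fc_cut or fold <= 1 / fc_cut)


# Hypothetical expression values, for illustration only.
tumor = [8.1, 7.9, 8.4, 8.0, 8.2]
normal = [3.9, 4.1, 4.0, 3.8, 4.2]
print(is_differentially_expressed(tumor, normal))
```

In practice a statistics library would normally supply the t-test; the hand-written integration is used here only to keep the sketch free of dependencies.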

Cancer Cell Line Encyclopedia (CCLE) Analysis
Over 1100 cell lines representing 37 cancer types were explored in the CCLE database (https://portals.broadinstitute.org/ccle, accessed on 25 February 2021), which provides extensive genomic information, computational analyses, and visualization [31]. This study used the CCLE dataset to examine mRNA expression levels to further verify the participation of these GTF3 family members in cancer cell lines. Expression profile values were log-transformed and then visualized by a heatmap.
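The log transformation applied before drawing the heatmap can be sketched as follows. The base of the logarithm and any pseudocount are not stated in the text, so log2 with a pseudocount of 1 is an assumption here, and the function names and expression values are hypothetical.

```python
import math


def log2_transform(values, pseudocount=1.0):
    """Return log2(value + pseudocount) for each expression value."""
    return [math.log2(v + pseudocount) for v in values]


def heatmap_matrix(expression, genes, cell_lines, pseudocount=1.0):
    """Build a genes x cell-lines matrix of log2-transformed values."""
    return [
        log2_transform([expression[g][c] for c in cell_lines], pseudocount)
        for g in genes
    ]


# Hypothetical expression values for two CRC cell lines, for illustration only.
expression = {
    "GTF3A": {"HT29": 31.0, "SW48": 15.0},
    "GTF3C1": {"HT29": 7.0, "SW48": 3.0},
}
matrix = heatmap_matrix(expression, ["GTF3A", "GTF3C1"], ["HT29", "SW48"])
print(matrix)
```

The resulting matrix is the kind of input a plotting library would take to render a heatmap with genes as rows and cell lines as columns.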

Differentially Expressed GTF3 Genes: Prognostic Significance and Expression
To determine the prognostic roles of the mRNA expressions of GTF3 family genes in CRC, this study used the PrognoScan (http://dna00.bio.kyutech.ac.jp/PrognoScan/, accessed on 25 February 2021) and HPA (www.proteinatlas.org, accessed on 25 February 2021) databases [32]. The PrognoScan database was used to generate survival plots, with a p-value threshold of 0.05 [33]. The HPA database provides a wealth of information on sequences, pathology, expressions, and distributions in various cancer tissues. The first version of this database contained more than 400,000 high-resolution images corresponding to more than 700 antibodies to human proteins [34]. Using this database, this study analyzed the differential protein expressions and localization of select members of the GTF3 family in CRC tissue.
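As a rough illustration of how a survival plot of this kind is built, the sketch below computes a Kaplan-Meier estimate in plain Python. It does not reproduce PrognoScan's own implementation or the way it splits patients into high- and low-expression groups; the function name and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  -- follow-up time for each patient
    events -- 1 if the event (e.g., death) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # both events and censored patients leave the risk set
    return curve


# Hypothetical follow-up times (months) and event indicators.
curve = kaplan_meier([5, 8, 12, 12, 20], [1, 0, 1, 1, 0])
for t, s in curve:
    print(t, round(s, 3))
```

Running the estimator separately on a high-expression group and a low-expression group yields the two curves that are compared in a survival plot.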

Genomic Alterations Analysis
The cBioPortal (https://www.cbioportal.org/, accessed on 28 February 2021) facilitates the exploration of multidimensional cancer genomic data by enabling the visualization and cross-gene analyses of samples, data types, and changes in mRNAs and microRNAs [35,36]. Furthermore, altered gene functions can also occur due to gene mutations [37]. Therefore, we analyzed these genomic changes in GTF3 family members that were differentially expressed in CRC.

Functional Enrichment Analysis
To advance our research, we used a pathway enrichment analysis, which is generally used to identify cancer risk pathways and describe tumorigenesis processes [38]. This study integrated cohort profile datasets to illustrate the potential of key candidate genes and pathways in CRC as we previously described [39][40][41][42][43]. Briefly, expression profiles of the GSE17536, GSE17537, and TCGA datasets were integrated and analyzed in depth. The first step was to collect GTF3 family gene expressions in TCGA data and identify shared genes using Venny vers. 2.1 (https://bioinfogp.cnb.csic.es/tools/venny/, accessed on 28 February 2021). After that, a Cytoscape analysis was conducted using the shared gene list. Cytoscape (http://www.cytoscape.org/, accessed on 30 February 2021) [44] is used to visualize networks with expression profiles and other molecules. Furthermore, MetaCore software was used to identify the functions and pathways of altered genes and to determine the associated biological processes, disease biomarkers, tissues, colorectal neoplasms, and signaling and regulatory pathways. The enrichment analysis returned pathways with adjusted p-values, and log-transformed values were uploaded with a threshold p-value of <0.05.
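The shared gene list that a Venn-diagram tool such as Venny reports for several input lists is a set intersection, which can be sketched as below. The function name and the gene lists are hypothetical placeholders, not the lists used in this study.

```python
def shared_genes(*gene_lists):
    """Return the sorted genes present in every input list (case-insensitive)."""
    sets = [{gene.strip().upper() for gene in genes} for genes in gene_lists]
    return sorted(set.intersection(*sets))


# Hypothetical co-expressed gene lists from three cohorts, for illustration only.
gse17536 = ["ABL1", "FASN", "DNMT1", "WDR6"]
gse17537 = ["fasn", "ABL1", "WDR6"]
tcga = ["ABL1", "FASN", "CCNB1"]
print(shared_genes(gse17536, gse17537, tcga))
```

Normalizing gene symbols to upper case before intersecting avoids missing matches that differ only in capitalization.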

Expression Pattern of GTF3 Family Genes in CRC
In the current study, we used the Oncomine database to reveal transcript expressions of GTF3 family genes in cancerous and normal tissues and found that the expressions of GTF3 family genes were associated with many types of cancers. Among all GTF3 members, GTF3A was particularly highly expressed in CRC compared to normal tissues, whereas GTF3B, GTF3C1, and GTF3C2 had low expression levels in other types of cancers such as brain and central nervous system cancers, esophageal cancer, and leukemia (Figure 1). In addition to the mRNA expression analysis in tissues, we explored the expression levels of members of the GTF3 family in various cell lines using the Cancer Cell Line Encyclopedia (CCLE) database (Figure 2). The results showed that CRC cell lines, such as HT29, SW48, and COLO320, exhibited significantly high expression levels of GTF3A, GTF3B, GTF3C1, and GTF3C2. We further analyzed GTF3 messenger (m)RNA levels in CRC patients, using the GEPIA platform to compare GTF3 mRNA expressions in CRC and normal tissues (Figure 3). Based on the GEPIA analysis, the expression level of GTF3A was higher in CRC tumor tissue than in normal tissues. In contrast, the expression levels of GTF3B, GTF3C1, and GTF3C2 were lower in CRC tumor tissue than in normal tissues (Figure 3A-F).

Prognostic Values and Protein Expressions of GTF3 Family Genes in CRC
We investigated whether expressions of GTF3 family genes were correlated with prognoses in CRC patients. The impacts of the expressions of members of this family of genes on survival rates were evaluated using the PrognoScan database [33,45]. An analysis of cohort data from accession number GSE17536, covering 177 CRC samples, showed that GTF3A expression was significantly associated with a poor prognosis of CRC patients. We next examined in situ expressions of GTF3 family genes at the protein level using immunohistochemical (IHC) data in the Human Protein Atlas (HPA) database. We found that GTF3A and GTF3C1 proteins were more highly expressed in CRC tissues than in normal tissues (Figure 5).

Genomic Alterations of GTF3 Family Gene Expressions in CRC
Alterations in gene expressions can occur as a result of amplification, deletion of genes, or irregular transcription regulation. Furthermore, altered gene functions can also occur due to gene mutations. Therefore, we analyzed these genomic changes of the differentially expressed GTF3 gene family using the cBioPortal (Figure 6) and found that the genetic alteration rates of GTF3 family genes were 26% for GTF3A, 6% for BRF1/GTF3B, 7% for GTF3C1, and 5% for GTF3C2 (Figure 6A). We also calculated mRNA expression correlations among GTF3 gene family members and found that GTF3A was negatively correlated with GTF3B, GTF3C1, and GTF3C2; GTF3B was positively correlated with GTF3C1 and GTF3C2; and GTF3C1 was positively correlated with GTF3C2 (Figure 6B).
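The sign of a pairwise expression correlation, as reported above, can be illustrated with a short sketch. The text does not state which coefficient was used, so Pearson's r is an assumption here, and the function name and expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical per-sample expression values, for illustration only.
gtf3a = [9.1, 8.4, 7.9, 7.2, 6.8]
gtf3b = [4.0, 4.6, 5.1, 5.9, 6.3]
print(round(pearson(gtf3a, gtf3b), 3))  # negative value = inverse relationship
```

A coefficient near -1 indicates that one gene tends to be high where the other is low, and a coefficient near +1 indicates that they rise and fall together.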

Pathway Enrichment Analysis
We then constructed a network of gene interactions, Gene Ontology (GO) biological processes, and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses using the 80 shared genes identified by Venny version 2.1.0, which served as the input for the ClueGO/CluePedia packages in the Cytoscape software (Figure 6C). Statistical options for the ClueGO/CluePedia enrichment analysis were set to a two-sided hypergeometric test with p ≤ 0.05, the Benjamini-Hochberg correction, and a kappa score of ≥2 as the primary criteria. Through the Cytoscape analysis (ClueGO and CluePedia), we found that members of the GTF3 gene family have a high correlation with metastatic markers such as ABL1 [46,47], WDR6 [48], ARAP1-AS1 [49,50], DNMT1 [51][52][53][54][55], and FASN [56,57] (Figure 6D).
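The statistical settings named above can be sketched as follows. This is a simplified illustration rather than ClueGO's implementation: the two-sided p-value is computed here by summing the probabilities of all outcomes no more likely than the observed one (one common convention, assumed for this sketch), the kappa-score grouping of terms is not modeled, and all function names and numbers are hypothetical.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(exactly k annotated genes) when drawing n genes from N, K annotated."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def two_sided_hypergeom_p(k, N, K, n):
    """Sum the probabilities of all outcomes no more likely than the observed k."""
    lo, hi = max(0, n - (N - K)), min(K, n)
    p_obs = hypergeom_pmf(k, N, K, n)
    total = sum(
        p
        for p in (hypergeom_pmf(i, N, K, n) for i in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-9)  # small tolerance for floating-point ties
    )
    return min(1.0, total)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical example: 80 input genes, 12 of which fall in a 300-gene term,
# drawn from a background of 20,000 genes.
p = two_sided_hypergeom_p(12, 20000, 300, 80)
print(p < 0.05)
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Each term tested against the gene list contributes one p-value, and the correction is applied across all terms before the p ≤ 0.05 threshold is used.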

GTF Coexpressed Genes and Regulated Networks in TCGA Database
To understand how differentially expressed gene (DEG) lists are related to downstream GTF-regulated networks in different biological processes and diseases, we performed an enrichment analysis using MetaCore software. By uploading GTF3A coexpressed genes from the TCGA-CRC dataset into MetaCore, we found that these genes were enriched in cell cycle-related pathways and networks, including "DNA damage_ATM/ATR regulation of G2/M checkpoint: cytoplasmic signaling", "Development_Positive regulation of WNT/Beta-catenin signaling in the cytoplasm", "Putative roles of SETDB1 and PLU-1 in melanoma", "Proteolysis_Putative ubiquitin pathway", and "Immune response_BAFF-induced non-canonical NF-κB signaling" (Figure 7).

To study the gene networks and signaling pathways that could be affected by the chosen genes, we exported genes co-expressed with GTF3A and then uploaded them to the MetaCore platform for a pathway analysis. The MetaCore pathway analysis indicated that the "DNA damage_ATM/ATR regulation of G2/M checkpoint: cytoplasmic signaling"-related pathway was correlated with colorectal cancer development.
GTF3B co-expressed genes from the TCGA-CRC database were correlated with "Chemotaxis_Lysophosphatidic acid signaling via GPCRs", "Oxidative stress_ROS-induced cellular signaling", "Histone deacetylases in prostate cancer", "Development_Negative regulation of WNT/Beta-catenin signaling in the nucleus", and "Cytoskeleton remodeling_Regulation of actin cytoskeleton organization by the kinase effectors of Rho GTPases" (Figure 8).
Curr. Issues Mol. Biol. 2021, 1, FOR PEER REVIEW 11
Figure 8. GTF3B differentially expressed genes pathway developed by MetaCore. Experimental data from TCGA were linked to and visualized on maps as thermometer-like figures. To study the gene networks and signaling pathways that could be affected by the chosen genes, we exported genes co-expressed with GTF3B and then uploaded them to the MetaCore platform for a pathway analysis. The MetaCore pathway analysis indicated that the "Chemotaxis_Lysophosphatidic acid signaling via GPCRs"-related pathway was correlated with colorectal cancer development. Upward (red) thermometers show upregulated signals, while downward (blue) thermometers indicate downregulated gene expression levels. Annotations are listed in supplemental materials.
GTF3C1 co-expressed genes from the TCGA-CRC dataset were correlated with "Chemotaxis_Lysophosphatidic acid signaling via GPCRs", "Regulation of lipid metabolism_Regulation of lipid metabolism via LXR, NF-Y and SREBP", "Transport_Induction of Macropinocytosis", "Notch signaling in breast cancer", and "Cytoskeleton remodeling_Regulation of actin cytoskeleton organization by the kinase effectors of Rho GTPases" (Figure 9).
Figure 9. GTF3C1 differentially expressed genes pathway developed by MetaCore. Experimental data from TCGA were linked to and visualized on maps as thermometer-like figures. To study the gene networks and signaling pathways that could be affected by the chosen genes, we exported genes co-expressed with GTF3C1 and then uploaded them to the MetaCore platform for a pathway analysis. The MetaCore pathway analysis indicated that the "Regulation of lipid metabolism_Regulation of lipid metabolism via LXR, NF-Y, and SREBP"-related pathway was correlated with colorectal cancer development. Upward (red) thermometers show upregulated signals, while downward (blue) thermometers indicate downregulated gene expression levels. Annotations are listed in supplemental materials.
GTF3C2 co-expressed genes from the TCGA-CRC dataset were correlated with "DNA damage_Role of Brca1 and Brca2 in DNA repair", "DNA damage_ATM/ATR regulation of G1/S checkpoint", "DNA damage_p53 activation by DNA damage", "DNA damage_G2 checkpoint in response to DNA mismatches", and "DNA damage_DNA-damage-induced responses" (Figure 10).
Figure 10. GTF3C2 differentially expressed genes pathway developed by MetaCore. Experimental data from TCGA were linked to and visualized on pathway maps as thermometer-like figures. To study the gene networks and signaling pathways that could be affected by the chosen genes, we exported genes co-expressed with GTF3C2 and then uploaded them to the MetaCore platform for a pathway analysis. The MetaCore pathway analysis indicated that the "DNA damage_Role of Brca1 and Brca2 in DNA repair"-related pathway was correlated with colorectal cancer development.
GTF3 family members play crucial roles as transcription factors, affecting several important biological pathways. Nevertheless, these GTF3 transcription factor-related pathways have not yet been clearly elucidated in cancers. Exploring the potential of GTF3s could be a novel approach in cancer therapy for CRC treatment. However, basic studies showed substantial discrepancies in the specific roles of different GTF3 family members in CRC biology [66]. In this research, we systematically characterized the expression profile of each member of the GTF3 family in CRC and showed that genes of this family differed significantly in mRNA expression between CRC tissues and normal colon and rectal tissues. In an integrative analysis with GEPIA, the expression level of GTF3A was higher in CRC tissue than in normal tissues. In contrast, the expression levels of GTF3B, GTF3C1, and GTF3C2 were lower in CRC tissue than in normal tissues. Hence, these results confirmed that, among genes of this family, GTF3A has a distinct mRNA expression pattern in CRC, which might imply that it has a role in this disease.
In order to support these findings at the proteomic level, we examined in situ expressions of genes of the GTF3 family at the protein level using IHC data from the HPA database. We found that GTF3A protein was more highly expressed in CRC tissues than in normal tissues. The above results contradict the previous studies on the protective effect of GTF3A [66,67]. A possible explanation for this conflict may be that GTF3A acts differently in different stages of tumor cells. Further investigations of GTF3A are still needed to resolve these discrepancies.
According to our results, only GTF3A could be considered as a prognostic marker in CRC compared to the other GTF3 family members. Therefore, we further focused on the expression of GTF3A with different clinical parameters. Our Kaplan-Meier analysis via the PrognoScan database revealed that GTF3A expression was positively related to DSS and OS; however, it could not be considered a DFS prognostic marker for CRC patients, with an HR of 0.8 and p-value of 0.399. Interestingly, IHC data in the HPA database on in situ GTF3A expression at the protein level showed distinguishable patterns between weak and robust expressions in nuclei. Another important result of this study is that we used ClueGO/CluePedia to improve the biological interpretation of a large list of genes. Multiple lists of markers can be analyzed simultaneously to underline their general or specific functions. The ClueGO/CluePedia analysis revealed that members of the GTF3 gene family have a high correlation with several metastatic markers in colorectal cancer: ABL1, which is highly expressed in CRC tissue and cells and whose high expression is associated with the tumor stage of CRC patients [46,47,49]; WDR6, a potential target gene for miR-451a in CRC [48]; lncRNA ARAP1 antisense RNA 1 (ARAP1-AS1), which promotes the epithelial-mesenchymal transition (EMT) process in CRC via the Wnt/β-catenin signaling pathway [49,50]; DNMT1, whose transcriptional downregulation of DYRK2 expression was found to increase colorectal cancer cell proliferation [52]; and FASN, a high serum level of which is a prognostic marker in late-stage colorectal cancer patients [57]. This study also revealed a high correlation between GTF3A and the pathway "DNA damage_ATM/ATR regulation of G2/M checkpoint: cytoplasmic signaling" in CRC development [68].
ATM mediates DNA damage signaling when DNA double-strand breaks occur [69], and a reduction of ATM was found in CRC tumors [70]. Plk1 was elevated in non-small cell lung carcinomas and other types of tumors, and it also plays a crucial role in G2/M checkpoint recovery [71]. The pathway "Chemotaxis_Lysophosphatidic acid signaling via GPCRs", which was highly correlated with GTF3B, was demonstrated to be involved in CRC development in previous studies [72,73]. Lysophosphatidic acid (LPA) is the smallest bioactive lipid that mediates critical responses such as cell proliferation, migration, and cytoskeletal reorganization by interaction with several G protein-coupled receptors (GPCRs) [74]. LPA signaling promotes cancer development and metastasis by modulating cell proliferation, invasion, adhesion, angiogenesis, and survival [75]. LPA has also been linked to the induction of DNA synthesis and colorectal cancer cell migration [76].
In addition, we found several genes involved in apoptosis resistance and metastasis through the pathway "DNA damage_ATM/ATR regulation of G2/M checkpoint: cytoplasmic signaling", including Cell Division Cycle (CDC) family genes such as CDC25A, CDC25B, and CDC25C, as well as CCNB1 (Cyclin B1) [77]. CDC25A acts as a regulator of apoptosis and metastasis; CDC25A overexpression increases tumorigenesis and is often observed in various types of cancer [78]. In addition to DNA damage, hypoxia was reported to influence CDC25A expression in colon cancer cells [79]. CDC25B was recognized as a target of miRNA-148a, which may regulate pancreatic ductal adenocarcinoma development [80]. In vulvar squamous cell carcinoma, overexpression of CDC25B, CDC25C, and phospho-CDC25C is linked to malignant features and aggressive cancer phenotypes [81]. CCNB1 inhibition causes apoptotic death in certain colorectal cancer cells [82]. Furthermore, CCNB1, which is activated by Chk1, plays an oncogenic role in colorectal cancer cells and may be useful in the development of new colorectal cancer therapies [82].

Conclusions
In summary, we explored and integrated several databases, and our high-throughput analysis revealed that expressions of GTF3 family members play important roles in malignancy development. These results provide useful evidence for prospective research on associations of CRC with GTF3 family genes. These data also suggest that GTF3A might be a potential prognostic biomarker for CRC, although more investigations are needed to comprehensively determine the role of GTF3A in CRC for further translational research.