Article

Transcript PHF19-207 May Be a Long Non-Coding RNA with Tumor-Promoting Role in Colon Cancer

Institute of Molecular Genetics and Genetic Engineering, University of Belgrade, Vojvode Stepe 444a, 11000 Belgrade, Serbia
*
Authors to whom correspondence should be addressed.
Biomolecules 2025, 15(7), 957; https://doi.org/10.3390/biom15070957
Submission received: 12 May 2025 / Revised: 26 June 2025 / Accepted: 30 June 2025 / Published: 2 July 2025

Abstract

Recent pan-cancer transcriptome analysis has revealed differential activity of two alternative PHF19 gene promoters in malignant versus non-malignant gut mucosa. One of these promoters upregulated in colon cancer leads to the expression of the PHF19-207 transcript, suggesting its potential role in tumor promotion. The objective of this study was to investigate the function of PHF19-207 using in silico tools and publicly available data, as well as to assess its expression in colon cancer. Expression analyses were conducted via qPCR and RNA sequencing on RNA extracted from the immortalized colonic epithelial cell line HCEC-1CT, as well as a series of colon cancer cell lines cultured in both 2D and 3D environments. The expression of PHF19-207 was found to be elevated in all malignant cell lines compared to the non-malignant HCEC-1CT cell line in both culture conditions, with the most prominent increase observed in cell lines derived from advanced stages of the disease and in the HCEC-1CT cell line overexpressing KRAS. Furthermore, the PHF19-207 transcript was detected in exosomes derived from malignant cells. These findings suggest that PHF19-207 holds potential as a diagnostic biomarker. In addition, in silico analyses indicate that this transcript may function as a long non-coding RNA involved in the regulation of gene expression. Further functional investigations are required to elucidate its precise role in colon carcinogenesis.

1. Introduction

According to 2022 data (https://gco.iarc.who.int/en, accessed on 15 November 2024), colorectal cancer (CRC) ranks as the third most commonly diagnosed cancer in both men and women, and is the second leading cause of cancer-related deaths globally. According to the World Health Organization (WHO), of all colorectal cancer cases, 70% are localized in the colon.
One of the hallmarks of cancer, including colon cancer, is its dynamic transcriptional landscape and usage of alternative promoters. Expression of many human protein-coding genes is regulated by alternative promoters. Recent findings from an extensive pan-cancer transcriptome analysis revealed differential expression of two alternative PHF19 gene promoters in malignant versus non-malignant gut mucosa [1].
The promoter upregulated in colon and rectal cancer gives rise to the PHF19-207 transcript, suggesting a potential tumor-promoting function. The PHF19 gene encodes PHD finger protein 19, a component of the polycomb repressive complex 2 (PRC2), which is involved in H3K27 methylation, a chromatin modification linked to transcriptional repression [2]. The target genes of PHF19 protein are implicated in processes such as proliferation, differentiation, angiogenesis, and the organization of the extracellular matrix [3]. The role of PHF19 protein in malignant transformation has been demonstrated in several malignancies, with its tumor-promoting role in colorectal cancer revealed only recently [4,5].
The PHF19 gene is located at 9q33.2 and encompasses 39,245 bp. A set of 14 transcripts was identified from this gene, with the major transcript PHF19-202 (ENST00000373896) encoding a 580-amino-acid-long protein. According to the Ensembl database, the majority of other transcripts are either truncated or have an undefined coding sequence. Elements of the non-coding transcriptome are increasingly recognized as key contributors to the complexity of the genome; however, their specific roles remain largely unexplored.
The objective of our study was to examine the expression of PHF19-207 in colon cancer, assess its potential as an early biomarker for colorectal cancer, and evaluate its functional implications using in silico tools, as the function of this transcript has not been previously characterised.

2. Materials and Methods

2.1. In Silico Analysis of the PHF19 Gene Promoters

The promoter sequences of the PHF19 gene were defined as regions extending 1 kb both upstream and downstream of each of the two transcription start sites (TSSs) identified as differentially active in colorectal cancer [1]. These promoter sequences were retrieved in FASTA format from the human GRCh38.p13 assembly using the Ensembl genome browser. To analyze characteristic motifs within these sequences, the Motif Finder tool of the Integrative Genomics Viewer (IGV) program was employed [6]. The distribution of GC boxes was examined using the MethPrimer 2.0 tool (http://www.urogene.org/methprimer/) [7].
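The window definition above can be sketched in a few lines of Python. This is only an illustration: the function name `promoter_window` is hypothetical, and the TSS coordinate used in the example is an assumption, taken as the midpoint of the interval chr9:120867881-120869881 reported in Section 3.1 rather than a value stated in the text.

```python
def promoter_window(tss: int, flank: int = 1000) -> tuple[int, int]:
    """Return (start, end) of a window spanning `flank` bp on each side of a TSS."""
    if flank < 0:
        raise ValueError("flank must be non-negative")
    start = max(1, tss - flank)  # genomic coordinates are 1-based, so clamp at 1
    end = tss + flank
    return start, end


# Assumed TSS: midpoint of the interval reported for the downregulated promoter.
assumed_tss = 120_868_881
window = promoter_window(assumed_tss)
print(f"chr9:{window[0]}-{window[1]}")
```

Strand is not modeled here; because the window is symmetric around the TSS, the same interval is obtained for a gene on the reverse strand.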
Four available bioinformatic tools were utilized to predict the presence of consensus sequences for potential transcriptional regulator binding within the PHF19 gene promoters: Alggen PROMO 2.0 (https://alggen.lsi.upc.es, https://bio.tools/alggen), AliBaba 2.1 (https://gene-regulation.com/pub/programs/alibaba2/), CiiiDER (https://ciider.com, https://bio.tools/CiiiDER), and TFBIND (https://tfbind.hgc.jp) [8,9,10]. Each of these four tools employs a different algorithm, and their combined use enhances prediction robustness and allows for cross-validation of results. Default query parameters and human libraries were applied in these analyses to ensure optimal performance of the tools and reproducibility. A transcriptional regulator was considered only if a positive result was obtained from at least two algorithms. The expression levels of the identified regulators in colon cancer and normal gut mucosa were analyzed using the Gene Expression Profiling Interactive Analysis 2 (GEPIA) tool (http://gepia.cancer-pku.cn/).
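The "at least two algorithms" rule amounts to a simple vote count across tools. The sketch below assumes that each tool's output has already been reduced to a set of regulator names. The function name `consensus_regulators` and the regulator labels are placeholders for illustration, not results from the study.

```python
from collections import Counter


def consensus_regulators(predictions: dict[str, set[str]], min_tools: int = 2) -> set[str]:
    """Keep regulators predicted by at least `min_tools` of the tools."""
    counts = Counter()
    for regulators in predictions.values():
        # Each tool reports a set, so it contributes at most one vote per regulator.
        counts.update(regulators)
    return {name for name, n in counts.items() if n >= min_tools}


# Placeholder outputs for illustration only; not results from the study.
example = {
    "PROMO": {"TF_A", "TF_B"},
    "AliBaba": {"TF_A"},
    "CiiiDER": {"TF_C"},
    "TFBIND": {"TF_B", "TF_C", "TF_D"},
}
print(sorted(consensus_regulators(example)))
```

In practice, regulator names differ between tool libraries, so a name-harmonization step would likely be needed before counting; that step is omitted here.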
The list of genetic variants in the PHF19 gene promoter sequences was extracted from the Ensembl database (global MAF:0.005-0.5, class: SNP, clinical consequences: all, consequences: all) to map variants occurring in the predicted binding sites of transcriptional regulators.
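Mapping variants onto predicted binding sites is an interval-overlap check applied after the frequency filter. The following sketch assumes variants are given as (identifier, position, global MAF) tuples and binding sites as inclusive (start, end) intervals on the same coordinate system. All names and values are hypothetical.

```python
def variants_in_sites(variants, sites, maf_min=0.005, maf_max=0.5):
    """Return IDs of variants that pass the MAF filter and fall inside any site."""
    hits = []
    for variant_id, position, maf in variants:
        if not (maf_min <= maf <= maf_max):
            continue  # outside the global MAF range used for the Ensembl query
        if any(start <= position <= end for start, end in sites):
            hits.append(variant_id)
    return hits


# Hypothetical variants and binding sites, for illustration only.
demo_variants = [("var1", 105, 0.12), ("var2", 250, 0.30), ("var3", 110, 0.001)]
demo_sites = [(100, 120), (400, 415)]
print(variants_in_sites(demo_variants, demo_sites))
```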

2.2. In Silico Analysis of PHF19-207

The sequence of the PHF19-207 transcript was retrieved as a FASTA file using the human GRCh38.p13 assembly from the Ensembl genome browser (ENST00000456291).
The coding potential of the transcript was assessed using the LGC Coding Potential Prediction 2.0 tool (https://ngdc.cncb.ac.cn/lgc/) and Coding Potential Calculator 2 (http://cpc2.gao-lab.org/), and its secondary structure was predicted with the RNAfold tool. To evaluate the expression of PHF19-207 in normal and tumor colon tissues, the AnnoLnc2 tool was utilized, based on publicly available expression data. Additionally, the transcript’s cellular localization was predicted using AnnoLnc2, with further validation performed using the lncLocator 1.0 (http://www.csbio.sjtu.edu.cn/bioinf/lncLocator/) and lncLocator2 (http://www.csbio.sjtu.edu.cn/bioinf/lncLocator2/) tools [11].
Homologous miRNA sequences were extracted using miRBase 22.1 (http://mirbase.org) and the miRDB 6.0 custom prediction tool (http://mirdb.org) [12,13]. Potential interactions between the detected miRNAs and PHF19-207 were predicted using the RNA22 v2 tool (http://cm.jefferson.edu/rna22/Interactive) and the BiBiServ2 RNAhybrid 2.1.2 tool (http://bibiserv.cebitec.uni-bielefeld.de/rnahybrid, https://bio.tools/rnahybrid) [14,15].

2.3. Cell Cultures

The cell lines utilized in this study were derived from human colon tissue, including the immortalized colonic epithelial cell line HCEC-1CT (CVCL_AQ45) isolated from healthy tissue (Evercyte GmbH, Wien, Austria) and a panel of colon cancer cell lines: HCT 116 (CVCL_0291), HT-29 (CVCL_0320), CaCo-2 (CVCL_0025), SW480 (CVCL_0546), DLD-1 (CVCL_0248), and SW620 (CVCL_0547) (ATCC, Manassas, VA, USA). All cell lines were cultured in Dulbecco’s Modified Eagle Medium (DMEM; Capricorn Scientific, Ebsdorfergrund, Germany) supplemented with 10% fetal bovine serum (FBS; Capricorn Scientific, Germany) and 1% antibiotic/antimycotic solution (Capricorn Scientific, Ebsdorfergrund, Germany) in a 5% CO2 atmosphere at 37 °C. Cells were subcultured once they reached 70–80% confluence using 1× trypsin/EDTA (Capricorn Scientific, Ebsdorfergrund, Germany). To ensure biological relevance, cells were cultivated in triplicate. All cell lines were confirmed to be free from mycoplasma contamination.
Non-malignant (HCEC-1CT) and malignant cell lines representing different tumor stages according to the Dukes’ classification (HCT 116, DLD-1, and SW620) were cultured in 3D as spheroids. To generate the spheroids, adherent cells were detached using 1× trypsin/EDTA (Capricorn Scientific, Ebsdorfergrund, Germany) and counted with a standard hemocytometer. Approximately 2 × 10⁵ cells per well were seeded in a 24-well Nunclon™ Sphera™ Dish (Thermo Fisher Scientific, Waltham, MA, USA), designed for low cell attachment, containing 1 mL of the complete culture medium described earlier. Spheroids were cultured for 7 days in a humidified incubator at 37 °C with 5% CO2. To maintain nutrient levels and remove dead cells, media changes were performed every 2–3 days. During incubation, spheroids were monitored daily under a phase-contrast microscope for shape, growth and compactness. Compact spheroids were defined based on the following morphological criteria: spherical morphology with clearly defined borders, absence of fragmented edges or loosely attached cells, and uniform size distribution within biological replicates. Compact spheroids were collected under a microscope to ensure that only live cells, free from debris, were selected for subsequent total RNA extraction.

2.4. Cell Transfection

HCEC-1CT cells were seeded at a density of 2 × 10⁵ and cultured for 24 h in DMEM medium without antibiotic/antimycotic solution. Cells were counted using a standard hemocytometer. Transient transfection with the EGFP-KRAS-G12V plasmid (#164925, Addgene, Watertown, MA, USA) and control plasmid was performed using Lipofectamine™ 3000 (Thermo Fisher Scientific, Waltham, MA, USA) according to the manufacturer’s protocol. Transfection was performed in triplicate. KRAS expression was confirmed by the presence of GFP fluorescence, and cells were collected after 24 h.

2.5. RNA Extraction

Total RNA was extracted from approximately 8 × 10⁶ adherent cells (2D cultured cells) and spheroids (3D cultured cells) from two 24-well plates using the PureLink™ RNA Mini Kit (Thermo Fisher Scientific, Waltham, MA, USA) following the manufacturer’s protocol. RNA from transfected cells was isolated using the same procedure. For RNA isolation from the HCEC-1CT and SW620 cell compartments, the Cytoplasmic & Nuclear RNA Purification Kit (Norgen Biotek Corp., Thorold, ON, Canada) and the Cell Culture Media Exosome Purification and RNA Isolation Midi Kit (Norgen Biotek Corp., Thorold, ON, Canada) were employed, adhering to the respective manufacturer’s protocols. The concentration and purity of the extracted RNA were assessed by measuring absorbance at 260 nm and 280 nm using a BioSpec-nano spectrophotometer (Shimadzu, Kyoto, Japan).

2.6. RNA Sequencing

High-throughput next-generation RNA sequencing was conducted by Novogene (UK) Company Limited (Cambridge, UK). Total RNA from the cultivated spheroids underwent quality control (QC), which included 1% agarose gel electrophoresis, Nanodrop spectrophotometry to assess RNA concentration and purity, and Agilent 2100 analysis to evaluate the RNA Integrity Number. Library preparation involved ribosomal RNA depletion, facilitating RNA enrichment for gene expression profiling of both coding and non-coding transcripts. Sequencing was performed using the Illumina NovaSeq6000 platform, generating paired-end 150 bp reads. Bioinformatics analysis comprised quality control, mapping of the reads to the GRCh38 human reference genome, and quantification of gene expression levels using Novogene’s established pipeline, which provided raw counts. A Sashimi plot of the Binary Alignment Map (BAM) files was generated using the Integrative Genomics Viewer (IGV).

2.7. Quantitative Real-Time PCR (qRT-PCR)

Reverse transcription of total RNA, isolated from both 2D and 3D cultured cells (2 μg) and cell compartments (0.1 μg), into complementary DNA (cDNA) was carried out using the High-Capacity cDNA Reverse Transcription Kit (Applied Biosystems, Waltham, MA, USA), following the manufacturer’s protocol. The reaction conditions included 10 min at 25 °C, 120 min at 37 °C, and 5 min at 85 °C.
Relative expression of PHF19-207 was quantified in triplicate using quantitative real-time PCR (qRT-PCR) with Power SYBR Green PCR Master Mix (Thermo Fisher Scientific, Waltham, MA, USA). The specificity of the amplification products was confirmed by performing a melting curve analysis. Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) served as the endogenous control for all measurements. The forward primer was designed to bind to the retained intron of the PHF19-207 transcript, and the reverse primer targeted the retained intron–exon junction to ensure reaction specificity. The sequences of the primers used for relative quantification are provided in Table 1.
qRT-PCR was conducted using the 7500 Real-Time PCR System (Applied Biosystems, Waltham, MA, USA), and relative quantification was calculated using the 2^(−ΔCt) method. The reaction conditions consisted of 2 min at 50 °C, 10 min at 95 °C, followed by 40 cycles of 15 s at 95 °C and 1 min at 60 °C. The expression levels of PHF19-207 were determined and normalized to the endogenous control.
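For readers unfamiliar with this normalization, the calculation can be sketched as follows, with ΔCt taken as the mean Ct of the target minus the mean Ct of the endogenous control. The Ct values below are hypothetical and the function name `relative_expression` is illustrative. Averaging technical replicates before subtraction is an assumption about how the replicates were handled.

```python
from statistics import mean


def relative_expression(ct_target, ct_reference):
    """Relative expression as 2^(-dCt), with dCt = mean Ct(target) - mean Ct(reference)."""
    if not ct_target or not ct_reference:
        raise ValueError("both Ct lists must contain at least one value")
    delta_ct = mean(ct_target) - mean(ct_reference)
    return 2.0 ** (-delta_ct)


# Hypothetical technical triplicates (not measured values from the study).
ct_phf19_207 = [30.0, 30.0, 30.0]
ct_gapdh = [20.0, 20.0, 20.0]
print(relative_expression(ct_phf19_207, ct_gapdh))
```

Dividing the value obtained for one sample by the value obtained for another gives the fold difference between them, which is mathematically equivalent to the 2^(−ΔΔCt) form.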

2.8. Data Used in the Study

Publicly available high-throughput RNA sequencing data (GSE152562, GSE164541, and GSE254832) were obtained from the National Center for Biotechnology Information’s Gene Expression Omnibus database, NCBI GEO (https://www.ncbi.nlm.nih.gov/geo/) [16,17,18]. The corresponding raw sequencing files in FASTQ format were downloaded from the Sequence Read Archive (SRA) (https://www.ncbi.nlm.nih.gov/sra) under accessions SRP267478, SRP301216, and SRP487433, respectively. Downloads were facilitated using the SRA Explorer web application (https://sra-explorer.info/). The sequencing reads were aligned to the GRCh38 human reference genome, and the expression levels of PHF19 transcripts were quantified using the HISAT2 and StringTie tools [19]. The expression of PHF19-207 in clinical samples was assessed using the UCSC Xena Browser web server (https://xenabrowser.net/), comparing The Cancer Genome Atlas (TCGA) colon adenocarcinoma and Genotype-Tissue Expression (GTEx) colon datasets, with data accessed on 15 November 2024 [11]. Additionally, PHF19-207 expression was evaluated across other solid tumor types using TCGA data available through the UCSC Xena Browser web server.

2.9. Statistical Analysis

Statistical analysis was conducted using GraphPad Prism v9 software. Data are presented as percentages and as the mean ± standard deviation. The distribution of the data was assessed using the Shapiro–Wilk test. Differences between groups were analyzed using the independent samples t-test and analysis of variance (ANOVA), followed by Dunnett’s post hoc test. A p-value of ≤0.05 was considered statistically significant.

3. Results

3.1. In Silico Analysis of the PHF19 Gene Promoters

The analysis of characteristic elements revealed an atypical structure in both analyzed PHF19 gene promoters, located at the genomic positions chr9:120867881-120869881(-1) for the downregulated promoter and chr9:120875433-120877433(-1) for the upregulated promoter. Both promoters were found to be TATA-less, with a variable number of CCAAT and GC boxes. The presence of CpG islands was predicted in both promoters (Figure 1).
Prediction of transcriptional regulator binding motifs identified the differential presence of the CTF/NFI binding motif in the PHF19 gene promoters. The CTF/NFI protein family, which includes a member with dual roles in cell growth, is predicted to have a binding motif in the promoter upregulated in cancer. According to GEPIA, the CTF transcription factor shows lower expression in tumor tissue, with a log2FC value of −2.209.
Of the genetic variants mapped in the promoter regions (9 in the first and 13 in the second promoter), no variants were present in the regions where the transcriptional regulator binding was predicted.

3.2. In Silico Analysis of the PHF19 Gene Transcripts

The PHF19-207 transcript, produced by the promoter upregulated in cancer, was examined using various in silico tools to assess its potential function. Based on the Coding Potential Calculator and LGC Coding Potential Prediction tools, PHF19-207 was classified as non-coding. The RNAfold tool predicted the secondary structure of the PHF19-207 transcript with no repeated elements (Figure 2A). AnnoLnc2 and lncLocator2 indicated nuclear localization of the PHF19-207 transcript, while lncLocator predicted its localization mainly in exosomes. The RNA22 tool, used for the prediction of interactions with miRNA, identified several miRNAs that could bind to PHF19-207, mostly within the retained intron (Figure 2).

3.3. Quantification of PHF19 Gene Transcripts by RNA Sequencing in Colon Cell Lines Cultivated in 3D

RNA sequencing identified 10 PHF19 transcripts, each detected in at least one of the analyzed cell lines, with four transcripts deemed non-expressed based on the FPKM threshold of 0.3 (Table 2) [20].
The majority of the transcripts were present at low quantities (<1 FPKM). Transcripts PHF19-201 and PHF19-202 were moderately expressed in all analyzed cell lines (FPKM between 1 and 10). Their expression was increased in cell lines representing late-stage colon cancer (DLD-1 and SW620), but their expression in the cell line representing early-stage colon cancer (HCT 116) was similar to that of a non-malignant cell line (Figure 3). The other two transcripts that showed an increase in expression in colon cell lines were PHF19-207 and PHF19-210. They were elevated in all colon cancer cells, with the elevation more prominent towards the late stage of the disease. RNA sequencing also illustrated differential splicing events in sequenced cell lines (Figure 4).
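The expression bins used in this section can be written down explicitly. The sketch below uses the 0.3 FPKM detection threshold and the "<1" and "1 to 10" FPKM ranges from the text. Whether each boundary is inclusive, and the "high" label for values above 10 FPKM, are assumptions added for completeness, and the function name is hypothetical.

```python
def expression_category(fpkm: float, detection_threshold: float = 0.3) -> str:
    """Bin an FPKM value into the expression categories described in the text."""
    if fpkm < 0:
        raise ValueError("FPKM cannot be negative")
    if fpkm < detection_threshold:
        return "non-expressed"  # below the 0.3 FPKM threshold
    if fpkm < 1.0:
        return "low"            # present at low quantity (<1 FPKM)
    if fpkm <= 10.0:
        return "moderate"       # FPKM between 1 and 10
    return "high"               # assumed label; not used in the text


# Hypothetical FPKM values, for illustration only.
for value in (0.1, 0.5, 5.0, 25.0):
    print(value, expression_category(value))
```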
A comparison of the upregulated transcripts showed that the coding sequences of PHF19-207 and PHF19-201 align with exons 2, 3, and 4 of the reference transcript PHF19-202 (Figure 5). In contrast, the coding sequence of transcript PHF19-210 contains exons 11, 12, and 13 (Figure 5).

3.4. Analysis of Publicly Available Sequencing Data

GSE152562 is a dataset consisting of sequencing triplicates of HCEC-1CT and HCEC-1CT APC knockdown cells. An analysis of the GSE152562 dataset revealed no expression of the PHF19-207 transcript, either before or after knockdown of the APC gene (Figure 6A). However, an analysis of the expression of this transcript in mucosa, adenoma, and tumor colon tissues (dataset GSE164541) showed a prominent increase in expression of PHF19-207 (Figure 6B). This dataset also showed moderate upregulation of PHF19-201 and PHF19-202 in adenoma and tumor tissue in comparison to mucosa. However, no trend was observed in the expression of PHF19-210 among healthy mucosa, adenoma and tumor tissues. GSE254832 is a dataset consisting of RNA sequencing data of the HCT 116 colorectal cancer cell line and its KRAS knockdown transfectant. PHF19-207 expression differed between the HCT 116 cell line and the HCT 116 KRAS knockdown transfectant, but the difference was not statistically significant (Figure 6C). The UCSC Xena Browser web server indicated lower expression of PHF19-207 in GTEx normal colon tissue than in TCGA colon cancer tissue samples, with a p-value of 1.442 × 10⁻⁹. According to an evaluation of PHF19-207 expression in other types of solid tumors from TCGA data, this transcript is mostly expressed in colon and rectum tumor tissues, with mean FPKM values of 0.371 and 0.351, respectively. The TCGA data from cancer tissues originating from the cervix, esophagus, stomach, thymus, testis, lung, brain, skin, bile duct, and pancreas exhibit lower mean FPKM values, ranging from 0.1 to 0.2.

3.5. Expression Analysis of PHF19-207 in Colon Cell Lines Cultivated in 2D and 3D by qPCR

The relative abundance of the PHF19-207 transcript was analyzed in both human non-malignant and malignant colon cell lines cultured in 2D and 3D. In the 2D-cultured colon cell lines, the expression of PHF19-207 was relatively low in the non-malignant HCEC-1CT cell line, while it was elevated in all malignant cell lines (Figure 7A,B). Notably, the increase in PHF19-207 expression was more pronounced in cell lines representing advanced stages of colon cancer. In 3D-cultured colon cell lines, RNA sequencing results confirmed this trend, with malignant cell lines exhibiting higher expression levels compared to the non-malignant cell line. PHF19-207 expression was significantly higher in HCEC-1CT cells overexpressing KRAS than in HCEC-1CT cells (p = 0.0032) (Figure 7C). Transcript PHF19-207 was detected in the nucleus of both HCEC-1CT and SW620 cells, and in the exosomes of SW620 cells.

4. Discussion

Alterations in alternative transcription initiation have been observed in various pathologies, including cancer [21,22]. The diagnostic and prognostic potential of alternative promoters and transcripts has been established in several cancers, including colorectal cancer, multiple myeloma, prostate cancer, and hepatocellular carcinoma [23,24,25,26]. This study was conducted using cell lines, high-throughput sequencing and computational tools to evaluate the potential of the transcript PHF19-207 as a biomarker for early colon cancer and to explore its possible role in tumor promotion. The hypothesis on its involvement in the early stages of colon cancer was derived from a previous comprehensive study that had screened for the deregulation in the genes’ promoter activity between tumor and non-tumor tissue and found deregulation in the activity of the PHF19 gene promoters [1]. The promoter down-regulated in colon cancer tissue was found to be upregulated in glioblastoma. The other promoter was found to be upregulated in colon and rectal cancer, kidney cancer, stomach cancer, and chronic lymphocytic leukemia.
The presence of characteristic motifs was similar in the two analysed promoters of the PHF19 gene. According to in silico predictions, both promoters are located within CpG islands. A previous study investigated transcriptional activity of gene promoters in colon cancer using the H3K4me3 mark [4]. Results of that study overlap with findings that the upregulated promoter is located in a transcriptionally active region of the PHF19 gene.
The potential binding of transcriptional regulators to the promoter sequences was assessed using four distinct bioinformatics tools. Regulators predicted by at least two of these tools to bind to either of the promoters were considered for further analysis. Based on the presence of their binding motifs in the promoter sequences, the CTF family was predicted to bind to the promoters upregulated in colon cancer. Data from GEPIA showed that the expression level of CTF is lower in colon cancer compared to the healthy gut mucosa. Lower expression of CTF proteins in combination with upregulation of the second promoter in colon cancer can be explained by their dual activity, since the family includes both activators and repressors. Since NFI/CTF transcription factors have both oncogenic and tumor suppressor potential, depending on the type of carcinoma, their role in regulating PHF19 gene promoters should be further investigated [27].
Transcript PHF19-207 is 888 nucleotides long and classified as protein-coding according to Ensembl. Its computationally mapped protein isoform consists of 106 amino acids. In silico evaluation of PHF19-207 suggests that this RNA may be non-coding rather than coding. It has low coding probability according to the LGC Coding Potential Prediction tool and the Coding Potential Calculator tool, and the AnnoLnc2 tool indicates its localization in the nucleus. Although the localization data are not available for colon cell lines, they are consistent for a variety of other tissues, indicating that this transcript is predominantly retained in the nucleus regardless of the tissue. Another bioinformatic tool, lncLocator, predicts the localization of this transcript in exosomes. In silico data also point to the upregulation of PHF19-207 in colon cancer tissue samples in comparison to normal colon mucosa. Overall, in silico data indicate that PHF19-207 may be a long non-coding RNA involved in gene regulation and/or signalling. Additionally, its role in communication between colon cancer cells and the tumor microenvironment could be investigated using the RNA fluorescence in situ hybridization method.
In silico data also predict binding of nine microRNA molecules to the transcript PHF19-207. These microRNAs are predicted to bind mainly towards the 5′ (within intron 1) and 3′ ends of the transcript, and only a couple of them have overlapping binding sites. Most of the miRNAs bind to the predicted loops of the RNA secondary structure. For some of the microRNA molecules predicted to bind to PHF19-207, anti-tumor roles were demonstrated, while others are not yet characterized [28,29]. These data suggest that the retained intron of the PHF19-207 transcript may act as a microRNA sponge, which is in line with its proposed tumor-promoting role in colon tumorigenesis. MicroRNA hsa-miR-6721-5p has four binding sites in the PHF19-207 sequence. Simultaneous binding of multiple miR-6721-5p molecules may modulate transcript secondary structure and stability, suggesting a possible anti-tumor role of this miRNA. As such, the retained intron of the PHF19-207 transcript could be involved in the regulation of colon carcinogenesis, and further study of its functional properties is required for understanding its role.
However, according to the Ensembl database, this transcript is protein-coding. The translated protein of this transcript is suggested to be 106 amino acids long. In comparison with the protein encoded by the reference transcript, which has 580 amino acids, this protein might have different roles in cells. Recently, studies suggested the existence of small open reading frames (sORFs) on long non-coding RNA molecules that are engaged by ribosomes [30]. Micropeptides originating from these ORFs deviate from canonical peptide sequences and are, on average, about 100 amino acids long. The peptide predicted for PHF19-207 fits this description. The hypothesis that PHF19-207 is a protein-coding long non-coding RNA should be further investigated. Confirmation of this hypothesis would require analysis of ribosome recruitment to this transcript, mass spectrometry, in vivo translation, and custom-made antibodies for Western blot [31].
The public data showed no expression of PHF19-207 in either the wild-type or APC knockdown HCEC-1CT cell line [16]. It can be assumed that this transcript is not the result of the earliest genetic alteration in the canonical colorectal carcinogenesis pathway. However, we have to consider that the platform used in that study, Illumina NextSeq 500, does not have the same sequencing depth as the NovaSeq 6000 used in our study. This may explain why we quantified the lowly expressed PHF19-207 transcript in the HCEC-1CT cell line and observed upregulation in cell lines representing other stages of tumor development.
Results of public sequencing data also showed no statistically significant changes in PHF19-207 expression between HCT 116 cell lines with wild-type and KRAS knockdown [18]. Within that study, RNA of wild-type and transfected cell lines was sequenced in duplicates. Considering that, we overexpressed the GFP-labeled KRAS G12V mutant in the normal colon mucosa cell line HCEC-1CT. Results showed a statistically significant change in PHF19-207 expression in cells overexpressing mutated KRAS compared with cells without KRAS overexpression. This experiment suggests that the KRAS mutation could be one of the first genetic deregulations that drive upregulation of the PHF19-207 transcript.
Analysis of publicly available sequencing data showed a slight increase in the expression of PHF19-207 in the tumor in comparison to normal tissue, but without statistical significance [17]. The study that produced this public data analyzed triplicate tissue samples from five patients with colorectal cancer. A comparison between TCGA and GTEx data suggested that there is a difference between tumor and normal gut mucosa. However, this data did not include adenomas and showed a clear difference in tumor staging. We suggest further research on the expression of this transcript in clinical samples in larger patient groups with tumors in different stages and more sensitive methods, such as ddPCR.
The PHF19-201 transcript is shown to be significantly expressed in the HCEC-1CT APC knockdown cell line. Our data showed similar expression between HCEC-1CT and the HCT 116 cell line. However, its expression in DLD-1 and SW620 was significantly upregulated. Considering that the HCT 116 cell line has a functional APC gene, we can conclude that this transcript could be a marker of APC deregulation in colon cancer.
The results of the transcriptional profiling support the biomarker potential of the PHF19-207 transcript and are also in line with its proposed tumor-promoting role. The expression analysis of cells cultured in 2D, conducted using qPCR, revealed that PHF19-207 expression was elevated in all malignant cell lines compared to the non-malignant HCEC-1CT cell line (by 2- to 5-fold). A more pronounced increase in expression was observed in the cell lines derived from advanced stages of colon tumors (HCT 116 Dukes’ A category; HT-29, CaCo-2 and SW480 Dukes’ B category; DLD-1 and SW620 Dukes’ C category). Similar results were obtained when the expression of PHF19 gene transcripts was analysed in cells cultivated in 3D using RNA sequencing, where PHF19-207 expression was elevated 2- to 7.5-fold in the malignant cells vs. the non-malignant cell line. Also, aberrant splicing events occurred more frequently in cells representing later stages of colon cancer. Cell lines were cultured in 3D so that the resulting transcriptomes would more closely represent those of cells in their native environment. PHF19-207 was detected by qPCR in the nucleus of both HCEC-1CT and SW620 cell lines and in SW620 exosomes, which is consistent with the in silico predictions of transcript localization. Exosomes are one of the regulators of cell-to-cell communication [32]. Their role in cancer development and aggressiveness has been demonstrated in breast cancer [33]. Increased secretion of PHF19-207 via exosomes in colon cancer could elucidate its mechanism of action in cancer development and should be further explored.
Transcript PHF19-210 showed an expression pattern similar to that of PHF19-207 in expression data from cell lines, and considering its undefined coding sequence and length (588 nucleotides), it may also be considered by future studies as a non-coding RNA with a potential role in tumorigenesis. Its expression in clinical NGS data does not suggest it could be used as a biomarker.
The results of this study provide detailed data on PHF19 expression during the development of colon cancer and suggest the potential use of PHF19-207 as a biomarker of early colon cancer. However, several limitations should be acknowledged. First, the findings rely primarily on established cell lines and publicly available RNA sequencing data, which may not fully capture the complexity or heterogeneity of primary tumor tissues. Second, although differential expression and splicing patterns of PHF19-207 were observed, functional validation experiments are still required, and the biological role of this isoform therefore remains speculative. Although PHF19-207 contains a retained intron, its consistent and elevated expression across samples suggests that it is not subject to effective nonsense-mediated decay (NMD), which supports the potential functional relevance of this isoform. Previous studies have shown that non-coding transcripts originating from loci of protein-coding genes can have roles in different molecular processes in malignant cells [34]. High-risk patients undergoing screening for colorectal cancer could benefit the most from the implementation of early colon cancer biomarkers such as PHF19-207. Further functional characterization, together with a description of transcript behavior in tumor cells exposed to therapeutics, would allow different aspects of its biomarker potential to be assessed.

5. Conclusions

This study indicates the potential of the PHF19-207 transcript for early colon cancer detection and proposes a dual role for PHF19-207 that should be further investigated. Its retained intron could function as a miRNA sponge, and the interaction partners of this transcript should therefore be analyzed. The presence of an sORF in its sequence suggests the potential existence of a micropeptide, which requires experimental validation. The consistent differential expression between malignant and non-malignant cell lines, and between tumor and normal tissue samples, supports its biomarker potential. Further studies should investigate the functional relevance of the PHF19-207 transcript in colorectal cancer.

Author Contributions

Conceptualization: D.P. and A.N.; Data curation: D.P., S.I., K.P. and S.D.; Formal analysis: D.P., T.B., S.I., K.P., S.D. and A.N.; Funding acquisition: A.N.; Investigation: D.P., T.B., S.I., K.P. and S.D.; Supervision: A.N.; Visualization: D.P.; Writing—original draft: D.P., T.B., S.I., K.P. and S.D.; Writing—review and editing: D.P., T.B., S.D. and A.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Science Fund of the Republic of Serbia, PROMIS, #6052315, SENSOGENE.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset is available upon request from the authors.

Conflicts of Interest

The authors declare they have no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
| Abbreviation | Definition |
|---|---|
| PHF19 | PHD finger protein 19 |
| CRC | Colorectal cancer |
| PRC2 | Polycomb repressive complex 2 |
| TSS | Transcription start site |
| IGV | Integrative Genomics Viewer |
| miRNA | MicroRNA molecule |
| DMEM | Dulbecco's Modified Eagle Medium |
| FBS | Fetal bovine serum |
| bp | Base pairs |
| GAPDH | Glyceraldehyde-3-phosphate dehydrogenase |
| NCBI GEO | National Center for Biotechnology Information's Gene Expression Omnibus |
| ANOVA | Analysis of variance |
| NFI/CTF | Nuclear Factor I/CCAAT box transcription factor |
| sORF | Small open reading frame |

References

  1. Demircioğlu, D.; Cukuroglu, E.; Kindermans, M.; Nandi, T.; Calabrese, C.; Fonseca, N.A.; Kahles, A.; Lehmann, K.-V.; Stegle, O.; Brazma, A.; et al. A Pan-Cancer Transcriptome Analysis Reveals Pervasive Regulation through Alternative Promoters. Cell 2019, 178, 1465–1477.e17.
  2. Lu, R.; Wang, G.G. Tudor: A versatile family of histone methylation ‘readers’. Trends Biochem. Sci. 2013, 38, 546–555.
  3. Jain, P.; Ballare, C.; Blanco, E.; Vizan, P.; Di Croce, L. PHF19 mediated regulation of proliferation and invasiveness in prostate cancer cells. Elife 2020, 9, e51373.
  4. Li, Q.-L.; Lin, X.; Yu, Y.-L.; Chen, L.; Hu, Q.-X.; Chen, M.; Cao, N.; Zhao, C.; Wang, C.-Y.; Huang, C.-W.; et al. Genome-wide profiling in colorectal cancer identifies PHF19 and TBC1D16 as oncogenic super enhancers. Nat. Commun. 2021, 12, 6407.
  5. Li, P.; Sun, J.; Ruan, Y.; Song, L. High PHD Finger Protein 19 (PHF19) expression predicts poor prognosis in colorectal cancer: A retrospective study. PeerJ 2021, 9, e11551.
  6. Robinson, J.T.; Thorvaldsdóttir, H.; Winckler, W.; Guttman, M.; Lander, E.S.; Getz, G.; Mesirov, J.P. Integrative genomics viewer. Nat. Biotechnol. 2011, 29, 24–26.
  7. Li, L.-C.; Dahiya, R. MethPrimer: Designing primers for methylation PCRs. Bioinformatics 2002, 18, 1427–1431.
  8. Farré, D.; Roset, R.; Huerta, M.; Adsuara, J.E.; Roselló, L.; Albà, M.M.; Messeguer, X. Identification of patterns in biological sequences at the ALGGEN server: PROMO and MALGEN. Nucleic Acids Res. 2003, 31, 3651–3653.
  9. Gearing, L.J.; Cumming, H.E.; Chapman, R.; Finkel, A.M.; Woodhouse, I.B.; Luu, K.; Gould, J.A.; Forster, S.C.; Hertzog, P.J.; Helmer-Citterich, M. CiiiDER: A tool for predicting and analysing transcription factor binding sites. PLoS ONE 2019, 14, e0215495.
  10. Tsunoda, T.; Takagi, T. Estimating transcription factor bindability on DNA. Bioinformatics 1999, 15, 622–630.
  11. Cao, Z.; Pan, X.; Yang, Y.; Huang, Y.; Shen, H.-B.; Hancock, J. The lncLocator: A subcellular localization predictor for long non-coding RNAs based on a stacked ensemble classifier. Bioinformatics 2018, 34, 2185–2194.
  12. Chen, Y.; Wang, X. miRDB: An online database for prediction of functional microRNA targets. Nucleic Acids Res. 2020, 48, D127–D131.
  13. Liu, W.; Wang, X. Prediction of functional microRNA targets by integrative modeling of microRNA binding and target expression data. Genome Biol. 2019, 20, 18.
  14. Miranda, K.C.; Huynh, T.; Tay, Y.; Ang, Y.-S.; Tam, W.-L.; Thomson, A.M.; Lim, B.; Rigoutsos, I. A pattern-based method for the identification of MicroRNA binding sites and their corresponding heteroduplexes. Cell 2006, 126, 1203–1217.
  15. Rehmsmeier, M.; Steffen, P.; Höchsmann, M.; Giegerich, R. Fast and effective prediction of microRNA/target duplexes. RNA 2004, 10, 1507–1517.
  16. Choi, J.; Gong, J.R.; Hwang, C.Y.; Joung, C.Y.; Lee, S.; Cho, K.H. A Systems Biology Approach to Identifying a Master Regulator That Can Transform the Fast Growing Cellular State to a Slowly Growing One in Early Colorectal Cancer Development Model. Front. Genet. 2020, 11, 570546.
  17. Hong, Q.; Li, B.; Cai, X.; Lv, Z.; Cai, S.; Zhong, Y.; Wen, B. Transcriptomic Analyses of the Adenoma-Carcinoma Sequence Identify Hallmarks Associated with the Onset of Colorectal Cancer. Front. Oncol. 2021, 11, 704531.
  18. Martins, F.; Machado, A.L.; Ribeiro, A.; Oliveira, S.M.; Carvalho, J.; Matthiesen, R.; Backman, V.; Velho, S. KRAS silencing impacts chromatin organization and transcriptional activity in colorectal cancer cells. Res. Sq. 2024.
  19. Kim, D.; Paggi, J.M.; Park, C.; Bennett, C.; Salzberg, S.L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 2019, 37, 907–915.
  20. Mortazavi, A.; Williams, B.A.; McCue, K.; Schaeffer, L.; Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods 2008, 5, 621–628.
  21. Davuluri, R.V.; Suzuki, Y.; Sugano, S.; Plass, C.; Huang, T.H.-M. The functional consequences of alternative promoter use in mammalian genomes. Trends Genet. 2008, 24, 167–177.
  22. Li, S.; Hu, Z.; Zhao, Y.; Huang, S.; He, X. Transcriptome-Wide Analysis Reveals the Landscape of Aberrant Alternative Splicing Events in Liver Cancer. Hepatology 2019, 69, 359–375.
  23. Valcárcel, L.V.; Amundarain, A.; Kulis, M.; Charalampopoulou, S.; Melnick, A.; Miguel, J.S.; Martín-Subero, J.I.; Planes, F.J.; Agirre, X.; Prosper, F. Gene expression derived from alternative promoters improves prognostic stratification in multiple myeloma. Leukemia 2021, 35, 3012–3016.
  24. Lévesque, E.; Labriet, A.; Hovington, H.; Allain, É.P.; Melo-Garcia, L.; Rouleau, M.; Brisson, H.; Turcotte, V.; Caron, P.; Villeneuve, L.; et al. Alternative promoters control UGT2B17-dependent androgen catabolism in prostate cancer and its influence on progression. Br. J. Cancer 2020, 122, 1068–1076.
  25. Dong, Y.; Liu, X.; Jiang, B.; Wei, S.; Xiang, B.; Liao, R.; Wang, Q.; He, X. A Genome-Wide Investigation of Effects of Aberrant DNA Methylation on the Usage of Alternative Promoters in Hepatocellular Carcinoma. Front. Oncol. 2021, 11, 780266.
  26. Thorsen, K.; Schepeler, T.; Øster, B.; Rasmussen, M.H.; Vang, S.; Wang, K.; Hansen, K.Q.; Lamy, P.; Pedersen, J.S.; Eller, A.; et al. Tumor-specific usage of alternative transcription start sites in colorectal cancer identified by genome-wide exon array analysis. BMC Genomics 2011, 12, 505.
  27. Chen, K.-S.; Lim, J.W.; Richards, L.J.; Bunt, J. The convergent roles of the nuclear factor I transcription factors in development and cancer. Cancer Lett. 2017, 410, 124–138.
  28. Ulusan Bağcı, Ö.; Caner, A. miRNA Expression Profile in Ileocecal Adenocarcinoma Cells Infected with Cryptosporidium. Mikrobiyol. Bul. 2022, 56, 449–465.
  29. Matuszyk, J. MALAT1-miRNAs network regulate thymidylate synthase and affect 5FU-based chemotherapy. Mol. Med. 2022, 28, 89.
  30. Patraquim, P.; Magny, E.G.; Pueyo, J.I.; Platero, A.I.; Couso, J.P. Translation and natural selection of micropeptides from long non-canonical RNAs. Nat. Commun. 2022, 13, 6515.
  31. Hofman, D.A.; Prensner, J.R.; van Heesch, S. Microproteins in cancer: Identification, biological functions, and clinical implications. Trends Genet. 2024, 41, 146–161. Available online: https://www.cell.com/trends/genetics/fulltext/S0168-9525(24)00211-7 (accessed on 15 November 2024).
  32. McAndrews, K.M.; Kalluri, R. Mechanisms associated with biogenesis of exosomes in cancer. Mol. Cancer 2019, 18, 52.
  33. Hoshino, D.; Kirkbride, K.C.; Costello, K.; Clark, E.S.; Sinha, S.; Grega-Larson, N.; Tyska, M.J.; Weaver, A.M. Exosome secretion is enhanced by invadopodia and drives invasive behavior. Cell Rep. 2013, 5, 1159–1168.
  34. Babic, T.; Ugrin, M.; Jeremic, S.; Kojic, M.; Dinic, J.; Djeri, B.B.; Zoidakis, J.; Nikolic, A. Dysregulation of transcripts SMAD4-209 and SMAD4-213 and their respective promoters in colon cancer cell lines. J. Cancer 2024, 15, 5118–5131.
Figure 1. Schematic representation of analyzed promoter sequences with positions relative to TSS of predicted CpG islands and CTF/NFI transcription factors binding site. (A) Downregulated promoter sequence showing two CpG islands. The common transcription start site (TSS) of four transcripts is shown within one CpG island. (B) Upregulated promoter shows two CpG islands and a CTF/NFI binding site near TSS, giving rise to the PHF19-207 transcript.
Figure 2. (A) Predicted secondary structure of the PHF19-207. The retained intron of the PHF19-207 transcript is highlighted with pink dots. The miRNA binding positions are illustrated with black lines. (B) Schematic representation of the microRNA predicted binding sites in the PHF19-207 transcript primary sequence.
Figure 3. Expression of the PHF19 transcripts that exhibit a pattern of increased expression in colon cancer cell lines (early-stage colon cancer HCT 116 and late-stage colon cancer DLD-1 and SW620) compared with the non-malignant HCEC-1CT colon cell line, cultivated in 3D, measured by RNA sequencing.
Figure 4. Sashimi plot of the PHF19 gene region showing increased utilization of retained intron sequence of the PHF19-207 transcript in colon cancer cell lines. The retained intron of the PHF19-207 is highlighted with *.
Figure 5. The sequence structure of transcripts with an observed increased expression during colon cancer stages, with their shared exons (E) and retained introns (I). The transcripts are aligned with the main PHF19-202 transcript.
Figure 6. Expression of PHF19 transcripts measured with RNA sequencing. (A) PHF19 transcripts expressed in HCEC-1CT and HCEC-1CT APC knockdown cells (GSE152562), (B) PHF19 transcripts in normal, adenoma and tumor tissue (GSE164541), and (C) HCT116 and HCT116 KRAS knockdown cells (GSE254832).
Figure 7. Expression of PHF19-207 in colon cell lines cultivated in 2D (A), 3D (B), and HCEC-1CT cells with overexpressed KRAS (C) measured by qPCR. Statistical significance is shown with symbols: **—p ≤ 0.01, ***—p ≤ 0.001, ****—p ≤ 0.0001.
Table 1. The primer sequences used for qRT-PCR.

| Target | Forward Primer | Reverse Primer |
|---|---|---|
| PHF19-207 | 5′-GATAGTCACAACACCAGGTGCC-3′ | 5′-CTTCCCCTGACACTGGCTCC-3′ |
| GAPDH | 5′-GTGAAGGTCGGAGTCAACG-3′ | 5′-TGAGGTCAATGAAGGGGTC-3′ |
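
As a quick sanity check on the primers listed in Table 1, their length and GC content can be computed directly from the sequences. The sketch below is illustrative only: the helper name `gc_content` is hypothetical, and the calculation is not part of the reported analysis.

```python
# Primer sequences copied from Table 1 (5' to 3').
PRIMERS = {
    "PHF19-207 forward": "GATAGTCACAACACCAGGTGCC",
    "PHF19-207 reverse": "CTTCCCCTGACACTGGCTCC",
    "GAPDH forward": "GTGAAGGTCGGAGTCAACG",
    "GAPDH reverse": "TGAGGTCAATGAAGGGGTC",
}


def gc_content(seq):
    """Return the percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


# Report length and GC percentage for each primer.
for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC {gc_content(seq):.1f}%")
```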
Table 2. IDs and names (Ensembl GRCh38.p13 assembly) of the identified transcripts using the RNA sequencing of the 3D cultivated cell lines.

| Transcript ID | Transcript Name |
|---|---|
| ENST00000312189 | PHF19-201 |
| ENST00000373896 | PHF19-202 |
| ENST00000436309 | PHF19-204 |
| ENST00000439674 | PHF19-205 |
| ENST00000456291 | PHF19-207 |
| ENST00000462229 | PHF19-208 |
| ENST00000464712 | PHF19-209 |
| ENST00000467266 | PHF19-210 |
| ENST00000474402 | PHF19-211 |
| ENST00000487555 | PHF19-213 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Pavlovic, D.; Babic, T.; Ignjatovic, S.; Pavlovic, K.; Dragicevic, S.; Nikolic, A. Transcript PHF19-207 May Be a Long Non-Coding RNA with Tumor-Promoting Role in Colon Cancer. Biomolecules 2025, 15, 957. https://doi.org/10.3390/biom15070957
