The World of Pseudogenes: New Diagnostic and Therapeutic Targets in Cancers or Still Mystery Molecules?
Life 2021, 11, 1354

Pseudogenes were once considered “junk DNA” because they had lost their function through the accumulation of mutations, such as frameshifts and premature stop codons, or through relocation to inactive heterochromatin regions of the genome. Pseudogenes are divided into two large groups, processed and unprocessed, according to their primary structure and origin. Only about 10% of all pseudogenes are transcribed into RNAs, which participate in the regulation of parental gene expression at both the transcriptional and translational levels through sense RNA (sRNA) and antisense RNA (asRNA). In this review, about 150 pseudogenes in different types of cancer were analyzed. Some of these pseudogenes appear useful in molecular diagnostics and can be detected in various types of biological material, including tissue and biological fluids (liquid biopsy), using different detection methods. The total number of pseudogenes in the human genome, as well as their functions, is still unknown. However, thanks to the development of various technologies and bioinformatic tools, it has been revealed that pseudogenes are involved in the development and progression of certain diseases, especially cancer.


Pseudogene Transcripts
A pseudogene is a copy of a gene that has lost its original function due to the accumulation of mutations, such as frameshifts and premature stop codons, or due to relocation to inactive heterochromatin regions of the genome [1]. The first study of these molecules was performed by Jacq et al., who reported a group of untranscribed genomic sequences homologous to the 5S DNA in Xenopus laevis [2]. Since then, pseudogenes have been identified in the genomes of most organisms, ranging from prokaryotes to eukaryotes [3,4]. At first, they were branded as non-coding "junk DNA". However, experimental data obtained in recent years indicate that 10% of the approximately 16,000 identified pseudogenes are transcribed, and roughly 19% of known human lncRNAs are the products of pseudogene transcription [5–7]. Pseudogenes are divided into two large groups according to their primary structure and origin: processed and unprocessed. Processed pseudogenes are formed when cDNAs produced by reverse transcription of parental gene mRNAs integrate into new genomic sites. For this reason, processed pseudogenes do not contain introns. The majority of them carry a poly(A) sequence at the 3′ end, a trace of mRNA 3′ end polyadenylation. In addition, such pseudogenes are flanked by duplicated integration sites 5 to 20 bp in length. Dong et al. identified a subgroup of processed pseudogenes derived from circular RNAs (circRNAs). Such pseudogenes usually lack the 3′ end poly(A) sequence. Moreover, they feature a reversed order of exons compared to the original mRNAs [8].
The second group, unprocessed pseudogenes, retain introns in their sequence and can be unitary (orphan) or duplicated. Unitary pseudogenes are derived from single-copy functional genes that accumulated spontaneous mutations during evolution and lost their primary functions. Therefore, unitary pseudogenes have no paralogs in the same genome but may have orthologs in related species [9]. Duplicated pseudogenes arise from tandem duplications of genes during unequal crossing-over. The duplicated copy can then accumulate further mutations, which convert it into a pseudogene. Because of this mechanism of origin, duplicated pseudogenes are situated on the same chromosomes as their parental genes [10]. The origin of pseudogenes in the genome is shown in Figure 1.
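The structural criteria described above can be condensed into a small decision rule. The following Python sketch is purely illustrative: the `Candidate` fields and the `classify` function are hypothetical names introduced here, and the rules are a simplification of the text rather than a published annotation method.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical summary of one pseudogene locus (all fields are assumptions)."""
    has_introns: bool                # intron-like sequence retained from the parent
    has_polya_tail: bool             # poly(A) tract at the 3' end
    flanking_repeat_bp: int          # length of duplicated integration sites, 0 if none
    functional_paralog: bool         # a functional parental gene exists in the same genome
    same_chromosome_as_parent: bool  # locus lies on the parental gene's chromosome


def classify(c: Candidate) -> str:
    """Assign a coarse pseudogene class using the criteria described in the text."""
    if not c.has_introns:
        # Retrotransposed copies lack introns; a poly(A) tail or 5-20 bp flanking
        # repeats are treated as supporting evidence, not strict requirements.
        support = c.has_polya_tail or 5 <= c.flanking_repeat_bp <= 20
        return "processed" if support else "processed (weak evidence)"
    if not c.functional_paralog:
        return "unprocessed: unitary"
    if c.same_chromosome_as_parent:
        return "unprocessed: duplicated"
    return "unprocessed: unclassified"
```

For instance, an intronless locus with a poly(A) tail and a 12 bp flanking repeat is labeled "processed", whereas an intron-containing locus without a functional paralog is labeled "unprocessed: unitary". Real annotation pipelines rely on sequence alignment to the parental gene and many further features.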

Pseudogene Functions
Pseudogene transcripts were long thought to be non-functional transcriptional noise. One probable reason for this perception was the assumption that these regions are in principle non-functional, which meant that they were not studied in this regard [11]. However, as is often the case in science, chance findings and the insight of researchers have produced more and more data pointing to the functionality of pseudogenes. Some pseudogenes are known to take part in many important biological processes, such as the immunological response, catalytic reactions, regulation of signaling pathways, and changes in chromatin or genome architecture, and to function as transcription and translation factors, elements of gene conversion, dimerization factors, stabilizing elements, or structural proteins [11]. All of this underlines that pseudogenes are important elements of the genome regulatory network. Pseudogenes perform their functions at different levels, which include interactions at the RNA, DNA, and protein levels. A schematic illustration of the regulatory functions of pseudogenes is shown in Figure 2A. The first functional level is the interaction with and regulation of RNA molecules. As mentioned earlier, 10% of all pseudogenes are transcribed into RNAs (psRNAs), and these RNAs participate in the regulation of parental gene expression at both the transcriptional and translational levels through sense RNA (sRNA) and antisense RNA (asRNA). sRNAs regulate the expression of their parental gene mRNA through competition for miRNAs. Owing to their significant sequence similarity, the sRNA and the parental mRNA share miRNA binding sites, and the binding of miRNAs to these sites underlies the regulatory functions of these RNA molecules in both the nucleus and the cytoplasm [12]. The higher the transcriptional activity of the pseudogene, the more miRNA molecules bind to its sRNA, which depletes their intracellular pool and relieves suppression of parental gene expression [13].
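The sponge effect described above can be illustrated with a deliberately simple titration model. The sketch below assumes that each transcript carries one miRNA binding site, that miRNAs have equal affinity for the pseudogene sRNA and the parental mRNA, and that every bound parental mRNA is fully repressed. The function name and all numbers are hypothetical and serve only to show the direction of the effect.

```python
def unrepressed_parental_mrna(mirna: float, parental: float, pseudogene: float) -> float:
    """Toy titration model: miRNAs spread evenly over all shared binding sites."""
    total_sites = parental + pseudogene
    if total_sites == 0:
        return 0.0
    # Fraction of binding sites occupied by a miRNA (cannot exceed 1).
    occupied_fraction = min(1.0, mirna / total_sites)
    # Parental transcripts left free of miRNA are assumed to escape repression.
    return parental * (1.0 - occupied_fraction)


# Increasing pseudogene sRNA abundance at fixed miRNA and parental mRNA levels.
for sponge in (0, 100, 300):
    free = unrepressed_parental_mrna(mirna=100, parental=100, pseudogene=sponge)
    print(sponge, free)
```

In this toy setting, raising the pseudogene sRNA level from 0 to 300 copies raises the unrepressed parental mRNA from 0 to 75 of 100 copies. Quantitative models of ceRNA crosstalk additionally account for binding kinetics, transcript turnover, and site multiplicity.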
psRNAs can compete for the binding not only of miRNAs but also of various regulatory proteins and protein complexes, including RNA-binding proteins and transcription factors. In this case, psRNAs act as decoys. For example, the reduced expression of the high mobility group A1 protein (HMGA1) associated with type 2 diabetes may be caused by upregulated transcription of the HMGA1p pseudogene, which competes with the 3′ UTR of the HMGA1 gene for the protein factor αCP1, which is critical for the stability of the HMGA1 mRNA [14].
asRNAs are involved in many regulatory mechanisms of their parental genes (Figure 2B). For example, asRNAs can form duplexes with the sRNAs of their parental genes, which may give rise to siRNAs [15–17]. Recently, asRNAs were found to interact with PIWI proteins (as piRNAs) in animal spermatozoa and germline cells [18,19]. The main function of typical piRNAs is the inhibition of transposon activity in germline cells, e.g., at the transcriptional level, by heterochromatinization of the corresponding genetic loci through methylation of DNA or histones [19]. asRNAs can also repress transcription: one of the six expressed pseudogenes of POU5F1, OCT4pg5, generates an asRNA that transports a histone methyltransferase to the POU5F1 gene promoter. This process is accompanied by trimethylation of histone H3 Lys27 in the chromatin surrounding the promoter and inhibition of gene transcription [20]. While POU5F1 has several pseudogenes, the PTENP1 pseudogene can be considered multifunctional. PTENP1 has three transcripts: one sRNA and two overlapping asRNA isoforms, α and β. Isoform α causes heterochromatinization and repression of the PTEN gene promoter; the sRNA competes with PTEN mRNA for miRNAs, i.e., it is a typical ceRNA and a positive regulator of the gene; and isoform β stabilizes the sRNA through interaction of its 3′ end with the 5′ end sequence of the sRNA [21].
Another function of pseudogenes is the production of long non-coding RNAs (lncRNAs). These transcripts generally have no protein products, although in some cases short peptides are generated. lncRNAs function as regulators of transcription by activating specific genes, as modulators of protein factors and chromatin, as guides for specific ribonucleoprotein complexes, and as scaffolds for specific ribonucleoproteins [22]. It is also postulated that lncRNAs function as molecular sponges for miRNAs, e.g., the lncRNA ZFAS1, which regulates miR-150-5p in HNSCC [23]. lncRNAs could probably be used as biomarkers in oncology, but the role of some of these transcripts is not fully understood [22,24–27]. Detailed information about lncRNAs is provided by us elsewhere [24,28].
It should be emphasized that some evidence argues against the function of pseudogenes as elements of the ceRNA network, and it has been proposed that such effects are real but occur only at unphysiological expression levels [29,30].
The second type of regulation is the ability to modulate DNA, which manifests as random insertion of a pseudogene sequence into the parental gene or another host gene, or as DNA sequence exchange between the pseudogene and the parental gene [31]. Insertion of a pseudogene sequence can have different biological effects: (i) epigenetic silencing, (ii) initiation of transcription, (iii) gene fusion, or even (iv) mutagenesis. These modifications change the expression level of specific genes or alter their functions, which could induce carcinogenesis [32–35]. The other possibility is the exchange of DNA sequences between the pseudogene and the parental gene, through either gene conversion or recombination [36,37]. One example is the rearrangement between the BRCA1 gene and the BRCA1 pseudogene, which gives rise to mutated alleles that lack the promoter and the initiation codon and carry changes in the exons [37]. The exchange of DNA sequences between a pseudogene and its parental gene strongly influences the genome and could lead to inactivation of suppressor genes or activation of oncogenes [36,37].
The last pseudogene function is the ability to influence the genome and transcriptome through a protein or peptide product. Paradoxically, some pseudogenes, like some lncRNAs, have open reading frames and encode proteins or peptides, and these products could play a regulatory role in the cell. Such pseudo-proteins or pseudo-peptides could have functions similar to or different from those of the parental gene product, cooperate with parental genes, or even activate an immune response [31]. One example is the PGAM3 pseudogene, which is classified as a processed pseudogene and has a protein product of unknown function in humans. Another example is the OCT4 pseudogenes, which are highly similar to the OCT4 gene [38]. Recent studies indicated that the OCT4pg1 protein is involved in changes in cancer phenotype in triple-negative breast cancers through activation of the Notch pathway [39]. Suo et al. observed that the OCT4 pseudogenes OCT4pg5 and OCT4pg1 are transcribed in cancer and regulate OCT4. Moreover, these pseudogenes have probably generated artifactual results about OCT4 [38]. Similarly, Zhao et al. demonstrated that the OCT4 pseudogenes OCT4pg1, OCT4pg3, and OCT4pg4 are transcribed and translated in glioma and breast cancer in the absence of OCT4 products [39]. These observations underline the need to re-examine and verify some results and to define the role of pseudogene-encoded proteins. To make matters even more interesting, some pseudogenes encode not proteins similar to those of the parental genes but truncated forms of them, i.e., peptides. BRAF pseudogene 1 (BRAFP1) has many stop codons and yields shortened peptides, in contrast to the protein translated from the BRAF gene. The pseudo-BRAF peptide was described in the context of thyroid cancer, where it activates the MAP kinase signaling pathway, leading to tumorigenesis. Moreover, BRAF pseudogene 1 transcripts were negatively correlated with BRAF mutation [40]. However, other studies indicated that BRAFP1 functions as a competitive endogenous RNA [41].
The last example is the antigen-like function of pseudo-proteins and pseudo-peptides, which are capable of stimulating the immune system. Moreau-Aubry et al. showed that the processed pseudogene NA88-A codes for a new antigen recognized by a CD8(+) T cell clone on melanoma. Interestingly, the parental gene of NA88-A, HPX42B, codes for a homeoprotein and is transcribed in a variety of normal tissues [42].
All of these examples clearly show that pseudogenes are functional molecules which were missed in investigations or naturally deeply hidden in the wide network of cellular interactions between DNA, RNA and protein molecules.

Involvement of Pseudogenes in Cancers
Thanks to the incredible development of next-generation sequencing technology and bioinformatics tools, a large number of pseudogenes have gradually been discovered. As mentioned earlier, pseudogenes can interact in various ways with DNA, RNA, and proteins participating in the modulation of target gene expression, particularly their parental genes. Therefore, these molecules are involved in the development, and progression of certain diseases, especially cancer [43]. Although comprehensive pseudogene studies have just been started, they revealed the broad participation of pseudogenes in cancer development and diagnostics.
Based on available literature data and public databases, selected pseudogenes can be classified as the predictor, inheritance, or prognostic biomarkers. Chosen pseudogenes whose expressions are noticeably changed in the group of cancers located in the abdomen and bones, chest, and head and neck area are presented in Figure 3.

Cancers Located in the Abdomen and Bones
In the abdomen and bones area, 73 pseudogenes have been reported in cancers such as bladder carcinoma, cervical carcinoma, colorectal cancer, osteosarcoma, and others, in tissue, plasma, blood, and urine samples. In the tissues of acute myeloid leukemia patients, BMI1P1A, OCT4, and POU5F1B form a three-gene signature that divides individuals into high-risk and low-risk groups [44]. PA2G4P4 is overexpressed in bladder cancer patient tissues and cell lines [45]. GBP1P1 and PTTG3P were observed in microarray analysis and validated by qRT-PCR in tissues of cervical carcinoma [46]. FTH1P3 and POU5F1B are upregulated in cervical cancer patient samples and cell lines [47,48]. In colon cancer tissues, DUXAP8, RP11-54H7.4, and RP11-138J23.1 show elevated expression in advanced tumor stages [49]. In colorectal cancer tissues, increased KCNQ1OT1 (as well as PNN) is associated with shorter disease-free survival (DFS) of stage III individuals treated with 5-FU adjuvant therapy [50]. REG1CP, TPTE2P1, and DUXAP8 are upregulated in colorectal cancer patient samples and cell lines [51–53].
In tissue and blood samples from patients with endometrial hyperplasia (EH) and endometrial carcinoma (EC), PTENP1 was methylated in all analyzed tissues except peripheral blood. No differences were found between the EC and EH groups [54]. In gastric adenocarcinoma patient tissues, PMS2L2 and SFTA1P were found to be downregulated [55,56]. Additionally, three pseudogenes, KRT19P3, ARHGAP27P1, and SFTA1P, had decreased expression levels [56–58].
PDIA3P is highly expressed in multiple myeloma (MM) and is associated with the survival rate of patients. PDIA3P regulates MM growth and drug resistance through Glucose 6-phosphate dehydrogenase (G6PD) along with the pentose phosphate pathway (PPP) [77].
A new signature of four pseudogenes, RP11-326A19.5, RP4-706A16.3, RPL7AP28, and RPL11-551L14.1, was found for osteosarcoma; it is a promising independent survival predictor and may serve as an important biomarker in the clinical treatment of osteosarcoma to improve patient management [78]. MSTO2P is upregulated in osteosarcoma patient samples. Individuals with low MSTO2P levels lived longer than those with increased expression. Moreover, individuals with higher stages of osteosarcoma (stage III + IV) showed elevated expression levels of MSTO2P [79].
In ovarian cancer, decreased expression of SLC6A10P was associated with longer time to recurrence (TTR) [80]. SDHAP1 was found to be overexpressed in patient tissues and cell lines [81]. Both DUXAP8 and DUXAP10 are upregulated in pancreatic carcinoma samples [82,83]. SUMO1P3 expression was increased in pancreatic tissues compared with the corresponding adjacent normal tissues. Additionally, the data indicated that the elevated expression of SUMO1P3 is significantly associated with tumor progression and the poor survival of individuals with pancreatic cancer. SUMO1P3 knockdown may suppress the proliferation, migration, and invasion of pancreatic cancer cells. Furthermore, downregulation of SUMO1P3 suppressed the epithelial-mesenchymal transition (EMT) process and not only increased the expression of epithelial cadherin but also decreased the expression of neuronal cadherin, vimentin, and β-catenin [84]. The unique feature of the KLK4-KLKP1 fusion gene is the conversion of the non-coding KLKP1 pseudogene into the gene encoding the protein and its unique expression in about 30% of high-grade Gleason prostate cancer [85]. All pseudogenes with diagnostic potential are summarized in Table 1.

Cancers Located in the Chest Area
In cancers located in the chest area, 47 pseudogenes have been described based on the analysis of plasma-derived exosomes and tissue samples. Higher expression of STXBP5, GALP, and LOC387646 indicated an unfavorable prognosis for breast cancer (BC) patients. Increased CTSLP8 and RPS10P20 expression, along with decreased HLA-K pseudogene expression, also indicated a poor prognosis. The pseudogene-gene interaction between GPS2 and GPS2P1 is prognostic even though neither the gene nor the pseudogene alone is prognostic of survival. miR-3923 was predicted to target GPS2 using miRanda, PicTar, and TargetScan, implying modules of genes, pseudogenes, and miRNAs that are potentially functionally related to patient survival [86]. The pseudogene HLA-DPB2 and its parental gene HLA-DPB1 are overexpressed, and this correlates with a better prognosis for BC patients. The HLA-DPB2/HLA-DPB1 axis is strongly connected with immune-related biological functions. It is associated with a high infiltration abundance of CD8+ T cells, CD4+ T cells, Tfh, Th1, and NK cells, along with elevated expression of most biomarkers of monocytes, NK cells, T cells, CD8+ T cells, and Th1 cells in BC and its subtypes. This indicates that HLA-DPB2 influences the abundance of tumor-infiltrating lymphocytes in the tumor microenvironment. Additionally, HLA-DPB2 and HLA-DPB1 expression is positively correlated with the expression of PD-1, PD-L1, and CTLA-4 [87].
A group of pseudogenes, RP11-480I12.5-004, PCNAP1, PTTG3P, CRYβB2P1, CYP4Z2P, and PDIA3P, was found to be upregulated in BC patients' tissue and cell lines. Knockdown of RP11-480I12.5 reduces cell proliferation and colony formation, induces cell apoptosis, and inhibits tumor growth in vivo. Only overexpression of RP11-480I12.5-004 enhances cell growth both in vitro and in vivo [88]. Knockdown of PCNAP1 suppresses the migration and invasion of cells. PCNAP1 also functions as a competing endogenous RNA (ceRNA) for miR-340-5p and influences its target SOX4, thereby regulating migration and invasion [89]. PTTG3P in patients with lung adenocarcinoma (LUAD) is connected with a shortened metaphase-to-anaphase transition in mitosis, increased cell viability after cisplatin or paclitaxel treatment, and facilitated tumor growth. In addition, it is associated with a poor survival rate of individuals who received chemotherapy. Knockdown of PTTG3P reduces cell mitosis, proliferation, and sensitivity to drugs such as paclitaxel or cisplatin [90]. PTTG3P is associated with BC, and it is negatively correlated with estrogen receptor (ER) and progesterone receptor (PR) status and positively related to basal-like status, triple-negative BC status, Nottingham prognostic index (NPI), and Scarff-Bloom-Richardson grade. Its higher expression is associated with an unfavorable prognosis [91]. CRYβB2P1 and CRYβB2 in BC patients enhance tumorigenesis by promoting cell proliferation. Overexpression of CRYβB2 increases invasive cellular behaviors, tumor growth, IL6 production, immune cell chemoattraction, and the expression of metastasis-associated genes [92]. Upregulation of the CYP4Z2P 3′ UTR or the CYP4Z1 3′ UTR activates signaling pathways regulating the pluripotency of stem cells, epithelial cancer stem cells, and cell cycle-related genes, and increases the CD44+/CD24− population [93,94].
Knockdown of PDIA3P suppresses cell viability, promotes apoptosis, and inhibits migration and invasion. PDIA3P negatively regulates miR-183 and influences its target ITGB1, thus inducing the activation of FAK/PI3K/AKT/β-catenin signals and affecting tumor growth and metastasis [95].
PTENP1 is downregulated in patient samples and cell lines, especially in advanced and more aggressive forms of BC. It regulates cell proliferation, invasion, tumorigenesis, and chemoresistance to Adriamycin (ADR). CKS1BP7 is amplified in 28.8% of all BC patients, while IGF1R is amplified in 24.2% [96]. PTENP1 activates the phosphatidylinositol-3 kinase (PI3K)/AKT pathway, and PI3K inhibitor LY294002 or siAKT prevents cancer progression [97]. FTH1P3 is upregulated in paclitaxel-resistant BC tissue and cell lines. Knockdown of FTH1P3 decreases the 50% inhibitory concentration value of paclitaxel, induces cell cycle arrest at the G2/M phase, and suppresses tumor growth of paclitaxel-resistant BC cells as well as ABCB1 protein expression in vivo [98].
It was found that UGT1A1 and BAIAP2L1 are differentially expressed between LUAD and benign lung disease [99]. PTTG3P and SLC6A10P are upregulated in LUAD patient samples. PTTG3P interacts with the transcription factor FOXM1 to regulate the transcriptional activation of BUB1B. Moreover, it is connected with shortening the metaphase to anaphase transition in mitosis, increasing cell viability after cisplatin or paclitaxel treatment, facilitating the tumor growth, and a poor survival rate for those who received chemotherapy [91]. SLC6A10P is an independent prognostic factor for LUAD individuals. Its higher expression is associated with lymph node metastasis, more advanced tumor stage, and shorter overall survival in non-small cell lung cancer (NSCLC) and LUAD [100].
LINC00908, WWC2-AS2, and CYP2B7P are independent prognostic risk factors for OS, and WWC2-AS2 and SIGLEC17P are independent prognostic risk factors for RFS [101]. SUMO1P3 is upregulated in lung squamous cell carcinoma (LUSC) and LUAD patient samples. It is co-expressed with SUMO1, and higher SUMO1 or SUMO1P3 expression is associated with reduced RFS in individuals with LUAD; however, only SUMO1P3 is an independent prognostic factor. It is also correlated with late clinical stage, lymph node metastasis, distant metastasis, and poor differentiation [102,103].
A group of pseudogenes, DUXAP8, WTAPP1, FTH1P3, and PDIA3P1, was found to be upregulated in NSCLC tissue samples. DUXAP8 expression is positively related to the cancer grade; it influences miR-409-3p expression in a sponging-dependent manner and promotes HK2 as well as LDHA expression. Downregulation of DUXAP8 inhibits tumor growth in vivo [104,105]. WTAPP1 is negatively correlated with HAND2-AS1. In contrast to HAND2-AS1, overexpression of WTAPP1 promotes invasion and migration [106]. Higher expression of FTH1P3 is closely correlated with a worse patient prognosis because it promotes proliferation and invasion. Additionally, knockdown of FTH1P3 represses tumor growth in vivo [107,108]. Increased expression of PDIA3P1 is connected with an advanced TNM stage, lymph node metastasis, and shorter DFS time. Knockdown of PDIA3P1 suppresses proliferation and invasion and reduces tumor growth in vivo [109].
Higher expression of PMPCAP1 and SOWAHC is associated with an unfavorable prognosis for LUSC patients. It should be noted that PMPCAP1, as well as SOWAHC and ZNF454, is involved in gene expression and transcription pathways [110]. Pseudogenes with altered expression in cancers located in the chest area are listed and described in Table 1.

Cancers Located in the Head and Neck Area
In the case of cancers located in the head and neck area, only 37 pseudogenes have been described to date. Expression levels of the annexin A2 pseudogenes ANXA2P1, ANXA2P2, and ANXA2P3, as well as of the parental gene ANXA2, were significantly increased in diffuse glioma. Among the four glioma subtypes, ANXA2P1, ANXA2P2, and ANXA2 are preferentially expressed in the mesenchymal subtype and less expressed in the proneural subtype [111]. ANXA2P2 is upregulated in patient tissues and cells. miR-9 is negatively correlated with its target, ANXA2P2 RNA, and overexpression of this miRNA suppresses the proliferation and aerobic glycolysis of glioma cells by binding to the LDHA 3′ UTR. Knockdown of ANXA2P2 reduces cell proliferation and aerobic glycolysis and downregulates protein levels of glycolysis markers such as GLUT1, HK2, PFK, and LDHA [112].
LGMNP1 was found to be upregulated in glioblastoma tissues. Its high expression enhances proliferation and invasion, which leads to a more aggressive phenotype in cells overexpressing LGMNP1. This pseudogene functionally targets, in a RISC-dependent manner, miR-495-3p, which in turn targets LGMN (legumain, which encodes a cysteine protease with strict specificity for the hydrolysis of asparaginyl bonds) [113]. DUXAP8 was found to be positively related to the tumor stage in neuroblastoma and negatively associated with patient survival rate. Its knockdown reduces proliferation, colony formation, cell cycle progression, and motility [114]. In glioma and glioblastoma, MT1JP is downregulated in patient tissues and cell lines. Its lower expression is associated with cancer progression and poor survival. Overexpression of MT1JP, on the other hand, reduces proliferation and invasion [115]. PDIA3P1 is overexpressed, and its expression is connected with tumor grade and transcriptome subtype. Its increased level is correlated with unfavorable patient outcomes, as well as enhanced migration and invasion. PDIA3P1 functions as a ceRNA by sponging miR-124-3p to modulate RELA expression and activate the downstream NF-κB pathway. HIF-1 is confirmed to directly bind to the PDIA3P1 promoter region and activate its transcription [116].
RPSAP52 is upregulated in patient samples, and its elevated expression is connected with shorter survival. The expression level of RPSAP52 is positively correlated with TGF-β1, leading to its upregulation, while silencing of RPSAP52 leads to a decrease in CD133+ cells, which seem to describe the phenotype of cancer-initiating cells [117].
Another five pseudogenes, ANXA2P2, EEF1A1P9, FER1L4, HILS1, and RAET1K, are connected with glioma. They can be used to establish the patient risk signature. The risk signature genes are involved in regulating proliferation, migration, adhesion, ECM receptor interaction, angiogenesis, response to hypoxia (HIF-1 signaling pathway), PI3K/AKT signaling pathway, and apoptosis. Additionally, increased expression of ANXA2P2, FER1L4, HILS1, and RAET1K, as well as lower levels of EEF1A1P9 are connected with unfavorable prognosis [119].
HERC2P2 is positively correlated with survival and negatively associated with the clinical grade of glioma. Overexpression of HERC2P2 reduces migration and colony formation abilities and reduces tumor growth in vivo [120]. FTH1P3 is upregulated in patient samples and cell lines. Overexpression of FTH1P3 promotes glioma cell proliferation and inhibits apoptosis. Additionally, FTH1P3 inhibits miR-224-5p expression, which in turn negatively regulates TPD52 expression. The FTH1P3/miR-224-5p/TPD52 axis has been shown to contribute to glioma progression [121]. PTENP1 is downregulated in glioma patient samples, whereas overexpression of PTENP1 suppresses cell proliferation, invasion, and migration, decreases the number of S-phase cells, induces the expression of p21 protein, and suppresses the p38 signaling pathway [122].
AGPG is highly expressed in many cancers, and its elevated expression levels are correlated with poor prognosis. AGPG is a transcriptional target of TP53, and loss or mutation of TP53 induces upregulation of AGPG. It was shown that AGPG protects PFKFB3 from proteasomal degradation, which leads to the accumulation of PFKFB3, activates glycolytic flux, and promotes cell cycle progression. In esophageal squamous cell carcinomas (ESCC), knockdown of AGPG results in reduced tumor growth in patient-derived xenograft models [123].
Another group of five pseudogenes in head and neck squamous cell carcinoma (HNSCC), LILRP1, RP6-191P20.5, RPL29P19, TAS2R2P, and ZBTB45P1, can be used as prognostic or predictive markers. A signature of these five pseudogenes can distinguish low-risk from high-risk individuals, predicting prognosis with high sensitivity and specificity. This group is associated with the immune system and cancer-related biological processes. LILRP1 and RP6-191P20.5 are involved in immune regulation, RPL29P19 in metabolism regulation, and TAS2R2P and ZBTB45P1 have multiple functions. The signature is also linked to various pathways enriched in the high-risk group, such as the EMT process, angiogenesis, metastasis, proliferation, extracellular matrix receptor, focal adhesion, and PI3K/AKT pathways [124].
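To make the idea of an expression signature more concrete, the following Python sketch shows one common way such signatures are applied: a weighted sum of expression values followed by a median split into risk groups. The weights below are hypothetical placeholders, not values from the cited study; in practice they would come from a fitted survival model (for example, Cox regression), and the cut-off would be validated on an independent cohort.

```python
from statistics import median
from typing import Dict

# Hypothetical coefficients, for illustration only (not taken from the source).
WEIGHTS: Dict[str, float] = {
    "LILRP1": 0.4,
    "RP6-191P20.5": -0.3,
    "RPL29P19": 0.2,
    "TAS2R2P": 0.5,
    "ZBTB45P1": -0.1,
}


def risk_score(expression: Dict[str, float]) -> float:
    """Weighted sum of the expression values of the signature pseudogenes."""
    return sum(weight * expression[gene] for gene, weight in WEIGHTS.items())


def stratify(cohort: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Label each patient high-risk or low-risk relative to the cohort median score."""
    scores = {patient: risk_score(expr) for patient, expr in cohort.items()}
    cutoff = median(scores.values())
    return {
        patient: "high-risk" if score > cutoff else "low-risk"
        for patient, score in scores.items()
    }
```

A cohort is passed as a mapping from patient identifier to a per-gene expression dictionary, and the function returns one label per patient. The median split is an assumption of this sketch; other cut-off strategies are equally possible.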
Another marker in HNSCC is PTTG3P. It is upregulated in patient samples, its expression depends on the type of mutation in the TP53 gene, and it correlates with genes from the TP53 pathway. Patients with low expression of PTTG3P have a longer DFS time. Furthermore, expression levels of PTTG3P depend on T-stage, grade, and HPV p16 status. Interestingly, the group of patients with high PTTG3P expression has the most dysregulated genes connected with DNA repair, oxidative phosphorylation, and peroxisome pathways [125].
Double homeobox A pseudogene 10 (DUXAP10) can be used as a marker in both oral squamous cell carcinoma (OSCC) and ESCC. A total of 4462 differentially expressed genes (DEGs) and 76 differentially expressed lncRNAs were identified between the three groups, and 200 DEGs and only one lncRNA, DUXAP10, were identified among all three groups. A total of 1662 interactions between 46 lncRNAs and their coexpressed target genes were predicted, and 38 lncRNA-lncRNA pairs coregulated 843 target genes. The coregulated target genes are significantly enriched in the antigen adaptive immune response, activation of phagocytosis receptor signaling, or mast granule NF-κB inflammation. Overall, lncRNAs were differentially expressed in OSCC and dysplasia. The target genes might play an essential role in the carcinogenesis and development of OSCC. These results improve the understanding of lncRNA-based pathogenesis and identify potential targets for the early diagnosis of malignant transformation from dysplasia to OSCC. DUXAP10 was confirmed to be upregulated in ESCC tissues and cells. Additionally, its expression was positively correlated with a short survival time. Moreover, downregulation of this pseudogene decreased cell proliferation and metastasis. Silencing DUXAP10 increased the apoptosis rate and arrested the cell cycle. Mechanistic experiments suggested that DUXAP10 promotes ESCC progression by recruiting enhancer of zeste homolog 2 (EZH2) to the promoter of p21 [126].
FKBP9P1 is upregulated in patient tissues as well as cell lines; its elevated level is correlated with advanced T-stage, N-stage, and clinical stage and is connected with shorter OS and DFS times. Knockdown of FKBP9P1 reduces proliferation, migration, and invasion by reducing PI3K/AKT signaling pathway activity. FTH1P3 is upregulated in ESCC patients' samples, and higher FTH1P3 expression is connected with a worse prognosis. Overexpression of FTH1P3 increases cell proliferation, migration, and invasion, and inhibits cell apoptosis. It is also positively correlated with poorer differentiation, increased T classification, lymph node metastasis, and advanced clinical stage [127]. FTH1P3 is also significantly upregulated in OSCC tissues and cell lines. Increased expression of FTH1P3 in OSCC tissue was associated with T classification, N classification, and TNM stage. Furthermore, Kaplan-Meier survival analysis showed that the prognosis of individuals with low FTH1P3 expression was much better than that of individuals with high expression. Cox regression analysis showed that FTH1P3 expression was an independent prognostic factor for individuals with OSCC. Loss-of-function assays indicated that knockdown of FTH1P3 significantly suppressed the proliferation, migration, and invasion of OSCC cells. Mechanistically, knockdown of FTH1P3 significantly reduced the activation of PI3K/AKT/GSK3β/Wnt/β-catenin signaling [128].
The last one, TUSC2P, is downregulated in patient samples and cell lines. Its elevated expression is associated with better patient survival. The TUSC2P 3′ UTR regulates the expression of miR-17-5p, miR-520a-3p, miR-608, and miR-661 in a sponging-dependent manner and thereby protects TUSC2 mRNA from being regulated by these miRNAs [129,130]. All pseudogenes with diagnostic potential are summarized in Table 1.
• BMI1P1A, OCT4, and POU5F1B make up a three-gene signature that divides patients into high-risk and low-risk groups
• the three-gene signature distinguishes patients from controls better than any of the three genes alone
• the three-gene signature is a prognostic factor: the high-risk patient group has shorter leukemia-free survival (LFS) and OS than the low-risk group

Conclusions
Even 40 years after the discovery of pseudogenes, knowledge of these genomic components remains relatively poor. Hopefully, thanks to the rapid development of new sequencing technologies, we will be able to identify new pseudogenes and learn more about those already characterized. Silva-Malta et al. recently presented a molecular strategy for the detection of the RHD pseudogene (RHDψ) based on a real-time polymerase chain reaction (PCR) assay [147]. However, only a limited number of transcriptomes have been covered so far. Furthermore, while most proposals have produced targeted algorithms, mainly used for detection, few computational pipelines have been designed following a comprehensive approach that addresses both the identification and the quantification of transcriptional activity within a unifying methodological frame. Standard pipelines mainly use the R language and pseudogene databases. Some of them are agnostic, which means that they apply computational tools in a de novo fashion to optimize the detection power and, in turn, to retrieve as many pseudogenes as possible, whether annotated or putative [148]. Such a four-step pipeline includes (a) mapping RNA-seq samples to the human reference using the spliced-read aligner TopHat; (b) assembling genes and transcripts into putative candidates with Cufflinks [149] and Scripture [150] and comparing them to existing annotations from Ensembl, UCSC, and GENCODE; (c) screening candidate pseudogenes against a collection of features; and (d) appraising putative pseudogenes with classification procedures implemented using Samtools and Perl.
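To make the order of the four steps more concrete, the following Python fragment is a minimal sketch, not the pipeline of [148]. It only assembles command lines for steps (a) and (b) using TopHat, Cufflinks, and Cuffcompare, and applies a toy filter for step (c). Scripture and the classification step (d) are omitted. All file names, the index name, the candidate fields (`length`, `fpkm`), and the thresholds are hypothetical placeholders chosen for illustration.

```python
from pathlib import Path


def build_commands(reads, bowtie_index, annotation_gtf, out_dir, threads=4):
    """Build (but do not run) command lines for steps (a) and (b).

    All paths and names passed in are placeholders supplied by the caller.
    """
    out = Path(out_dir)
    tophat_dir = out / "tophat"
    cufflinks_dir = out / "cufflinks"
    return [
        # (a) spliced alignment of RNA-seq reads to the reference with TopHat
        ["tophat", "-p", str(threads), "-o", tophat_dir.as_posix(),
         bowtie_index, reads],
        # (b) transcript assembly with Cufflinks from TopHat's BAM output
        ["cufflinks", "-p", str(threads), "-o", cufflinks_dir.as_posix(),
         (tophat_dir / "accepted_hits.bam").as_posix()],
        # (b) comparison of assembled transcripts with a reference annotation
        ["cuffcompare", "-r", annotation_gtf,
         "-o", (out / "cuffcmp").as_posix(),
         (cufflinks_dir / "transcripts.gtf").as_posix()],
    ]


def screen_candidates(candidates, min_length=200, min_fpkm=1.0):
    """Step (c): keep candidates that pass illustrative feature thresholds.

    The two features and their cut-offs are assumptions for this sketch,
    not the feature collection used in the cited pipeline.
    """
    return [
        c for c in candidates
        if c["length"] >= min_length and c["fpkm"] >= min_fpkm
    ]


if __name__ == "__main__":
    for cmd in build_commands("sample.fastq", "hg19_index",
                              "annotation.gtf", "out"):
        print(" ".join(cmd))

    toy_candidates = [
        {"id": "cand1", "length": 850, "fpkm": 3.2},
        {"id": "cand2", "length": 120, "fpkm": 5.0},   # too short
        {"id": "cand3", "length": 900, "fpkm": 0.1},   # too weakly expressed
    ]
    print([c["id"] for c in screen_candidates(toy_candidates)])
```

Building the commands as lists rather than executing them keeps the sketch independent of any local installation; in practice each list could be passed to `subprocess.run` once the tools and reference files are available.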
Although several studies have been performed to date, the extent to which pseudogenes contribute to organismal biology remains largely unclear. Earlier obstacles to exploring pseudogenes were caused by the a priori assumption that they are functionless. Their systematic study has also been hindered by the lack of robust methodologies capable of distinguishing between the biological activities of pseudogenes and the functions of the genes they are derived from. Similarly, lncRNAs were initially dismissed as "junk DNA" or as transcriptional noise, mostly due to their definition as non-protein-coding and their generally lower and more restricted expression patterns compared with mRNAs [131,151]. Future work should seek to explain whether pseudogene activation is one of the crucial factors driving carcinogenesis or a result of the carcinogenesis process in situations where no mutations in the "driver genes" are observed [132]. Moreover, some results should be interpreted with caution because certain pseudogenes, due to their high similarity to parental genes, can give false results, as presented by Zhao et al., who described this problem for the pseudogenes OCT4pg1, OCT4pg3, and OCT4pg4 and their parental gene OCT4 [39]. All of this makes pseudogenes more mysterious than we thought, and they uncover hidden or missed networks of interactions in a cell. We are convinced that through the advancement of technology, genome-wide studies, and detailed biochemical analyses, pseudogenes will be broadly recognized, along with their regulatory potential.

Data Availability Statement: All data are available online with common access. The data analyzed during the current study are available from the corresponding author on reasonable request.