Gene Co-Expression Analysis of Human RNASEH2A Reveals Functional Networks Associated with DNA Replication, DNA Damage Response, and Cell Cycle Regulation

Simple Summary RNASEH2A is the catalytic subunit of the ribonuclease (RNase) H2 ternary complex that plays an important role in maintaining DNA stability in cells. Recent studies have shown that the RNASEH2A subunit alone is highly expressed in certain cancer cell types. Via a series of bioinformatics approaches, we found that RNASEH2A is highly expressed in human proliferative tissues and many cancers. Our analyses reveal a possible involvement of RNASEH2A in cell cycle regulation in addition to its well established role in DNA replication and DNA repair. Our findings underscore that RNASEH2A could serve as a biomarker for cancer diagnosis and a therapeutic target. Abstract Ribonuclease (RNase) H2 is a key enzyme for the removal of RNA found in DNA-RNA hybrids, playing a fundamental role in biological processes such as DNA replication, telomere maintenance, and DNA damage repair. RNase H2 is a trimer composed of three subunits, RNASEH2A being the catalytic subunit. RNASEH2A expression levels have been shown to be upregulated in transformed and cancer cells. In this study, we used a bioinformatics approach to identify RNASEH2A co-expressed genes in different human tissues to underscore biological processes associated with RNASEH2A expression. Our analysis shows functional networks for RNASEH2A involvement such as DNA replication and DNA damage response and a novel putative functional network of cell cycle regulation. Further bioinformatics investigation showed increased gene expression in different types of actively cycling cells and tissues, particularly in several cancers, supporting a biological role for RNASEH2A but not for the other two subunits of RNase H2 in cell proliferation. Mass spectrometry analysis of RNASEH2A-bound proteins identified players functioning in cell cycle regulation. Additional bioinformatic analysis showed that RNASEH2A correlates with cancer progression and cell cycle related genes in Cancer Cell Line Encyclopedia (CCLE) and The Cancer Genome Atlas (TCGA) Pan Cancer datasets and supported our mass spectrometry findings.


Introduction
The development of high-throughput tools to monitor gene expression levels in a specific cell type and tissue has allowed the characterization of gene expression patterns throughout the human body. Such an approach was utilized in several studies to identify molecular signatures of tissues and cells [1][2][3]. Genes that share a molecular pathway in a given tissue or cell should be co-expressed in a spatial and a temporal manner. Therefore, examining genes that show high co-expression correlation in multiple tissues can be used to identify a shared molecular pathway between those genes. The Genotype-Tissue Expression portal (GTEx) [3] has been used as a suitable platform to identify gene co-expression in the human body based on the availability of the full transcriptome in 53 different human tissues. Gene ontology analysis of the co-expressed genes might shed light on genes with uncharacterized functions and help cover additional functions of genes for which their role is partially known.
RNASEH2A, together with RNASEH2B and RNASEH2C, composes the holoenzyme ribonuclease (RNase) H2 [4,5]. RNASEH2B and RNASEH2C are used as the scaffold for RNASEH2A, which serves as the catalytic subunit of the RNase H2 complex [6]. The function of RNase H2, together with RNASEH1, is to cleave RNA of RNA-DNA hybrids, which can be formed during transcription [7], DNA replication [8], and repair [9]. RNA-DNA hybrids, if not repaired, can harm the genomic DNA by different mechanisms, such as modifying the DNA structure, blocking DNA replication and transcription, causing hyperrecombination and mutation, or chromosome loss [4,[10][11][12][13]. Phenotypically, it has been postulated that the RNA-DNA hybrids mimic the infection of nucleic acids of viral origin, thus activating the innate immune response [14]. An example of a pathologic condition is the Aicardi-Goutières Syndrome (AGS), a severe autosomal recessive neurological disorder with symptoms similar to in-utero viral infection [15,16]. Genetic studies have linked AGS to mutations in the genes composing RNase H2, and it has been shown that mutations silencing intracellular RNases affect microRNA turnover, resulting in the severe clinical consequences in the brain that characterize the clinical feature of AGS [17]. However, no evidence was found that RNASEH1 is linked to the AGS pathology [14], highlighting a marked difference between the function of these two enzymes. Another distinctive feature of RNASEH2A comes from examining its expression levels in cancer compared to normal tissues. A previous study showed that, in human mesenchymal stem cells transformed by the over-expression of several oncogenes, RNASEH2A was among the genes with the highest and the earliest up fold change in their expression [18]. In addition, increased levels of RNASEH2A in cancer compared to normal tissues and cells were also reported in cervical cancer [19], prostate cancer [20], colorectal carcinoma [21], and triple negative breast cancer [22]. In support of this finding, a recent study by Luke's group [23] highlights a differential cell-cycle regulation for the activity and the level of the RNASEH2A orthologous gene in yeast Saccharomyces cerevisiae (RNH201), peaking at S and G2 phases of the cell cycle, supporting the hypothesis that RNASEH2A plays a role in cancer progression. This is in contrast with the yeast RNASEH1 ortholog, RNH1, activity and level of which are not altered throughout the cell cycle [23]. With the goal to better understand the biological role of RNASEH2A in cancer, we analyzed the prevalence of patients with copy number alterations in the RNASEH2A gene in The Cancer Genome Atlas (TCGA) Pan Cancer studies in different cancer tissues, hypothesizing that identifying patterns of its expression can shed new insights on its function.

Co-Expression Correlation Analysis
We obtained expression data from GTEx portal v7 [3] (https://gtexportal.org/home/ datasets; accessed date 21 September 2019) which were obtained through RNA-Seq for 53 different tissues. The dataset contained total 56,202 transcripts, out of which 42,548 transcripts had greater than zero transcripts per million (TPM) values in at least one tissue of the 53 different tissues. We calculated Pearson correlation coefficients for every transcript with RNASEH transcripts data using R [24]. Then, we analyzed the gene ontology (GO) term of the top 2% or 5% co-expressed genes using Gene Ontology enRichment anaLysis and visuaLizAtion tool (GOrilla) [25].

Co-Expression Correlation Analysis Verification with STRING Database
To validate the correlation co-expression coefficients of different genes with RNASEH2A in GTEx tissues, we used protein co-expression data obtained from STRING protein-protein association network v11 [26]. We looked at top correlated genes including RNASEH2A in GTEx dataset, which also had at least 20 exclusively co-expressed proteins in STRING protein association network. We used the default filtering criteria of medium confidence score in STRING database, selecting only co-expressed proteins. With 10 such transcripts (including RNASEH2A) and 20 co-expressed proteins for each from the STRING database, we ended up with a list of 200 proteins. We removed any duplicate proteins listed and counted how many proteins were in the top five percentile of the Pearson's correlation coefficient distribution from the co-expression correlation analysis that was to be validated.

Gene Expression Analysis in Cancer
Data were obtained from Cancer RNA-Seq Nexus (CRN) database providing phenotypespecific coding-transcript/lncRNA expression profiles, and mRNA-lncRNA co-expression networks in cancer cells [27] were used to examine the expression level of RNASEH genes in different type of cancers compared to normal tissues and in cancers at different stage of progression.

RNASEH2A Copy Number Alterations Analysis in TCGA Pan Cancer Dataset
To analyze copy number alterations (CNAs) in RNASEH2A, we used The Cancer Genome Atlas (TCGA) Pan Cancer studies [28] involving 32 studies and 10,967 patients through cBioPortal [29,30]. After filtering samples containing both copy number alterations generated by Genomic Identification of Significant Targets in Cancer (GISTIC) and RNA-Seq data, processing, and normalizing by RNA-Seq by expectation maximization (RSEM), data for 9889 patients were used for final analysis.
For transient transfection, HEK293 cells were plated at 1-3 × 10 6 per 10 cm plate. The following day, 5 µg of the plasmids were incubated with 25 µL of the Lipofectamine transfection reagent (Fisher Scientific, Hampton, NH, USA, cat # 11668019) for 10-15 min at room temperature in Opti-MEM media (Fisher Scientific, Hampton, NH, USA, cat # 31985062), and the plasmid-reagent mix was distributed to the cells. The next day, transfected cells were fed with fresh media and harvested 48 h after transfection.

Co-Immunoprecipitation
Whole cell lysate (1 mg) from HEK293 cells transfected with either the eGFP or the pEGFP-RNASEH2A plasmid was incubated with 30 µL of GFP-Trap Magnetic Agarose beads (ChromoTek, Planegg-Martinsried, Germany, gtma-10 lot # 80912001MA) at 4 • C with end to end rotation for 1 h. Then, the magnetic beads-GFP protein complex were separated from the rest of the protein extract using a magnetic stand . The beads were  washed twice with wash buffer I (50 mM Tris-HCL pH 7.5, 150 mM NaCl, 0.25% NP-40)  and an additional 2 washes with wash buffer II (50 mM Tris-HCL pH 7.5, 150 mM NaCl). Then, the beads were resuspended in SDS loading buffer and boiled at 95 • C for 10 min. Finally, the beads were centrifuged at 13,000 rpm for 5 min at room temperature, and the eluted proteins were collected and used for either Western blot or mass spectrometry analysis.

Mass Spectrometry Analysis
The gel lanes were excised from the gel, and the proteins were reduced, alkylated, and digested with trypsin as previously described [31]. The peptides were analyzed by nano-LC-MS/MS and peptide identification as previously described [32]. The raw files were searched using the Mascot algorithm (v2.5.1) against a protein database constructed by combining the bovine reference database (Uniprot.com, 37,941 entries, downloaded 22 May 2019), the human reference database (Uniprot.com, 73,911 entries, downloaded 22 May 2019), the GFP-RNASEH2A protein, and a contaminant database (cRAP, downloaded 21 November 2016 from http://www.thegpm.org) via Proteome Discoverer 2.1. A 1% false discovery rate (FDR) ("High Confidence") was used for both the peptides and the proteins. At least 3 peptide sequences for the RNASEH2A protein were considered for the analysis.
Spectral count data were analyzed for mass spectrometry runs on two sets of immunoprecipitation experiments described previously in the methods. Counts in each run were normalized based on mean of total spectral counts in all runs in a respective set of the experiment. We then filtered hits specific to Homo sapiens, added a count of 1 and log2 transformed the spectral counts. We then used these transformed data to get differentially present protein candidates in immunoprecipitation product of HEK293 with GFP-RNASEH2A vs. HEK293 with GFP using linear fitting and empirical Bayes method from limma, a bioconductor package [33][34][35]. To get the final list of protein candidates, we used the filtering criteria of log fold change > 1.5 and FDR adjusted p value < 0.05.

RNASEH2A Correlation with Cancer Progression and Cell Cycle Related Genes in CCLE and TCGA Pan Cancer Dataset
To validate the results of our study, we further made use of 2 large cancer related datasets to investigate the correlations between RNASEH2A and 39 other genes, including common cancer proliferative markers with few genes involved in each cell cycle phase as well as genes upregulated and downregulated in cancer as controls. We also included the genes for the proteins we identified as putative binding partners from mass spectrometry analysis. We used Broad Institute Cancer Cell Line Encyclopedia (CCLE) dataset containing RNA-Seq expression levels in Reads Per Kilobase of transcript, per Million mapped reads (RPKM) in 1019 cancer cell lines from 26 different tissues of origin [36]. The second dataset we used was the TCGA Pan Cancer Study containing batch normalized RNA-Seq expression levels quantified by RNA-Seq by Expectation Maximization (RSEM) in 32 studies from 10,071 cancer patients. [29]. We added a count of 1 to the values and log transformed the expression counts in both the datasets. These transformed data were used to draw correlations between gene expressions with rcorr function from Hmisc package in R [37]. We also used hierarchical clustering the group the genes in the correlation plot.

RNASEH2A Is Co-Expressed with Genes that Function in Cell Cycle Regulation
To study RNASEH2A function based on co-expressed genes, we used an approach ( Figure 1) that utilizes data obtained from GTEx database [3] consisting of RNA-seq output of 42,548 nuclear and mitochondrial transcripts including coding, non-coding, and isoforms (raw data in Table S1).

Process GO Term
Description Biological Cellular processes components After analysis of the top 2% genes (851) with highest correlation coefficient, using the GOrilla analysis tool [25], we observed a significant enrichment of genes involved in the biological processes of cellular response to DNA damage stimulus (p = 2.77 × 10 −5 ) and DNA replication (p = 2.66 × 10 −4 ), compatible with the known functions of RNASEH2A. In addition, we found enrichment of genes involved in mitotic cell-cycle process (p = 2.80 × 10 −14 ), regulation of microtubule cytoskeleton organization (p = 1.39 × 10 −7 ), and chromosome segregation (p = 9.00 × 10 −4 ) ( Table 1 and full data in Table S2). For molecular function, we found enrichment of catalytic activity on DNA (p = 9.59 × 10 −6 ) but also microtubule binding (p = 4.25 × 10 −5 ) and kinase binding (p = 3.28 × 10 −4 ) ( Table 1 and full data in Table S2). For cellular component, we found gene enrichment in chromosomal part (4.06 × 10 −8 ), spindle pole (p = 1.13 × 10 −6 ), as well as nucleoplasm (p = 3.52 × 10 −6 ), midbody (p = 5.01 × 10 −6 ), microtubule organizing center (p = 5.13 × 10 −6 ), and condensed chromosomes outer kinetochore (p = 8.72 × 10 −6 ) ( Table 1 and full data in Table S2). Based on these results, we suggest that RNASEH2A is involved, among others, in three main functional networks related to DNA replication, DNA damage repair, and regulation of chromosome segregation in mitosis. Table 1. Gene ontology (GO) term analysis of the top 2% co-expressed genes of RNASEH2A reveals the possible role of RNASEH2A in cell cycle regulation. The analysis shows RNASEH2A functional networks based on process (blue panel), function (orange panel), and component (green panel). The analysis was made by Gene Ontology enRichment anaLysis and visuaLizAtion tool (GOrilla). Statistic is shown by both p-value and false discovery rate (FDR).

Process GO Term
Description To validate the correlation of transcripts in GTEx tissues with RNASEH2A at the protein expression level using the STRING protein-protein association network, we looked at 10 transcripts having high correlation with RNASEH2A (including RNASEH2A) that also have at least 20 known and characterized co-expressed proteins in the STRING network (full list of proteins in Table S3a). Out of the list of 200 proteins (Table S3a), 97 were unique ( Table S3b), out of which 87 have Pearson's correlation coefficient, R > 0.696 falling in the top five percentile (Table S3c), and 68 have Pearson's correlation coefficient, R > 0.842 falling in the top one percentile (Table S3c) of correlation coefficient distribution previously represented in Figure 1.
Analysis of the top 2% co-expressed correlated genes with RNASEH1 revealed functional network of genes that are involved in RNA binding (Table S4 for functional network  analysis and Table S5 for complete list of RNASEH1 co-expressed genes). After filtering top 5% correlated genes, we could identify processes such as regulation of telomerase RNA and telomeres maintenance (Table S4), which were previously reported [38,39]. When we analyzed the top 2% co-expressed correlated genes with RNASEH2B, we found the functional network of genes involved in DNA replication (Table S6 for functional network  analysis and Table S7 for complete list of RNASEH2B co-expressed genes), as expected for this RNase H2 subunit. For RNASEH2C, we could not find any functional network either in the 2% or the 5% analyses, as there were no GO terms with an enrichment p-value below the specified value p < 0.001 (data not shown).

RNASEH2A Expression Is Increased in Actively Cycling Cells and Tissues
Next, we studied the expression levels of RNASEH genes using the GTEx v.7 database, which allowed us to compare the expression levels of RNASEH1, RNASEH2A, RNASEH2B, and RNASEH2C genes in 53 different human tissues, including two cell lines (Table 2). We noticed that tissues that are part of the reproduction system (such as testis, uterus, and cervix) as well as transformed lymphocytes and transformed fibroblasts tend to have a higher expression level of RNASEH genes, while low levels were found mainly in tissues with low proliferation and cell turnover capacity, such as most regions of the brain, the kidney, and the heart ( Table 2). We then examined the expression level of the RNASEH genes in cancer using data obtained from Cancer RNA-seq Nexus (CRN) [27] in 29 different randomly selected tumors in different tissues at different stages of cancer progression compared to their healthy controls (complete list of tumors in Table S8 and raw data in Table S9). To validate our approach, we also examined the expression level of MYBL2 and SCARA5 genes, which are reported to be upregulated and downregulated in cancer, respectively [40]. As expected, MYBL2 was mostly upregulated in cancers such as triple negative breast cancer, bladder urothelial carcinoma stages 2 and 4, and lung adenocarcinoma stages 2 and 4, while SCARA5 was downregulated in cancers such as bladder urothelial carcinoma stages 2 and 4; thyroid carcinoma stage 1, and rectum adenocarcinoma stage 2A (Figure 2). RNASEH2A was the only RNASEH gene that showed increased expression levels in all 29 tumors relative to the control tissues ( Figure 2).   To study the levels of RNASEH genes throughout the progression of cancer, we examined the expression levels of the genes at different stages of lung squamous adenocarcinoma, breast invasive carcinoma, and bladder urothelial carcinoma. In all these cancers, the expression of RNASEH2A changed drastically from a normal stage to early cancer stages and remained high with the progression of the disease (Figure 3 and Table  S10 and raw data in Table S11). No other gene beside MYBL2 was upregulated at the different stages studied. These results imply that RNASEH2A might have a role in cancer etiology or transformation from normal tissue to cancer tissue, that is dependent on its expression level.

RNASEH2A Gene Amplifications Has Higher Prevalence in Multiple Cancer Types
The analysis of RNASEH2A copy number alterations and mRNA expression in various types of cancer (prevalence of RNASEH2A CNAs in Table 3, expression summary for CNA types and raw data in Table S12) revealed that the average expression of RNASEH2A is higher in patients with amplifications of the RNASEH2A gene and lower in patients with deep deletions when compared to normal diploid samples. This correlates with previous findings in mRNA versus CNA type expression levels in the TCGA Pan Cancer dataset [41]. We also observed that, in 16 out of 35 cancer types/subtypes, percentage of patients with amplifications was higher than percentage of patients with deep deletions, with 10% cancer types having >1% amplifications as copy number alteration. The maximum percentage of patients having RNASEH2A amplifications in a given type of cancer was 8.14% for ovarian epithelial tumor, followed by 3.29% in endometrial and 2.63% in adrenocortical carcinoma (Table 3 and raw data in Table S12).
After confirming the successful pulldown of RNASEH2A protein, we analyzed the eluted proteins that were interacting with RNASEH2A via mass spectrometry. The analysis revealed a short list of five human proteins with at least three peptide spectrum matches (PSMs) interacting with RNASEH2A (Table 4 and full data in Table S13a,b). RNASEH2B and RNASEH2C were interacting with RNASEH2A confirming the validity of the Co-IP experiment. Interestingly, among the proteins interacting with RNASEH2A, we identified the T-complex protein 1 subunits theta Chaperonin Containing TCP1 Subunit 8 (CCT8) and DnaJ homologue subfamily A member 1 (DNAJA1) and apoptosis-inducing factor mitochondrion-associated 1 (AIFM1) proteins with a gene correlation coefficient with RNASEH2A of 0.84, 0.585, and 0.378 respectively. These three potential RNASH2A binding partners have shown to play a role in cell cycle regulation and mitosis [42][43][44][45]. interacting with RNASEH2A (Table 4 and full data in Table S13a,b). RNASEH2B and RNASEH2C were interacting with RNASEH2A confirming the validity of the Co-IP experiment. Interestingly, among the proteins interacting with RNASEH2A, we identified the T-complex protein 1 subunits theta Chaperonin Containing TCP1 Subunit 8 (CCT8) and DnaJ homologue subfamily A member 1 (DNAJA1) and apoptosis-inducing factor mitochondrion-associated 1 (AIFM1) proteins with a gene correlation coefficient with RNASEH2A of 0.84, 0.585, and 0.378 respectively. These three potential RNASH2A binding partners have shown to play a role in cell cycle regulation and mitosis [42][43][44][45].

RNASEH2A Expression Positively Correlates with Cancer Proliferation Markers and Cell Cycle Genes
To further explore the involvement of RNASEH2A in cancer, we examined the correlations between RNASEH2A and 39 additional genes in CCLE and TGCA Pan Cancer datasets (Table 5 and Table S14a,b). The correlation analysis uncovered a positive correlation of RNASEH2A with genes up-regulated in cancer and a negative correlation with genes that are instead down-regulated. In addition, RNASEH2A showed a higher correlation with cluster of genes known to play a role in cancer proliferation as compared to RNASEH2B, RNASE2C, and RNASEH1. We also observed a similar trend for the three putative binding partners of RNASEH2A that we identified by mass spectrometry, showing a correlation comparable with that found in the GTEx dataset. The analysis of cell cycle related genes showed a high correlation between RNASEH2A and genes involved in Gap 1/Synthesis (G1/S), Gap 2 (G2), G2/Mitosis (M), M, and M/G1 phases and a comparatively lower or no correlation with genes unique to G1 and S phases ( Figure 5 and raw data in Figure S15a,b). Table 5. List of genes related to cancer proliferation and cell cycle phases used for correlation analysis in Cancer Cell Line Encyclopedia (CCLE) and TCGA Pan Cancer datasets.

Discussion
Co-expression correlation analysis of genes is a simple approach to suggest functional network of genes. We anticipate that this approach is more accurate for analyzing genes sharing a higher correlation value. In addition, we assume that the genes analyzed should have a functional network involved in a process that is dependent on a spatiotemporal gene expression profile. Cell cycle regulation is a good example for such a process. In fact, the analysis of CDK1, a regulator of cyclin B implicated in cell cycle control,

Discussion
Co-expression correlation analysis of genes is a simple approach to suggest functional network of genes. We anticipate that this approach is more accurate for analyzing genes sharing a higher correlation value. In addition, we assume that the genes analyzed should have a functional network involved in a process that is dependent on a spatio-temporal gene expression profile. Cell cycle regulation is a good example for such a process. In fact, the analysis of CDK1, a regulator of cyclin B implicated in cell cycle control, shows enrichment of genes involved in mitotic cell cycle regulation, microtubule cytoskeleton organization, and regulation of G1/S transition of mitotic cell cycle (Table S16 and complete list of CDK1 co-expressed genes in Table S17). This approach can be applied to other processes such as, for example, RNA regulation in stress granules, which is a well-orchestrated process dependent on time and stress conditions. In this context, G3BP1, a RNA helicase that is one of the key assemblers of the stress granules, shows enrichment of co-expressed genes involved in molecular function of RNA binding and RNA helicase activity in a ribonucleoprotein cellular component (Table S16 and complete list of G3BP1 co-expressed genes in Table S18).
Among the three RNASEH genes, we found that only RNASEH2A was involved in the functional network of mitotic cell cycle regulation. One possibility is that RNASEH2A has a function independent of RNASEH2B and RNASEH2C, as its distribution in the cell was shown not to be limited to the interaction with RNASEH2B and RNASEH2C [6]. This notion was also suggested by others showing that knocking out RNASEH2A in cervical cancer HeLa cells results in an increased sensitivity to ataxia telengiectasia and Rad3-related (ATR) inhibitors compared to knocking out RNASEH2B [48]. We propose that RNASEH2A levels constitute an additional layer of regulation of RNASEH2A activity, which is not dictated by RNASEH2B or RNASEH2C levels, and this is why the co-expression correlation analysis did not show a functional network of mitotic cell regulation for RNASEH2B and RNASEH2C genes. Following this notion, it would be intriguing to examine the levels of the three RNase H2 genes throughout the cell cycle in human cells and study how perturbing their levels affects RNase H2 function and/or cell cycle progression. Similar studies have been performed on yeast showing a cell cycle dependent expression of RNH201 (orthologous gene of RNASEH2A), peaking at S and G2 phases [23,49], and being recruited to chromatin/telomeres at the G2 phase [49].
Rejins et al. [50], using an anti-mouse antibody for all proteins of the RNase H2 complex, demonstrated an increase in the expression level of RNase H2 in the mouse blastocyst in all three embryonic layers during gastrulation and showed, in newborns and adults, that the expression becomes restricted to highly proliferative tissues such as intestinal crypt epithelium and testes [50]. Moreover, they reported that RNase H2 levels correlate with the proliferation marker Ki67 [50]. These data are consistent with our findings showing an increase in RNASEH2A level in actively cycling tissues as well as different cancer tissues compared to normal ones.
Using our co-expression correlation approach, we identified three main functional networks in which RNASEH2A is involved: DNA replication, DNA damage repair, and regulation of chromosome segregation in mitosis. Hyper-proliferation of cells results in excessive addition of RNA Okazaki fragments into the genome during DNA replication, increases the number of events when replicative DNA polymerases insert ribonucleotides instead of deoxyribonucleotides into the genomic DNA [51,52], and increases the rate of chromosomes reduction from 4N to 2N by symmetric segregation into two new daughter cells. All these events correspond to the functional networks identified in this study for RNASEH2A and link RNASEH2A activity to cell cycle regulation. In support of this notion, we identified several cellular pathways among the highest RNASEH2A co-expressed genes such as the minichromosome maintenance (MCM) complex, a replicative eukaryotic helicase that, in yeast, has been demonstrated to be activated upon accumulation of RNA-DNA hybrids at the G2-M checkpoint. This finding suggests that targeting the RNASEH2A level and/or activity could prevent the DNA damage occurring during replication, which leads to mitotic catastrophe and cell death [53,54]. The other identified complexes, such as the NDC80 complex, the CENP family, and the KIF family, have all been shown to function in cell cycle regulation through regulation of the microtubule filaments with the kinetochore [55][56][57].
Our analyses of expression levels focused on the RNASEH genes (RNASEH1, RNASEH2A, RNASEH2B, and RNASEH2C) using the GTEx v.7 database and revealed a marked contrast in RNASEH2A expression levels between low and high proliferative tissues. RNASEH2A showed low expression levels in low proliferative tissues such as brain, kidney (cortex), and heart and high expression levels in highly proliferative tissues such as skin, esophagus, small intestine, testis, cervix, as well as in transformed fibroblasts and transformed lymphocytes. Moreover, in line with the initial study by Flanagan et al. [18], showing elevated expression levels of RNASEH2A in transformed mesenchymal stem cells, our analysis of 29 different randomly selected tumors in different tissues compared to their healthy controls revealed that RNASEH2A is the only RNASEH gene displaying increased expression levels. Further analyses of RNASEH expression levels throughout different stages of lung squamous adenocarcinoma, breast invasive carcinoma, and bladder urothelial carcinoma again highlighted specific upregulation of RNASEH2A from normal tissues to early stages and more advanced stages of these cancers. It is interesting to note that Dai et al. reported on a possible role of RNASEH2A in glioma cell proliferation. Their findings suggested a role of RNASEH2A upregulation in cell growth and apoptosis, contributing to glioma-genesis and cancer progression [58]. More recently, RNASEH2A overexpression was associated with cancer cell resistance to chemotherapy in vitro and with aggressiveness and poor outcomes in breast cancers of estrogen receptor (ER)-positive subtypes [59]. The analysis of copy number alterations of the RNASEH2A gene that we performed in our study further supports the role of RNASEH2A in cancer development. We demonstrated that RNASEH2A gene amplifications have higher incidence in multiple cancer types when compared to deep deletions and normal diploid samples suggesting RNASEH2A as a target for cancer diagnosis and therapy. Interestingly, cancers from tissues with high cell turnover, such as the reproductive system, showed a maximum percentage of patients with RNASEH2A amplification, suggesting a possible role of adult stem cells in the overexpression of RNASEH2A. In fact, deregulation of the adult stem cell niche is considered a key event in the etiology of cancer [60].
Our mass spectrometry analysis shows five protein candidates that specifically interact with RNASEH2A. The top two proteins observed were RNASEH2B and RNASEH2C, confirming the authenticity of the results. Additional proteins that we identified in the context of RNASEH2A biology and mitosis regulation were CCT8, DNAJA1, and AIFM1. CCT8 is part of the CCT/TRiC complex that was reported to regulate telomerase function by mediating its trafficking from the cytoplasm to the telomeres [36]. In addition, it was shown to promote chromosome segregation by disassembling checkpoint complex from sister chromatids in the initial phase of mitosis [37], supporting the RNASEH2A functional network in mitosis regulation. DNAJA1 belongs to the family of DnaJ heat shock protein family (Hsp40); it has been shown to be activated by E2F transcription factor 1 (E2F1) and to promote cell cycle by stabilizing cell division protein 45 (CDC45) [38]. Of interest, the induction of RNASEH2A by E2F1 has been also reported in human papillomavirus cervical cancers [19]. AIFM1 is a well know factor that plays a key role in the apoptosis signaling pathways. A recent study has uncovered a new role of AIFM1 on the proliferation of hepatoma cells and cell cycle arrest [39].
The high correlation of RNASEH2A expression with genes that play a role in cancer development and cell cycle progression validate our study, showing that RNASEH2A plays a role in cancer transformation and progression. In addition, our final analysis supports the mass spectrometry findings and indicates CCT8, DNAJA1, and AIFM1 as binding partners of RNASEH2A.
The analysis and the experimental approaches used in this works have some limitations. In this respect, we made use of the Cancer RNA-Seq Nexus database for which patient information is not available. Factors such as age, gender, and medical treatment versus surgery could have influenced the results of our in-silico analysis. In addition, although mass spectrometry is a highly specific and sensitive technique, further investigation is needed to confirm the specific interactions between the RNASEH2A subunit and the candidate proteins found in our experiments. In our in vitro experiments, we used the HEK293 cell line. Depending on a number of factors, this cell line in culture can either show the phenotype of a normal or of a cancer cell line [61]. Although we were able to support our in vitro finding by in-silico analysis, the characterization of the biological role of CCT8, DNAJA1, and AIFM1 in concert with RNASEH2A requires a more appropriate model system.

Conclusions
Overall, our study suggests an emerging role of RNASEH2A in cell cycle regulation that appears independent from its function as part of the RNase H2 whole enzyme. The presented findings stimulate new research directions such as investigating the differential expression of RNASEH2A during cell cycle phases to better understanding and characterizing the function of RNASEH2A in cell proliferation in healthy, adults stem cells and cancer cells and to support further exploring RNASEH2A as a target for cancer diagnosis and therapy.
Supplementary Materials: The following are available online at https://www.mdpi.com/2079-7 737/10/3/221/s1. Table S1: List of RNASEH2A co-expressed correlated genes. Table S2: Gene ontology of the 2% co-expressed correlated RNASEH2A genes using the GOrilla analysis tool. Table  S3: Protein network association data obtained from STRING protein-protein association network applied to RNASEH2A and its top co-expressed genes (3a-c). Table S4: Gene ontology of the 2% and the 5% co-expressed correlated RNASEH1 genes using the GOrilla analysis tool showing a functional network of genes involved in RNA binding and processes of regulation of telomerase RNA and telomeres maintenance. Table S5: List of RNASEH1 co-expressed correlated genes. Table S6: Gene ontology of the 2% co-expressed correlated RNASEH2B genes using the GOrilla analysis tool showing functional network of genes involved in DNA replication. Table S7: List of RNASEH2B co-expressed correlated genes. Table S8: Expression level of the RNASEH genes in cancer using data obtained from Cancer RNA-seq Nexus (CNR) in 29 different randomly selected tumors. Table S9: Expression level of the RNASEH genes in cancer using data obtained from Cancer. RNA-seq Nexus (CNR) in different tissues at different stage of cancer progression. Table S10: Level of the RNASEH genes throughout the progression of cancer showing constant upregulation of RNASEH2A in lung, breast, and bladder tumors. Table S11: Level of the RNASEH genes at different stages of cancer. Table S12: (a) Average expression of patients with different RNASEH2A copy-number alterations from GISTIC in TCGA Pan Cancer studies; (b) raw data for RNASEH2A mRNA expression with copy number alteration type in TCGA Pan Cancer studies. Table S13: Mass spectrometry results of first run (4a) and second run (4b) of immunoprecipitation experiments. Table S14: (a) Count and percentage of Cancer Cell line types in CCLE database; (b) count and percentage of cancer tissue types/subtypes in TCGA Pan Cancer Dataset. Table S15: (a) Correlation Matrix (Pearson's R and p-value) for RNASEH2A and 39 subset list of genes in CCLE; (b) correlation Matrix (Pearson's R and p-value) for RNASEH2A and 39 subset list of genes in TCGA. Table S16: Gene ontology of the 2% co-expressed correlated CDK1 genes and G3BP1 using the GOrilla analysis tool showing enrichment of genes involved in cell cycle regulation and in molecular binding of RNA. Table S17: List of CDK1 co-expressed correlated genes. Table S18: List of G3BP1 co-expressed correlated genes.