Bioinformatics Screening of Potential Biomarkers from mRNA Expression Profiles to Discover Drug Targets and Agents for Cervical Cancer

Bioinformatics analysis has been playing a vital role in identifying potential genomic biomarkers more accurately from an enormous number of candidates by reducing time and cost compared to the wet-lab-based experimental procedures for disease diagnosis, prognosis, and therapies. Cervical cancer (CC) is one of the most malignant diseases seen in women worldwide. This study aimed at identifying potential key genes (KGs), highlighting their functions, signaling pathways, and candidate drugs for CC diagnosis and targeting therapies. Four publicly available microarray datasets of CC were analyzed for identifying differentially expressed genes (DEGs) by the LIMMA approach through GEO2R online tool. We identified 116 common DEGs (cDEGs) that were utilized to identify seven KGs (AURKA, BRCA1, CCNB1, CDK1, MCM2, NCAPG2, and TOP2A) by the protein–protein interaction (PPI) network analysis. The GO functional and KEGG pathway enrichment analyses of KGs revealed some important functions and signaling pathways that were significantly associated with CC infections. The interaction network analysis identified four TFs proteins and two miRNAs as the key transcriptional and post-transcriptional regulators of KGs. Considering seven KGs-based proteins, four key TFs proteins, and already published top-ranked seven KGs-based proteins (where five KGs were common with our proposed seven KGs) as drug target receptors, we performed their docking analysis with the 80 meta-drug agents that were already published by different reputed journals as CC drugs. We found Paclitaxel, Vinorelbine, Vincristine, Docetaxel, Everolimus, Temsirolimus, and Cabazitaxel as the top-ranked seven candidate drugs. Finally, we investigated the binding stability of the top-ranked three drugs (Paclitaxel, Vincristine, Vinorelbine) by using 100 ns MD-based MM-PBSA simulations with the three top-ranked proposed receptors (AURKA, CDK1, TOP2A) and observed their stable performance. Therefore, the proposed drugs might play a vital role in the treatment against CC.


Introduction
Cervical cancer (CC) is a type of malignancy that arises from the cervix (lower part of the uterus) and is characterized by expulsion and irregular bleeding in the vagina, pelvic pain, and pain during sexual intercourse [1]. It has been reported that the human papillomavirus (HPV) infection causes CC in almost all cases [2]. Currently, CC is placed as the fourth most common type of malignancy among females with a high mortality rate worldwide and the second most prevalent cancer among females in middle-and lowincome countries (MLICs) [3,4]. According to the 2018 Globocan report, about 569,847 new CC cases with 311,365 mortalities are identified annually [3]. About 14,065 females are diagnosed with CC every year in the United States, causing 5266 deaths [5]. In China, CC is considered the eighth most common cancer in women but, surprisingly, the second most common cancer in women between 15 and 44 years old. About 43% of CC patients are <45 years of age, and 20-28% are <40 years of age [6]. However, cervical cancer has been relatively well controlled for several decades in many high-income countries due to the cervical screening initiatives and effective cancer treatment services, but it remains the most prevalent cause of cancer-related mortality among women in MLICs [4,7,8]. There is still debate about the precise molecular pathways between chronic high-risk HPV infection and the CC pathological phase [9]. Growing research has shown that the irregular expression of multiple genes is convoluted in the pathogenesis of CC [10]. Since tumorigenesis involves various complex genetic and epigenetic events, including the overexpression of oncogenes or the inactivation of suppressor genes [11], the revolution of dysregulated genes in oncogenic pathways might highlight the molecular events underlying tumor formation, which helps insight into CC treatment strategies. Therefore, it is crucial to elucidate the potential molecular mechanisms underlying CC to offer novel therapeutic targets and prognostic biomarkers of CC [12,13].
However, de novo (new) drug discovery is a tremendously challenging, time-consuming, and expensive task due to several steps involved in this process, from the target-based drug selection to clinical validation. Drug repurposing (DR) is a promising approach to overcome many of those obstacles in discovering and developing new drugs by exploring the new therapeutic applications of approved drugs that are established for different diseases [14]. It is considered as a supporting process to the conventional drug discovery. Exploring more suitable repurposable drugs for a new disease requires identifying appropriate target proteins associated with the disease. Hub genes/study genes mediated proteins have been considered as the key drug target receptors. Transcriptomics analysis is a widely used popular approach to explore genomic biomarkers [15][16][17][18][19][20][21]. By this time, several authors have suggested several sets of hub genes/study genes to explore molecular mechanisms and pathogenetic processes of CC [22][23][24][25]. CDK1 and TOP2A might play a critical role in controlling the genetic network related to cervical cancer incidence, progression, and metastasis [17,26,27]. Among them, some authors suggested candidate drugs for the treatment against CC [16,28]. Nevertheless, none of them have investigated the resistance of their suggested drugs against the independent receptors proposed by others. A question may be raised, namely, how can a drug be effective globally for all people around the world. Therefore, our main objectives are (i) computational identification of genomic biomarkers (drug targets) for CC, highlighting their functions, pathways, and regulatory factors, (ii) exploring proposed genomic biomarkers-guided candidate drugs for the treatment against CC, and (iii) in silico validation of the proposed drugs against the state-of-the-art alternative top-ranked independent receptors proposed by others. The workflow of the present research is displayed in Figure 1.

Identification of cDEGs
The datasets GSE6791, GSE27678, GSE63514, and GSE9750 were analyzed to identify DEGs between CC infections and control samples, and the DEGs in each dataset were presented using the volcano plots (Figure 2a-d), where blue and red dots represented the up-regulated and down-regulated genes, respectively. In GSE6791, a total of 4743 DEGs with 4232 up-regulated and 511 down-regulated genes; in GSE27678, a total of 596 DEGs with 154 up-regulated and 442 down-regulated genes; in GSE63514, a total of 4091 DEGs with 2631 up-regulated and 1460 down-regulated genes; in GSE9750, a total of 2640 DEGs with 711 up-regulated and 1929 down-regulated genes were identified by GEO2R online tool with |logFC| > 1.0 and adjusted p-value < 0.05. Then, we found 78 up-regulated cDEGs and 38 down-regulated cDEGs for CC patients (see Figure 2e; Table S3).

Identification of cDEGs
The datasets GSE6791, GSE27678, GSE63514, and GSE9750 were analyzed to identify DEGs between CC infections and control samples, and the DEGs in each dataset were presented using the volcano plots (Figure 2a-d), where blue and red dots represented the up-regulated and down-regulated genes, respectively. In GSE6791, a total of 4743 DEGs with 4232 up-regulated and 511 down-regulated genes; in GSE27678, a total of 596 DEGs with 154 up-regulated and 442 down-regulated genes; in GSE63514, a total of 4091 DEGs with 2631 up-regulated and 1460 down-regulated genes; in GSE9750, a total of 2640 DEGs with 711 up-regulated and 1929 down-regulated genes were identified by GEO2R online tool with |logFC| > 1.0 and adjusted p-value < 0.05. Then, we found 78 up-regulated cDEGs and 38 down-regulated cDEGs for CC patients (see Figure 2e; Table S3).

Identification of cDEGs
The datasets GSE6791, GSE27678, GSE63514, and GSE9750 were analyzed to identi DEGs between CC infections and control samples, and the DEGs in each dataset we presented using the volcano plots (Figure 2a-d), where blue and red dots represented th up-regulated and down-regulated genes, respectively. In GSE6791, a total of 4743 DEG with 4232 up-regulated and 511 down-regulated genes; in GSE27678, a total of 596 DEG with 154 up-regulated and 442 down-regulated genes; in GSE63514, a total of 4091 DEG with 2631 up-regulated and 1460 down-regulated genes; in GSE9750, a total of 2640 DEG with 711 up-regulated and 1929 down-regulated genes were identified by GEO2R onlin tool with |logFC| > 1.0 and adjusted p-value < 0.05. Then, we found 78 up-regulate cDEGs and 38 down-regulated cDEGs for CC patients (see Figure 2e; Table S3). ; red dots and blue dots represented the significantly down-regulated and up-regulated DEGs, respectively. (e) Common up-regulated and down-regulated differentially expressed genes from CC visualized through a Venn diagram. Seventy-eight genes were founded common-up-regulated and thirty-eight genes were founded common down-regulated in CC patients.

PPI Network Analysis of cDEGs for Identification of KGs
The PPI network of cDEGs was constructed using STRING database (Figure 3a), which contained 92 nodes and 1790 edges. We selected top-ranked eleven (11) cHubGs {AR, AURKA, BRCA1, CCNB1, CDK1, ECT2, ESR1, EZH2, MCM2, NCAPG2, and TOP2A}, applying four topological measures in the PPI network. Then, using MCODE, clusters were selected from the PPI network. It was shown that the most significant cluster had 42 nodes and 850 edges. MCODE analysis demonstrated that the most significant cluster contained the seven hub genes {AURKA, BRCA1, CCNB1, CDK1, MCM2, NCAPG2, and TOP2A} (see Figure 3b). So, we considered these seven key genes (KGs) for further analysis. visualized through a Venn diagram. Seventy-eight genes were founded common-up-regulated and thirty-eight genes were founded common down-regulated in CC patients.

PPI Network Analysis of cDEGs for Identification of KGs
The PPI network of cDEGs was constructed using STRING database (Figure 3a), which contained 92 nodes and 1790 edges. We selected top-ranked eleven (11) cHubGs {AR, AURKA, BRCA1, CCNB1, CDK1, ECT2, ESR1, EZH2, MCM2, NCAPG2, and TOP2A}, applying four topological measures in the PPI network. Then, using MCODE, clusters were selected from the PPI network. It was shown that the most significant cluster had 42 nodes and 850 edges. MCODE analysis demonstrated that the most significant cluster contained the seven hub genes {AURKA, BRCA1, CCNB1, CDK1, MCM2, NCAPG2, and TOP2A} (see Figure 3b). So, we considered these seven key genes (KGs) for further analysis. , and TOP2A) were found to be common across 11 hub genes, so we considered these 7 genes as the key genes. The network represents highly interconnected regions of the PPI network. The network holds 42 nodes and 850 edges.

Survival Analysis with KGs
The log-rank test was used to test the significant difference between two survival curves corresponding to low-and high-risk groups based on the seven KGs (AURKA, BRCA1, CCNB1, CDK1, MCM2, NCAPG2, and TOP2A). We observed a significant difference between the two survival probability curves ( Figure 5), which indicates that the proposed KGs have strong prognostic power in detecting CC.

Drug Repurposing by Molecular Docking
To explore candidate drugs by molecular docking simulation, we considered m= 11 drug target proteins (receptors) corresponding to our proposed seven KGs and their
On the other hand, we reviewed 52 published articles associated with CC infections that provided transcriptome-guided hub proteins (genomic biomarkers) for cross-validation of the proposed key genes and the candidate drug agents. There were 255 hub genes reported in those 52 articles, with 7 hub proteins (AURKA, PCNA, CCNB1, CDC45, MCM2, TOP2A, CDK1) appearing in at least 5 articles (Table S2) [15][16][17][18][19][20][21][23][24][25][26][27][40][41][42][43][44][45][46][47][48][49][50][51]. Five (AURKA, CCNB1, MCM2, TOP2A, and CDK1) of the seven reported hub proteins were found to be similar to our suggested seven KGs. We downloaded the 3D structure of the published remaining receptors (CDC45, PCNA) from PDB with source codes 5dgo, 1u76, respectively. Then, molecular docking was carried out between total m = 13 receptors (proposed and published) and n = 80 meta-drug agents to calculate the binding affinity scores (kcal/mol) for each pair of receptors and agents. Then, we ordered the target receptors in descending order of row sums of the binding affinity matrix A = (A ij ) and drug agents, according to the column sums, to select a few drug agents as the candidate drugs (see Figure 6). Thus, we selected top-ranked three drug agents (Paclitaxel, Vinorelbine, Vincristine) as candidate drugs with binding affinity scores −7.5 kcal/mol ≤ against the 13 receptors. The docked complexes of the top three virtual hits from AutoDock-Vina docking were further considered for protein-ligand interaction profiling. As shown in Figure 7a, AURKA_paclitaxel complex showed three hydrogen bonds with lys141, lys162, asp274 residues. Although the ligand formed major hydrophobic interactions with leu139, val147, leu210, thr217, tyr219, glu260, leu263 residues, and lys162 residue showed additional salt bridges interactions with the ligand. On the other hand, CDK1_vinorelbine (Figure 7b) formed hydrophobic interactions with tyr15, lys88, leu135 residues. In the case of the TOP2A_vincristine complex, vincristine showed hydrogen bonding to the active site, where the interaction was maintained by glu66 residue. Vincristine also formed hydrophobic interactions with glu66, pro111, lys233, leu257 residues. Furthermore, vin- The docked complexes of the top three virtual hits from AutoDock-Vina docking were further considered for protein-ligand interaction profiling. As shown in Figure 7a, AURKA_paclitaxel complex showed three hydrogen bonds with lys141, lys162, asp274 residues. Although the ligand formed major hydrophobic interactions with leu139, val147, leu210, thr217, tyr219, glu260, leu263 residues, and lys162 residue showed additional salt bridges interactions with the ligand. On the other hand, CDK1_vinorelbine (Figure 7b) formed hydrophobic interactions with tyr15, lys88, leu135 residues. In the case of the TOP2A_vincristine complex, vincristine showed hydrogen bonding to the active site, where the interaction was maintained by glu66 residue. Vincristine also formed hydrophobic interactions with glu66, pro111, lys233, leu257 residues. Furthermore, vincristine also formed π-cation interactions with lys233 residue (see Figure 7c).

MD Simulations
Among the proposed candidate drugs, Paclitaxel, Vinorelbine, and Vincristine were the top-ranked three candidate drugs ( Figure 6). Therefore, these top three drug agent were selected for their stability analysis through 100 ns MD-based MM-PBSA simula tions.
From Figure 8, we observed that all the three systems were significantly stable be tween the variations of moving and initial drug-target complexes. We calculated thei RMSD (root mean square deviation). Figure 8a represented the RMSD corresponding to the proposed receptors (AURKA, CDK1, TOP2A). All the systems projected the RMSD around 1 Å to 2.5 Å, except TOP2A complex, which shows the RMSD around 2 Å to 3.7 Å. The average RMSD for AURKA, CDK1, TOP2A complexes were 1.59 Å, 2.11 Å, and 2.80 Å, respectively. CDK1 complex showed slight fluctuation in around 20,000 ps to 28,000 ps and was stabilized in the remaining simulation. As can be seen from the plot AURKA showed a more rigid conformation than the other proteins, also achieved equi

MD Simulations
Among the proposed candidate drugs, Paclitaxel, Vinorelbine, and Vincristine were the top-ranked three candidate drugs ( Figure 6). Therefore, these top three drug agents were selected for their stability analysis through 100 ns MD-based MM-PBSA simulations.
From Figure 8, we observed that all the three systems were significantly stable between the variations of moving and initial drug-target complexes. We calculated their RMSD (root mean square deviation). Figure 8a represented the RMSD corresponding to the proposed receptors (AURKA, CDK1, TOP2A). All the systems projected the RMSD around 1 Å to 2.5 Å, except TOP2A complex, which shows the RMSD around 2 Å to 3.7 Å. The average RMSD for AURKA, CDK1, TOP2A complexes were 1.59 Å, 2.11 Å, and 2.80 Å, respectively. CDK1 complex showed slight fluctuation in around 20,000 ps to 28,000 ps and was stabilized in the remaining simulation. As can be seen from the plot, AURKA showed a more rigid conformation than the other proteins, also achieved equilibrium at 3 ns, and remained stable afterward. In contrast, TOP2A showed a dramatic increase in flexibility, with RMSD values rising gradually from 2 Å to 3.5 Å over time. Here, we calculated the MM-PBSA binding energy for three drug agents as mentioned previously, Figure 8b

Discussion
CC is the second most common type of malignant tumor in females with a 5-year survival rate of only about 52%. Thus, more research is required to discover potential biomarkers and candidate drugs for improving the 5-year survival rate and reducing the mortality rate of CC patients [16].
To investigate the genetic influence of CC infections, we identified 116 shared cDEGs. Among them, we detected seven cDEGs as KGs highlighting their functions, pathways, regulatory factors, and candidate drugs. The literature review suggested that AURKA was an enhanced potential target for cervical cancer treatment [15,25,27,43,45]. BRCA1 enhanced the sensitivity of cervical squamous cell carcinoma (CSCC) patients to cisplatin-based CCRT by up-regulating STAT1 to activate the JAK/STAT pathway [50]. CCNB1 played vital roles in the progression of CC through different signaling pathways [17,18,26,46,51]. CDK1 contributed to the occurrence and development of CSCC [18,23,[25][26][27]40,41,44,46]. MCM2 involved in the carcinogenesis of cervical cancer [16,[19][20][21]23,27,42]. TOP2A is regarded as a biomarker for the improved diagnosis of CC [16,17,19,23,24,26,27,41,43,45]. We also investigated the multivariate survival analysis of CC patients with KGs and found a substantial difference between two survival probability curves, indicating that the suggested KGs have a better prognostic capacity for CC detection. Four transcriptional (TFs) and two post-transcriptional (miRNAs) regulatory factors of KGs were introduced in Section 2.3. The four TFs proteins (SP1, TP53, NFYA, E2F1) were further utilized as the drug target receptors. Previous studies also supported that SP1 is an important biomarker for CC [52]. Khan M.A et al. revealed that TP53 has an association with cervical cancer pathogenesis [53]. NFYA promoted cell proliferation and

Discussion
CC is the second most common type of malignant tumor in females with a 5-year survival rate of only about 52%. Thus, more research is required to discover potential biomarkers and candidate drugs for improving the 5-year survival rate and reducing the mortality rate of CC patients [16].
To investigate the genetic influence of CC infections, we identified 116 shared cDEGs. Among them, we detected seven cDEGs as KGs highlighting their functions, pathways, regulatory factors, and candidate drugs. The literature review suggested that AURKA was an enhanced potential target for cervical cancer treatment [15,25,27,43,45]. BRCA1 enhanced the sensitivity of cervical squamous cell carcinoma (CSCC) patients to cisplatin-based CCRT by up-regulating STAT1 to activate the JAK/STAT pathway [50]. CCNB1 played vital roles in the progression of CC through different signaling pathways [17,18,26,46,51]. CDK1 contributed to the occurrence and development of CSCC [18,23,[25][26][27]40,41,44,46]. MCM2 involved in the carcinogenesis of cervical cancer [16,[19][20][21]23,27,42]. TOP2A is regarded as a biomarker for the improved diagnosis of CC [16,17,19,23,24,26,27,41,43,45]. We also investigated the multivariate survival analysis of CC patients with KGs and found a substantial difference between two survival probability curves, indicating that the suggested KGs have a better prognostic capacity for CC detection. Four transcriptional (TFs) and two post-transcriptional (miRNAs) regulatory factors of KGs were introduced in Section 2.3. The four TFs proteins (SP1, TP53, NFYA, E2F1) were further utilized as the drug target receptors. Previous studies also supported that SP1 is an important biomarker for CC [52]. Khan M.A et al. revealed that TP53 has an association with cervical cancer pathogenesis [53]. NFYA promoted cell proliferation and tumorigenic properties by transcriptional activation of SOX2 in cervical cancer [54]. The E2F1 protein is considered as a potential biomarker for CC [55].
To investigate the common pathogenetic processes of KGs, we selected the top five common GO terms for each of BPs, MFs and cellular components, and KEGG pathways that were significantly associated with cervical cancer disease through the cDEGs, including KGs. Among them, the association of the top five common BPs (DNA replication, cell division, G1/S transition of mitotic cell cycle, DNA replication initiation, mitotic nuclear division) with CC was supported by some other individual studies [17,19,30]. The top five common MFs (ATP binding, protein binding, DNA helicase activity, chromatin binding, and DNA binding) that were significantly associated with CC disease also received support from some individual studies [17,19,[31][32][33]. Similarly, the association of the top four cellular components (nucleoplasm, midbody, MCM complex, nucleus, and spindle) with CC disease was supported by the literature review [19,[34][35][36]. We selected the top five significantly enriched common KEGG pathways (DNA replication, cell cycle, p53 signaling pathway, Oocyte meiosis, and Fanconianemia pathway) that were also reported by some other studies [17,23,29].
To explore effective candidate drugs for the treatment against CC disease, we considered top-ranked proposed receptors and their regulatory four key TFs proteins as the drug target receptors and performed their docking analysis with 80 meta-drug agents (see Table S1). For cross-validation, we considered seven published hub proteins, where five proteins were common with our proposed KGs (see Figure 9). On the other hand, five articles at a time explored four hub genes individually, two articles explored three hub genes individually, etc. Moreover, TOP2A gene supported by 11 articles, CDK1 gene was supported by 10 articles, MCM2 gene was supported by 7 articles, CDC45 gene was supported by 6 articles, and (AURKA, CCNB1, PCNA) genes were supported by 5 articles (see Figure 9). Then, we selected top-ranked three drugs as the candidate drugs, where the first three drugs showed strong binding affinities with all target proteins (Figure 6), where red * denoted published and proposed receptors, and blue ** denoted published receptors. Some other independent studies also recommended our suggested drugs, including Vincristine [56,57], Vinorelbine [57,58], and Paclitaxel [56][57][58][59][60], for the treatment against CC. Finally, we examined the stability of top-ranked three drugs (Vincristine, Vinorelbine, Paclitaxel) by using 100 ns MD-based MM-PBSA simulations for three top-ranked proposed receptors (AURKA, CDK1, TOP2A) and observed their stable performance according to the laws of physics [61,62]. Therefore, the proposed candidate drugs might play a vital role in the treatment against CC with comorbidities, since our proposed target proteins are also associated with several comorbidities. The present study emphasizes the further wet-lab experimental validation for both the proposed target proteins and candidate drugs.

Data Sources and Descriptions
We used both original data and meta-data to reach the goal of this study as described below.

Collection of Microarray Exploring Profiles for Genomic Biomarkers and Drug Target Receptors
We collected microarray profiles for cervical cancer (CC) disease from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) website (http://www.ncbi.nlm.nih.gov/geo/, (accessed on 10 March 2022)). CC patients microarray datasets of GSE6791 [63], GSE27678 [64], and GSE63514 [65] were all based upon the GPL570 Platforms ((HG-U133_Plus_2) Affymetrix Human Genome U133 Plus 2.0 Array), which included 20 CC tissues and 8 normal cervical tissues, 28 CC tissues and 3 normal cervical tissues, and 28 CC tissues and 24 normal cervical tissues, respectively. The GSE9750 [66] dataset used the GPL96 Affymetrix Human Genome U133A Array platform and included 33 CC tissue samples that were primarily marked by HPV16 or HPV18 and 21 normal cervical samples.

Collection of Meta-Drug Agents for Exploring Candidate Drugs
We collected host-transcriptome-guided 80 meta-drug agents by the literature review of CC disease (see Table S1) for exploring candidate drugs. Thus, we considered 80 drug agents to explore candidate drugs by molecular docking with our proposed receptors.

Data Sources and Descriptions
We used both original data and meta-data to reach the goal of this study as described below.

Collection of Microarray Exploring Profiles for Genomic Biomarkers and Drug Target Receptors
We collected microarray profiles for cervical cancer (CC) disease from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) website (http://www.ncbi.nlm.nih.gov/geo/, (accessed on 10 March 2022)). CC patients microarray datasets of GSE6791 [63], GSE27678 [64], and GSE63514 [65] were all based upon the GPL570 Platforms ((HG-U133_Plus_2) Affymetrix Human Genome U133 Plus 2.0 Array), which included 20 CC tissues and 8 normal cervical tissues, 28 CC tissues and 3 normal cervical tissues, and 28 CC tissues and 24 normal cervical tissues, respectively. The GSE9750 [66] dataset used the GPL96 Affymetrix Human Genome U133A Array platform and included 33 CC tissue samples that were primarily marked by HPV16 or HPV18 and 21 normal cervical samples.

Collection of Meta-Drug Agents for Exploring Candidate Drugs
We collected host-transcriptome-guided 80 meta-drug agents by the literature review of CC disease (see Table S1) for exploring candidate drugs. Thus, we considered 80 drug agents to explore candidate drugs by molecular docking with our proposed receptors.

Collection of Independent Meta-Receptors for Cross-Validation with the Proposed Drugs
To select the top-ranked hub genes (independent meta-receptors) associated with CC disease, we reviewed 52 published articles and selected the top-ranked 7 target proteins as the independent meta-receptors (see Table S2).

Identification of cDEGs for CC Patients
The identification of cDEGs was a key step of this study. To identify cDEGs, we identified DEGs for each of GSE6791, GSE27678, GSE63514, and GSE9750 datasets separately, by using linear models for microarray (LIMMA) approach through GEO2R online tool [67] with |logFC| > 1.0 and adjusted p-value < 0.05. The LIMMA approach calculates the p-value by using the modified t-statistics to test the significance of differential gene expressions between two conditions, and p-value is then adjusted by using the Benjamini-Hochberg procedure [68]. Finally, we selected the common DEGs by using four DEGs sets derived from four publicly available microarray datasets.

Construction of Protein-Protein Interaction (PPI) Network for Identification of KGs
The PPI network of cDEGs was constructed through the STRING online database (https://string-db.org/, (accessed on 10 March 2022)) [69]. To improve the quality of PPI network, we used the Cytoscape software [70]. The Cytoscape plugin cytoHubba was used to select the common Hub Genes (cHubGs) or common Hub Proteins (cHubPs) from PPI network [70,71]. The PPI network provides several nodes and edges, which indicate proteins and their interactions, respectively. A node with the largest number of significant interactions/connections/edges with other nodes is considered as the top-ranked cHubGs. The cHubGs were selected by using four topological analyses (Degree [72], BottleNeck [73], Betweenness [74], and Stress [75]) of the PPI network. Molecular Complex Detection (MCODE) (http://apps.cytoscape.org/apps/mcode, (accessed on 10 March 2022)) plugin of Cytoscape software was used to detect the most profound modules from the PPIs network. Highly interconnected portions were identified through MCODE clustering that assists the research in effective drug designing. For representing molecular complexes in the PPI network, MCODE was used by detecting the densely connected areas [76]. Then, we selected the important key genes (KGs) that were shared by both cHubGs and MCODE clustering genes.

Regulatory Network Analysis of KGs
To explore key transcriptional regulatory transcription factors (TFs) and post-transcriptional regulatory micro-RNAs (miRNAs) of KGs, we performed TFs-TGs and miRNAs-KGs interaction network analysis, respectively, by using the NetworkAnalyst web platform [77]. The TFs-KGs and miRNAs-KGs interaction networks were constructed by using the ENCODE (https://www.encodeproject.org/, (accessed on 10 March 2022)) [78] and Reg-Network repository [79] databases, respectively. The Cytoscape software [70] was used to improve the quality of networks. The key regulators were selected by using two topological analyses (Degree [72] and Betweenness [74]) of networks.

GO Terms and KEGG Pathway Enrichment Analysis of KGs
Gene ontology (GO) functional and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment/annotation/over-representation analysis [80,81] is a widely used approach to determine the significantly annotated/enriched/over-represented functions/classes/terms and pathways by the identified cDEGs/cHubGs. It is an important part for revealing the molecular mechanisms of actions and cellular roles of genes. The GO terms are categorized into Biological Process, Cellular Component, and Molecular Function [82]. We performed GO and KEGG enrichment analysis using DAVID web tool (https://david.ncifcrf.gov/tools.jsp, (accessed on 10 March 2022)) [83]. The significance level was set to p-value < 0.05.

Survival Analysis
To examine the prognostic performance of KGs in detecting CC, we performed a multivariate survival analysis of CC patients based on expressions of KGs by using SurvExpress online tool [84]. The significance level was set to p-value < 0.05.

Drug Repurposing by Molecular Docking Study
To propose in silico validated efficient candidate drugs for the treatment against CC, we employed a molecular docking study of our proposed receptor proteins with the drug agents. We considered our proposed KGs-based key proteins (KPs) and their regulatory key TFs proteins as the drug target proteins and 80 drug agents as mentioned earlier in Section 4.3 (see Table S1). The molecular docking study requires three-dimensional (3D) structures of both receptor proteins and drug agents/ligands. We downloaded the 3D structure of all targeted proteins from the Protein Data Bank (PDB) [37] and SWISS-MODEL [85]. The 3D structures of all drug agents were downloaded from PubChem database [39]. The 3D structures of the target proteins were visualized using Discovery Studio Visualizer 2019 [86], and protein chains that were not part of the gene were removed. PDB2PQR and H++ servers were utilized to assign the protonation state of target proteins [87,88]. All the missing hydrogen atoms were also appropriately added. The pKa for target proteins residues were investigated under the physical conditions of salinity = 0.15, internal dielectric = 10, pH = 7, and external dielectric = 80. Further, the protein was prepared for molecular docking by removing ligand heteroatoms and water molecules and by addition of polar hydrogens on AutoDock tools 1.5.7 (developed by ADT, The Scripps Research Institute, La Jolla, CA, USA) [89]. The drug agents/ligands were prepared for molecular docking simulation by setting the torsion tree and rotatable/non-rotatable bonds present in the ligand through AutoDock tools 1.5.7. Then, binding affinities between the target proteins and drug agents were calculated using AutoDock Vina [90]. The exhaustiveness parameter was set to 10. Discovery Studio Visualizer 2019 [86], PLIP [91], and PyMol [92] were used to analyze the docked complexes for surface complexes, types, and distances of non-covalent bonds. Let A ij denotes the binding affinity between ith target protein (I = 1, 2, . . . , m) and jth drug agent (j = 1, 2, . . . , n). Then, target proteins were ordered according to the descending order of row sums ∑ n j=1 A ij , j = 1, 2, . . . , m, and drug agents were ordered according to the descending order of column sums ∑ m i=1 A ij , j = 1, 2, . . . , n, to select the few top-ranking drug agents as the candidate drugs. Then, we validated the proposed repurposed drugs by molecular docking study with the top-listed receptor proteins associated with CC infections that were obtained by the literature review. To select the top-listed receptor proteins associated with CC infections, we reviewed 52 recently published articles and selected the top-ranked seven receptor proteins (see Table S2).

Molecular Dynamic (MD) Simulations
MD simulations were carried out by using YASARA Dynamics software [93] and the AMBER14 force field [94] to study the dynamic behavior of the top-ranked protein-ligand complexes. A total of six different systems were used to run MD simulations. The systems included the top three hits, AURKA_paclitaxel, CDK1_vinorelbine, and TOP2A_vincristine complexes corresponding to our proposed receptors.
For the complexes, the parameters of ligands were assigned through AutoSMILES [95] algorithms, where unknown organic molecules are parameterized fully automatically by the calculation of semi-empirical AM1 Mulliken point charges with the COSMO solvation model, assigning of AM1BCC [96] atom and bond types, and also assigning general AMBER force field (GAFF) [97] atom types and the remaining force field parameters. Before the simulation, the protein-ligand complexes hydrogen bonding network was optimized and solvated by a TIP3P [98] water model in a simulation cell. Periodic boundary conditions were maintained with a solvent density of 0.997 g L −1 . Titratable amino acids in the protein complex were subjected to pKa calculation during solvation. The initial energy minimization process of each simulation system, consisting of 55,410 ± 10, 72,287 ± 10, and 96,252 ± 10 atoms for AURKA_paclitaxel, CDK1_vinorelbine, and TOP2A_vincristine complexes, was performed by a simulated annealing method, respectively, using the steepest gradient approach (5000 cycles). Each simulation was run with a multiple-time-step algorithm [99], using a time-step interval of 2.50 fs under physiological conditions (298 K, pH 7.4, 0.9% NaCl) [100]. All bond lengths were constrained using the linear constraint solver (LINCS) [101] algorithm, and SETTLE [102] was used for water molecules. Longrange electrostatic interactions were described by the PME [103] methods, and finally, 100 ns MD simulation was accomplished at Berendsen thermostat [104] and constant pressure. The trajectories were recorded every 250 ps for further analysis, and subsequent analysis was implemented by the default script of YASARA [105] macro and SciDAVis software available at http://scidavis.sourceforge.net/, (accessed on 10 Mar 2021). All snapshots were then subjected to YASARA software's MM-Poisson-Boltzmann surface area (MM-PBSA) binding free energy calculation using the formula below [106]: Here, MM-PBSA binding energy was calculated using YASARA built-in macros using AMBER 14 as a force field, with larger positive energies indicating better binding [107].

Conclusions
The present study utilized various well-established bioinformatics tools to reveal KGs, highlighting their regulatory factors and dysregulated molecular functions and pathways that were responsible for CC progression. The five KGs were common between our proposed seven KGs and the top-ranked seven KGs published by others, which indicated that our proposed KGs received more support by the literature review compared to any other individual studies. Finally, we suggested potential drugs, such as vincristine, vinorelbine, paclitaxel, and investigated their stability performance by using 100 ns MD-based MM-PBSA simulations for three top-ranked proposed receptors (AURKA, CDK1, TOP2A), also observing their stable performance. Thus, the proposed molecular biomarkers and repurposing candidate drugs presented in this study have merit for diagnosis and therapies of CC disease.