Unravelling the Role of miR-20b-5p, CCNB1, HMGA2 and E2F7 in Development and Progression of Non-Small Cell Lung Cancer (NSCLC)

Lung cancer is a prime cause of worldwide cancer deaths, with non-small cell lung cancer (NSCLC) as a frequent subtype. Surgical resection, chemotherapy are the currently used treatment methods. Delayed detection, poor prognosis, tumor heterogeneity, and chemoresistance make them relatively ineffective. Genomic medicine is a budding aspect of cancer therapeutics, where miRNAs are impressively involved. miRNAs are short ncRNAs that bind to 3′UTR of target mRNA, causing its degradation or translational repression to regulate gene expression. This study aims to identify important miRNA-mRNA-TF interactions in NSCLC using bioinformatics analysis. GEO datasets containing mRNA expression data of NSCLC were used to determine differentially expressed genes (DEGs), and identification of hub genes-BIRC5, CCNB1, KIF11, KIF20A, and KIF4A (all functionally enriched in cell cycle). The FFL network involved, comprised of miR-20b-5p, CCNB1, HMGA2, and E2F7. KM survival analysis determines that these components may be effective prognostic biomarkers and would be a new contemplation in NSCLC therapeutics as they target cell cycle and immunosurveillance mechanisms via HMGA2 and E2F7. They provide survival advantage and evasion of host immune response (via downregulation of cytokines-IL6, IL1R1 and upregulation of chemokines-CXCL13, CXCL14) to NSCLC. The study has provided innovative targets, but further validation is needed to confirm the proposed mechanism.


Introduction
Lung cancer is the trivial cause of cancer-associated worldwide deaths with a flat overall 5-year survival rate of less than 15%. It is patho-physiologically divided into two subgroups: highly aggressive, but less frequent (~15%) small cell lung cancer (SCLC) and less aggressive but highly intermittent (~85%) non-small cell lung cancer (NSCLC). NSCLC is further histologically subdivided into three subtypes-adenocarcinoma (40%), squamous cell carcinoma (25%), and large cell carcinoma (10%). Lung cancer usually commences with oncogene activation or tumor suppressor gene inactivation [1]. Despite developments of innovative therapies, the survival rate for NSCLC patients still remains low. Late disease presentation, histological subtype tumor heterogeneities, and restrained understanding

Weighted GCN Construction and Module Detection
The Weighted Gene Co-expression Network Analysis (WGCNA) package [24] available in R was used for GCN construction and representative module genes identification. The NSCLC-associated DEGs were primarily tested with goodSamplesGenes function to remove any unqualified genes and samples. Specifically, for every pair of genes i and j, the absolute value of Pearson correlation is defined as: The similarity matrix S is defined as S = s ij . Next, the pickSoftThreshold function was used for selecting an appropriate soft-thresholding power (β) based on a scale-free topology criterion [25]. We define a power adjacency function which assists to transform the similarity matrix s ij into a weighted adjacency matrix a ij . a ij = power s ij , β ≡ s ij β (2) The adjacency matrix was then transformed into a similarity measure topological overlap matrix (TOM) followed by the computation of a TOM-based dissimilarity measure (dissTOM). To group genes with similar patterns of expression across samples, a hierarchical clustering dendrogram using hclust function was constructed according to the dissTOM measure. The dynamic tree cut algorithm was applied to detect tightly connected modules of genes from the branches of the tree. A summary profile for each module was computed in the form of their Module Eigengenes (MEs) followed by the dissimilarity of module eigengenes (MEdiss) computation. ME can be regarded Biology 2020, 9,201 4 of 21 as the highest representative gene expression profile of the module. Modules with very similar expression profiles were merged (since their genes were highly co-expressed) by cutting the module eigengene dendrogram at suitable MEdiss threshold. Intramodular connectivity is a quantitative measure of network connectivity with respect to nodes or genes of a specific module. Standard and TOM-based-intramodular connectivity are abbreviated by k.in and ω.in, respectively. k.in = n j=1 a ij (3) ω.in = n j=1 ω ij (4) where ω ij is the topological overlap between i and j nodes. WGCNA function, signedKME was used to compute the module membership (MM) which correlated the gene expression values with ME. kME(i) = cor(x i , ME) The genes that infirmly correlated to all MEs (|kME| < 0.7), were assigned to none of the modules. A gene is vital in the given module if the MM was highly linked to k.in. Important genes within a module are characterized by high k.in and MM. These genes are highly connected with other genes and have a high functional relevance. The intramodular connectivity is computed for each module gene by using the WGCNA function intramodularConnectivity. This function calculates the within module connectivity (k.in), whole network connectivity (kTotal), kOut = kTotal − k.in, and kDiff = k.in − kOut. The genes from each module with MM > 0.9 were considered as the representative genes of modules [26].

PPI Network Construction, Enrichment Analysis, and Hub Genes Identification
The highly representative module genes were used for analyzing the strongest possible interactions among them in the form of PPI network. These genes were imported into the Search Tool for the Retrieval of Interacting Genes (STRING, https://string-db.org/) v11.0 database [27] and an overall score > 0.9 (corresponding to highest confidence) was set as the preferred threshold for the construction of PPI network. The network was subsequently visualized using Cytoscape v3.8.0 [28]. Also, the PPI network was analyzed using Molecular Complex Detection (MCODE) plugin available in Cytoscape to identify densely correlated local regions/clusters. The parameters set in MCODE for cluster detection were as follows: "Degree cutoff = 2", "node score cutoff = 0.2", "k-score = 2", "max. depth = 100", and "cut style = haircut" [29]. The top-scoring PPI cluster genes were further subjected to Gene Ontology (GO) term and pathway enrichment analysis to identify the hub genes. GO term enrichment analysis was performed using Enrichr [30] gene set libraries like GO Biological Process (BP), GO Molecular Function (MF), and GO Cellular Compartment (CC). Also, the pathway enrichment analysis was performed using different Enrichr gene set libraries like WikiPathways, BioPlanet, and KEGG. To account for multiple comparisons, the pathways and GO terms with a BH-p-value < 0.001 were considered as significantly enriched. The common genes in all significantly enriched GO terms and pathways were considered as the hub genes, respectively.

Survival Analysis and Hub Genes Validation
To analyze the prognostic impacts of NSCLC-associated hub genes, Kaplan Meier (KM) plotter database (https://kmplot.com/analysis/) [31] was accessed. This database includes gene expression data, overall survival (OS) information of patients and relapse information corresponding to 21 cancer types from sources like GEO, TCGA, and EGA, respectively. The samples were split into low and high expression cohorts by the median expression value of each hub gene to assess the OS of NSCLC Biology 2020, 9, 201 5 of 21 patients. Each hub gene was assigned an Affy ID followed by the removal of outlier arrays to produce the corresponding KM survival plots. Moreover, information on 95% confidence interval (95% CI) with hazard ratio (HR), number-at-risk, and log-rank p values were computed and showed on the plot. The genes with log rank p < 0.05 were considered as statistically significant. The Gene Expression Profiling Interactive Analysis v2 (GEPIA 2) web-based tool (http://gepia2.cancer-pku.cn/) [32] was accessed for validating the hub genes by comparing their relative expression between normal and NSCLC tissue samples from The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx) databases, respectively. The settings used for boxplot comparison were as follows: Log 2 FC cutoff = 1.5, p-value cutoff = 0.0001, and jitter size = 0.3. Pathological stage plot analysis was also performed with GEPIA 2 to assess the expression levels of hub genes with respect to NSCLC TNM stages.

Construction of the NSCLC-Specific 3-Node miRNA FFL
The human miRNAs intended to target our hub genes were obtained from miRWalk v3.0 [33] and StarBase v2.0 [34] databases, respectively. miRNAs having a total score > 0.95 and binding region = 3 UTRwere chosen from miRWalk. Highly significant TFs with an integrated top rank score (p-value) < 0.001 intended to regulate our hub genes were obtained from ChEAv3.0 database [35]. The miRNAs targeting these TFs were also obtained from miRWalk and StarBase databases with previously described thresholds. A two-tier validation screening was performed to fetch highly confident interaction pairs. In the first-tier, all the obtained miRNAs and TFs were exhaustively searched via available literature studies and the ones having an association with NSCLC were promoted to second-tier. In the second-tier, we checked which of our first-tier miRNAs were evolutionarily conserved in humans and mouse. We obtained mouse miRNAs targeting our hub genes from the same databases (miRWalk and StarBase) with previously described thresholds and the common mouse and first-tier screened miRNAs were considered as highly conserved and confident miRNAs. All the three interaction pairs (i.e., miRNA-gene, TF-gene, and miRNA-TF) were then altered with respect to the first and second-tier validated regulatory elements (i.e., TFs and miRNAs) and merged to construct a 3-node NSCLC-specific miRNA feed-forward loop (FFL) network which was finally visualized using Cytoscape. Highest-order network motif in our network was also identified based on the degree of molecular interactions. The survival and expression analysis of the regulatory items in our network motif (i.e., miRNA and TF) were performed using KM plotter, OncoLnc (http://www.oncolnc.org/), and GEPIA 2, respectively.

Exploratory Genomic Analysis of Hub Genes and TFs
The cBioPortal for Cancer Genomics (https://www.cbioportal.org/) [36] was queried for investigating the alteration frequencies in our hub genes. Pan-lung cancer (TCGA, Nat Genet 2016) dataset [37] was selected in cBioPortal to perform our analysis.

NSCLC-Associated Meta-DEGs Identification
mRNA datasets with accession numbers GSE118370 and GSE18842 were chosen from GEO based on the search criteria specified in materials and methods section. Both the datasets were based on GPL570 [HG-U133_Plus_2] Affymetrix Human Genome U133 2.0 Array platform type. We considered a total of 100 paired tissue samples (12 from GSE118370 and 88 from GSE18842) for meta-analysis, respectively. There were a total of 20843 unique genes mapping to Affymetrix probes in both datasets. A total of 603 meta-DEGs were identified based on the BH-p-value and log 2 FC threshold criterion. A 2-dimensional principal component analysis (PCA) plot representing the variation in expression data of meta-DEGs between tumor and normal samples is shown in Figure 1. Also, a total of 218 and 385 meta-DEGs were classified as up and downregulated based on the log 2 FC threshold criterion and is shown by a volcano plot in Figure 1. List of all DEGs can be found in Table S1.

Weighted GCN Construction and Module Analysis
WGCNA was used for categorizing 603 meta-DEGs with similar expression levels into different modules. In our study, we selected β=16 (at scale free R 2 = 0.80) as the soft-thresholding power to construct a scale-free weighted GCN. The chosen value of β satisfied all the scale-free topology criteria as evidenced in Figure S1. A total of six different color-coded (i.e., green, yellow, turquoise, brown, blue, and grey) co-expression modules were identified by hierarchical clustering and dynamic branch cutting, ranging in size from 39 to 179. To merge modules with highly co-expressed genes, a ME dendrogram was cut at a height of 0.2 corresponding to a correlation of 0.8. Pairwise scatter plots among original MEs and samples representing their degree of correlations between them is shown in Figure 2A. Both yellow and turquoise modules were merged to the brown module based on their high correlation. A total of four meta-modules (i.e., green, blue, brown, and grey) were obtained after merging as compared to the original six modules and are shown in Figure 2B. Since the grey module consisted of unassigned genes, therefore we discarded it for further analysis. Figure  2C shows the GCN heatmap plot depicting TOM among all the genes. Figure 2D-F shows the heatmap plots of three meta-modules along with their corresponding ME levels. Figure S2 shows MM vs. k.in correlation plots of all three meta-modules where all of them are highly significant (based on p-value) and correlated (based on Pearson's correlation coefficient). A total of 151 genes from all the three meta-modules were screened as highly representative considering the threshold (module genes with MM > 0.9). The disease status is distinguished by the color of points, i.e., magenta for tumor and green for normal samples. It could be seen that both the normal and tumor samples are clustered independently and distinctly. The percentage of total variation that is accounted for by the 1st and 2nd principal components are shown on the x and y axes, respectively. (B) Volcano plot distribution highlighting 603 meta-DEGs between normal and tumor samples. The orange and green colored points denote the up (218) and downregulated (385) meta-DEGs. All the black colored points denotes non-significant genes. The x and y axes represent the log 2 FC and − log 10 p-value, respectively.

Weighted GCN Construction and Module Analysis
WGCNA was used for categorizing 603 meta-DEGs with similar expression levels into different modules. In our study, we selected β = 16 (at scale free R 2 = 0.80) as the soft-thresholding power to construct a scale-free weighted GCN. The chosen value of β satisfied all the scale-free topology criteria as evidenced in Figure S1. A total of six different color-coded (i.e., green, yellow, turquoise, brown, blue, and grey) co-expression modules were identified by hierarchical clustering and dynamic branch cutting, ranging in size from 39 to 179. To merge modules with highly co-expressed genes, a ME dendrogram was cut at a height of 0.2 corresponding to a correlation of 0.8. Pairwise scatter plots among original MEs and samples representing their degree of correlations between them is shown in Figure 2A. Both yellow and turquoise modules were merged to the brown module based on their high correlation. A total of four meta-modules (i.e., green, blue, brown, and grey) were obtained after merging as compared to the original six modules and are shown in Figure 2B. Since the grey module consisted of unassigned genes, therefore we discarded it for further analysis. Figure 2C shows the GCN heatmap plot depicting TOM among all the genes. Figure 2D-F shows the heatmap plots of three meta-modules along with their corresponding ME levels. Figure S2 shows MM vs. k.in correlation plots of all three meta-modules where all of them are highly significant (based on p-value) and correlated (based on Pearson's correlation coefficient). A total of 151 genes from all the three meta-modules were screened as highly representative considering the threshold (module genes with MM > 0.9). MEgrey denote the ME of brown, turquoise, yellow, green, blue, and grey modules, respectively. Absolute values of corresponding correlations are denoted by the numbers below the diagonal. Frequency plots (histograms) of the variables are plotted along the diagonal. Yellow and turquoise modules are highly correlated (i.e., r=0.91) followed by brown and turquoise modules (i.e., r=0.89). Both turquoise and yellow modules were merged to the brown module. (B) Hierarchical clustering dendrogram of DEGs clustered based on a dissimilarity measure (1-TOM) together with original module colors (6) and merged module colors (4). The four merged modules (or meta-modules) contained highly co-expressed genes with size as follows: brown (361 DEGs), blue (163 DEGs), green (39 DEGs), and grey (40 DEGs) respectively. (C) Topological overlap matrix (TOM) heatmap plot where the rows and columns correspond to single genes. Lighter color signifies low topological overlap and progressively darker orange and red colors signify higher topological overlap. Dark colored blocks along the diagonal signify the meta-modules. The corresponding gene dendrogram and module assignment are also displayed along the left and the top sides of the plot. Expression heatmap of (D) brown, (E) blue, and (F) green meta-module genes where the rows and columns correspond to genes and samples, respectively. The green and red colored bands in the heatmap denote lower and higher expression level of genes for each module. Also, the corresponding module eigengene expression levels (y-axis) across the same samples (x-axis) are displayed at the bottom panel of each module heatmap in the form of barplot. The module eigengene levels are directly correlated with the gene expression levels of each corresponding module.

PPI Network Analysis, Enrichment Analysis, and Hub Genes Identification
A total of 148 out of 151 module genes mapped to their corresponding proteins are available in STRING. Total of 73 genes out of them participated in the PPI network at the prescribed confidence score threshold, i.e., >0.9. The constructed PPI network had 73 nodes and 433 edges ( Figure 3A), respectively. MCODE plugin revealed a total of four clusters, out of which cluster-1 was selected as the hub cluster since it had the top score (22.261) and involved a total of 24 nodes with 256 edges ( Figure 3B), respectively. These 24 top cluster genes were subjected to GO term and pathway enrichment analysis. Lists of all significantly enriched GO terms and pathways can be seen in Supplementary Tables 2 and 3, respectively. A total of five genes were common in all the significantly enriched terms and pathways ( Figure 3C) and were considered as the hub genes. Violin plot distribution of these hub genes between tumor and normal samples is shown in Figure 3D. Among all the hub genes, CCNB1 was highly overexpressed in NSCLC samples. . Each dot signifies a microarray sample. MEbrown, MEturquoise, MEyellow, MEgreen, MEblue, and MEgrey denote the ME of brown, turquoise, yellow, green, blue, and grey modules, respectively. Absolute values of corresponding correlations are denoted by the numbers below the diagonal. Frequency plots (histograms) of the variables are plotted along the diagonal. Yellow and turquoise modules are highly correlated (i.e., r = 0.91) followed by brown and turquoise modules (i.e., r = 0.89). Both turquoise and yellow modules were merged to the brown module. (B) Hierarchical clustering dendrogram of DEGs clustered based on a dissimilarity measure (1-TOM) together with original module colors (6) and merged module colors (4). The four merged modules (or meta-modules) contained highly co-expressed genes with size as follows: brown (361 DEGs), blue (163 DEGs), green (39 DEGs), and grey (40 DEGs) respectively. (C) Topological overlap matrix (TOM) heatmap plot where the rows and columns correspond to single genes. Lighter color signifies low topological overlap and progressively darker orange and red colors signify higher topological overlap. Dark colored blocks along the diagonal signify the meta-modules. The corresponding gene dendrogram and module assignment are also displayed along the left and the top sides of the plot. Expression heatmap of (D) brown, (E) blue, and (F) green meta-module genes where the rows and columns correspond to genes and samples, respectively. The green and red colored bands in the heatmap denote lower and higher expression level of genes for each module. Also, the corresponding module eigengene expression levels (y-axis) across the same samples (x-axis) are displayed at the bottom panel of each module heatmap in the form of barplot. The module eigengene levels are directly correlated with the gene expression levels of each corresponding module.

PPI Network Analysis, Enrichment Analysis, and Hub Genes Identification
A total of 148 out of 151 module genes mapped to their corresponding proteins are available in STRING. Total of 73 genes out of them participated in the PPI network at the prescribed confidence score threshold, i.e., >0.9. The constructed PPI network had 73 nodes and 433 edges ( Figure 3A), respectively. MCODE plugin revealed a total of four clusters, out of which cluster-1 was selected as the hub cluster since it had the top score (22.261) and involved a total of 24 nodes with 256 edges (Figure 3B), respectively. These 24 top cluster genes were subjected to GO term and pathway enrichment analysis. Lists of all significantly enriched GO terms and pathways can be seen in Supplementary Tables S2  and S3, respectively. A total of five genes were common in all the significantly enriched terms and pathways ( Figure 3C) and were considered as the hub genes. Violin plot distribution of these hub genes between tumor and normal samples is shown in Figure 3D. Among all the hub genes, CCNB1 was highly overexpressed in NSCLC samples.

Survival Analysis and Hub Genes Validation
The KM plotter was used to determine prognostic information of our 5 hub genes (BIRC5, CCNB1, KIF4A, KIF11, KIF20A) to validate the link between their expression levels and metastasis risk in NSCLC. The KM plots of our hub genes shown in Figure 4A

Survival Analysis and Hub Genes Validation
The KM plotter was used to determine prognostic information of our 5 hub genes (BIRC5, CCNB1, KIF4A, KIF11, KIF20A) to validate the link between their expression levels and metastasis risk in NSCLC. The KM plots of our hub genes shown in Figure 4A   showing the association of their mRNA expression levels and various tumor stages in NSCLC patients. The white dots and black bars depict the median and interquartile ranges, respectively. The width of the turquoise colored shapes depicts the distribution density. The stages of lung cancer were represented by abscissa and the expression level of each hub genes are represented by its ordinate. The method for differential analysis is one-way ANOVA, using pathological stage as a variable for computing differential expression. A better study fit corresponds to a larger F-value. All the genes were statistically significant at p-value < 0.05 with CCNB1 having the best study fit (i.e., F-value = 12.1). The results of GEPIA analysis as shown by the boxplots in Figure 4F-J validates that all these five upregulated hub genes (BIRC5, CCNB1, KIF4A, KIF11, KIF20A) were also significantly overexpressed in LUAD and LUSC tissue samples compared with normal tissue samples. Also, GEPIA violin stage plots of hub genes with respect to different pathological staging as shown in Figure 4K-O suggested that their higher expression levels were significantly correlated with advanced TNM stages. Among all the hub genes, BIRC5 and CCNB1 were most overexpressed in LUAD and LUSC samples as evidenced from their boxplots.

Analysis of NSCLC-Specific 3-Node miRNA FFL
Our 3-node miRNA FFL network ( Figure 5A) comprised a total of 21 nodes and 66 edges, respectively.
Among these edges, 19, 25, and 22, belonged to miRNA-mRNA, TF-mRNA, and miRNA-TF interaction pairs, respectively. Table 2 summarizes all the three types of regulatory relationships between miRNAs, TFs, and mRNAs. Among the five hub genes, BIRC5 was targeted by a maximum number of miRNAs (i.e., 8) followed by CCNB1, KIF11, and KIF4A (i.e., 3). miR-20b-5p targeted maximum number of hub genes (i.e., 4). Among all the TFs, HMGA2 and E2F7 were targeted by a maximum number of miRNAs (i.e., 9 and 7). The analysis of whole FFL revealed that the highest-order network motif (based on the degree) comprised one miRNA (miR-20b-5p), two TFs (HMGA2, E2F7), and one hub gene (CCNB1) and is shown in Figure 5B. KM plots of HMGA2 and E2F7 plotted using KM plotter are shown in Figure 5C,F with the same settings as previously described. The plots indicated that higher expression levels of HMGA2 (HR = 1.33; 95% CI = 1.17-1.52; p < 0.05) and E2F7 (HR = 1.75; 95% CI = 1.37-2.23; p < 0.05) were associated with shorter OS in NSCLC patients. HMGA2 and E2F7 expression levels were significantly higher in LUSC samples only as evidenced by GEPIA boxplots in Figure 5D,G. Violin stage plots of both HMGA2 and E2F7 were significantly correlated with advanced TNM stages as evidenced from Figure 5E,H. Also, the survival plots of miR-20b-5p in LUAD and LUSC patient samples were plotted using OncoLnc database as shown in Figure 5I-J. The LUAD and LUSC patient samples were split into low and high expression cohorts by assigning lower and higher percentile a 50:50 ratio. Lower expression levels of miR-20b-5p indicated shorter OS in both the LUAD and LUSC patient samples. The median survival time in the high and low expression cohorts of each regulatory element (i.e., HMGA2, E2F7, and miR-20b-5p) is shown in Table 3. showing the association of their mRNA expression levels and various tumor stages in NSCLC patients. The white dots and black bars depict the median and interquartile ranges, respectively. The width of the turquoise colored shapes depicts the distribution density. The stages of lung cancer were represented by abscissa and the expression level of each hub genes are represented by its ordinate. The method for differential analysis is one-way ANOVA, using pathological stage as a variable for computing differential expression. A better study fit corresponds to a larger F-value. All the genes were statistically significant at p-value<0.05 with CCNB1 having the best study fit (i.e., F-value=12.1).

Genomic Alterations of Hub Genes and TFs
cBioportal database was used to scrutinize the specific genetic modifications associated with five hub genes in the selected NSCLC dataset with 1144 samples. Cancer type summary analysis displayed the overall alteration frequency of all hub genes as shown in Figure 6A i.e., 9.24% of 660 cases with a mutation frequency of 5% (33 cases), amplification frequency of 2.27% (15 cases), deep deletion frequency of 1.67% (11 cases), and multiple alterations with a frequency of 0.3% (2 cases) in case of LUAD.

Discussion
Lung cancer is a prominent cause of cancer-associated mortality worldwide, with NSCLC being the most frequent type. Insufficient measures for early diagnosis, development of resistance, and delayed prognosis make the available diagnostic and therapeutic tools less effective. Thus, there is an urgent need for the development of more reliable and effective biomarkers as well as therapeutic targets. It has been shown by multifarious studies that non-coding RNAs (ncRNAs) play significant roles in the pathogenesis of different types of cancers and could endeavor a new acumen into the biological aspects of tumorigenesis [38]. Small ncRNAs such as miRNAs have been greatly explored over the past decade. Their regulatory roles in gene expression and other cellular processes have been extensively illustrated but the molecular mechanisms are not fully elucidated. Therefore, a profound exploration of the associated molecular mechanisms is unconditionally important. In consistence, our study was designed to identify the interactions between mRNAs, miRNAs, and TFs taking place in NSCLC, and to further deduce a plausible regulatory network (FFL network) among them.
In this study, we have performed a series of bioinformatics analysis to identify deregulated genes and pathways involved in the development and progression of NSCLC. Upon comparison of DEG profiles of two NSCLC expression datasets retrieved from GEO database, a total of 603 meta-DEGs were screened, out of which 218 were upregulated and 385 were downregulated (Figure 1). WGCNA based scale-free GCN construction gave us three modules in which 151 genes were placed on the basis of their correlation ( Figure 2). Later, application of MCODE algorithm to the PPI network consisting of 73 nodes and 433 edges allowed us to identify a hub cluster of genes, consisting of 24 nodes and 256 edges. Afterwards, GO term and pathway enrichment analysis resulted in the identification of five hub genes-BIRC5, CCNB1, KIF11, KIF20A, and KIF4A ( Figure 3). The selected hub genes function as a group and may play a momentous role in NSCLC. Pathway enrichment analysis proclaimed that all of these genes were cardinally enriched in cell cycle. In addition, survival analysis of the genes was performed to determine their prognostic significance in terms of overall survival. Further, validation of hub genes determined that all of these were significantly upregulated in lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC), and consequently associated with advanced TNM stages ( Figure 4). BIRC5 (14.7 kb long) is a mitotic spindle checkpoint gene, present at the telomeric end of chromosome 17. It is a fundamental protein involved in the regulation of mitosis by mediating tumor cell proliferation via β-catenin pathway and tumor cell invasion and migration via TGF-β pathway and PI3K/AKT pathway and apoptosis due to its interaction with DNA damage repair proteins [39] along with several pathological conditions. It has been found to be upregulated in pancreatic cancer, breast cancer, hepatocarcinoma, esophageal carcinoma, and neuroblastoma, with poor clinical outcomes. Our study has highlighted higher expression of BIRC5 (2.02-fold) (Supplementary Table S1) with a shorter OS of NSCLC patients, owing it with the prognosis roles in NSCLC. This gene is critically enriched in cell cycle-related signaling pathways, and hence it becomes logical to assume that the oncogenic functions of BIRC5 are due to these cell cycle mediated mechanisms. However, the mechanism associated with NSCLC remains to be largely unidentified.
Cyclins are a group of proteins that ramble the processes associated with progression and regulation of cell cycle such as cell cycle entry, DNA damage repair and CDK (cyclin-dependent Kinases) mediated control of cell death. CCNB1, CCNB2, and CCNA2 are the authorizing members of the cyclin family, critical for regulation of cellular growth, proliferation, and apoptosis; and consequently, associated with cancer development and progression. They have been found to be highly upregulated in several tumor types including hepatocellular carcinoma. Our study has identified the upregulation (2.6-fold) of CCNB1 in NSCLC tissues as compared to the normal tissues, which has undoubtedly contributed to unfavorable OS in NSCLC patients. CCNB1 is an influential member of the cyclin family. It can initiate mitotic progression by promoting G2/M transition of cells by creating a complex of maturation promoting factor with CDK1, leading to phosphorylation of retinoblastoma (Rb) protein, consequently causing activation of transcription factor E2F2. As a result, E2F2 target genes are activated and regulate G1/S transition. However, E2F2 is lost in most of the cancers, sequentially leading to uncontrolled cell cycle progression. It has been found to be associated with anomalous cellular proliferation in the liver, breast, esophageal, and cervical cancer. We have also found a significant upregulation of CDK1 (1.7-fold), along with CCNB1 in NSCLC (Supplementary Table S1), predicting poor OS of NSCLC patients. Some studies have found CDK1 to be particularly important as it is associated with both OS and relapse-free survival (RFS). Thus, CCNB1 and CDK1 could act as probable prognostic biomarkers of NSCLC. Also, a contemporary study has determined that the levels of anti-CCNB1 antibodies increase with histological grades and stages of cancer, supporting the importance of early-stage screening and recurrence follow-up in advanced stages of lung cancer. Our results are also in concordance with the above studies indicating a high correlation of overexpression of CCNB1 with poor prognosis and clinical stages. Additionally, CCNB1 silencing inhibits cellular proliferation and induces cell senescence and apoptosis via P53 signaling pathway [40].
KIF11 is a mitotic kinesin distributed throughout the cytoplasm and plays a significant role in bipolar mitotic spindle formation. It is also involved in the transport of secretory proteins from the Golgi complex to the surface of non-mitotic cells [41]. The expression of KIF11 is highly noticeable in proliferating cells and tissues during development, however, it is undetectable in case of non-proliferating cells and tissues [42]. It is reported to be an independent prognostic biomarker for early prediction and recurrence intermittence in non-muscle-invasive bladder carcinoma patients and is also correlated with impoverished differentiation of bladder cancer. However, its state of expression and effect on NSCLC remains unclear. Moreover, since it is an important component of mitotic machinery and could be an impressive target of cancer, it becomes coherent to corroborate its role in NSCLC. Bioinformatics analysis in our study has demonstrated a significant correlation between KIF11 upregulation (1.58-fold) (Supplementary Table S1) and progression of NSCLC. A recent study has shown the in vitro and in vivo decrease in cellular proliferation, induction of G2/M cell cycle arrest and increased apoptosis of human breast cancer cells upon RNAi mediated silencing of KIF11. Moreover, our study has found that overexpression of KIF11 is a predictor of poor prognosis and is associated with the advanced stages of NSCLC, indicating its biomarker potential. Some of the studies have demonstrated the mechanism associated with protumorigenic actions of KIF11. They have shown that KIF11 is needed for optimal nascent polypeptide synthesis and is associated with ribosomes during interphase. Thus, upon inhibition of KIF11, ribosomes no longer bind to microtubules, leading to accumulation of polysomes in intact cells and finally resulting in defective elongation or termination during polypeptide synthesis. Moreover, to regulate spindle formation in mitotic cells, KIF11 has been found to be phosphorylated at Thr972 by CDK1, Aurora Kinase A (AURKA), and NIMA-related Kinase 6 (NEK6). However, we have also found the upregulation of CDK1 and AURKA (1.66-fold) (Supplementary Table S1) in NSCLC tissues in our study, suggesting their functional association with KIF11 protein in NSCLC too [41,43].
KIF20A, also known as RAB6KIFL and MKLP2, is a member of Kinesin family-6 and is located on chromosome 5q31.2. It contains a conserved motor domain that generates the energy needed for movement of proteins, upon binding to microtubules and hence is implicated into Golgi apparatus dynamics by interacting with Rab6 small GTPase. It is engaged in mitotic spindle formation and significantly involved in cytokinesis. It is customarily upregulated in malignant tumors, but meagerly expressed in normal tissues except thymus and testis. Its oncogenic properties have been reported in several types of cancers, including hepatocarcinomas, melanomas, pancreatic, and nasopharyngeal carcinomas. It has been found that it is highly upregulated in LUAD and confers malignant phenotype by stimulating cellular proliferation and inhibiting apoptosis. Aneuploidy, abnormal chromosomal distribution, and spindle defects are consequences of aberrant expression of KIF20A and could also be probable causes of tumorigenesis. Our study has found 1.87-fold upregulation of KIF20A (Supplementary Table S1) and its correlation with metastatic status and advanced clinical stage of NSCLC. Consistently, Kaplan-Meier analysis in our study has shown that NSCLC patients with higher expression of KIF20A have a shorter OS and is also associated with clinical stages. In-vitro studies have identified the mechanism by which KIF20A promotes tumorigenesis, i.e., cell cycle dysregulation and inhibition of apoptosis [44]. In addition, cancer cells overexpressing KIF20A can be recognized by host immune system and hence KIF20A-derived peptides can be used as immunotherapeutic agents. They have undergone several clinical trials but there are no data available for NSCLC yet. Studies have also shown that KIF20A knockdown reduced the proliferation, invasion, and migration of NSCLC cells via regulation of JNK signaling pathways [45]. Microarray analysis based study has identified that Forkhead box M1 (FOXM1) positively regulates the expression of KIF20A and upregulation of FOXM1 is associated with several normal and transformed cells of hepatocellular and skin basal cell carcinoma [42]. Consistent with the above findings, we have also observed a significant upregulation (1.9-fold) (Supplementary Table S1) of FOXM1, suggestive of its correlation with KIF20A overexpression in NSCLC.
KIF4A, a microtubule-based motor protein, involved in the regulation of chromosome segregation and spindle formation during mitosis. It is responsible for the generation of directional movements along microtubules and play principal roles in regulation of anaphase spindle dynamics and cytokinesis completion. It has been found to be highly expressed in hepatocellular carcinoma cells and tissues and is also an indicator of poor prognosis. Studies involving siRNA mediated knockdown of KIF4A have resulted in suppression of NSCLC cell growth and its expression promotes cellular invasion. Microarray studies have also demonstrated the decreased OS of NSCLC patients with overexpression of KIF4A [46]. Bioinformatics analysis in our study has determined its upregulation (1.94-fold) in NSCLC patients (Supplementary Table S1). The findings of our study have also indicated the poor OS and association with advanced TNM stage upon overexpression of KIF4A in NSCLC, in consistence with the previous studies. These results are suggestive of the fact that KIF4A is a potential growth factor associated with exceptionally malignant phenotype of lung cancer cells. Despite, the molecular mechanisms behind its overexpression are not yet clarified, it is atypical cancer-testis antigens and scrupulous inhibition of KIF4A using molecular agents could be a promising therapeutic strategy against NSCLC. Nonetheless, one of a recent study has reported contradictory results where the loss of KIF4A lead to multiple mitotic defects such as spindle defects, chromosome misalignment and abnormal cytokinesis, which may lead to tumorigenesis [42]. Thus, extensive research is further required to deduce a useful role of KIF4A in development and progression of NSCLC.
Furthermore, to identify the mutational status of hub genes in NSCLC, we analyzed the information available on cBioPortal tool and found that alteration frequencies were higher in LUAD as compared to LUSC. Individual mutation analysis of hub genes has revealed that KIF4A had the highest alteration frequency of 2.2% in NSCLC ( Figure 6). This is indicative of the association of genetic modifications in deregulated cellular processes, which could be a significant cause of upregulation of these genes and hence aberrations in pathways associated with the genes. Moreover, a deeper understanding of associations between somatic mutations and cancer traits would be of great help in designing precision cancer therapy.
Besides this, to expedite the elucidation of regulatory network involved in NSCLC, a 3-node miRNA FFL was generated, consisting of a miRNA (miR-20b-5p), a hub gene (CCNB1), and two transcription factors (HMGA2 and E2F7) ( Figure 5). miR-20-5p is present at a site (human chromosome Xq26.2) that has been reported to be related to the development and progression of a number of cancer types. It is proclaimed to exert context-dependent tumor-suppressive or oncogenic roles affecting cellular proliferation, migration, and apoptosis. It is downregulated in renal cell carcinoma but upregulated in breast cancer and NSCLC. However, its biological functions in NSCLC are largely unclear [47]. HMGA2 and E2F7 are the two direct targets of miR-20b-5p. HMGA proteins are non-histone nuclear proteins, commonly recognized as architectural transcription factors as they are involved in assemblage of gene transcription-associated multiprotein complexes. Besides chromatin organization of the transcription machinery in a structure that concedes gene transcription, it also interacts with minor grooves of several AT-rich promoters and enhancers to exert transcriptional activity. There are four types of HMGA proteins-HMGA1a, HMGA1b, HMGA1c, and HMGA2. HMGA2 is associated with human malignant tumors and neoplastic transformation of thyroid cells [48]. It regulates cellular proliferation and metastasis in lung cancer. Notably, both HMGA1 and HMGA2 are actively involved in the construction of senescence-associated heterochromatin and prolongation of growth-arrested state. HMGA2 is customarily rearranged in benign tumors of mesenchymal origin. Chromosomal aberrations in the region 12q14-15, affecting HMGA2 are generally observed in a wide range of tumors [49]. It is highly upregulated in majority of human prolactinomas with genetic alterations such as chromosome 12 trisomy and tetrasomy. Our study has identified the upregulation of HMGA2 in NSCLC, which is correlated with poor OS and advanced clinical stage ( Figure 5). HMGA2 is also associated with transformation of NSCLC cells but the associated mechanisms are still unknown. HMGA2 is reported to be involved in pRb pathway. pRb gene is related to progression of cell cycle via E2F family of transcription factors. They have been well defined as G1/S regulators. HMGA2 overexpression enhances the transcriptional activity of E2F1 proteins. These proteins are well documented to promote transcription of genes involved in S phase of the cell cycle. Prior to the entry of cells into S phase, cyclin CCNB1 forms a complex with CDK1, which phosphorylates Rb at multiples sites, leading to its inactivation and activation of E2F1/E2F2, both of which are involved in the regulation of G1/S transition. However, E2F2 is mostly lost in several tumors, but E2F1 is still activated leading to the transcriptional activation of its target genes. On the contrary, pRb recruits HDAC-1 (Histone deacetylase 1) which removes acetyl groups to repress the transcription. But HMGA2 binds to pRb and displaces HDAC1 so that E2F1 is continually activated and resumes the transcription of its target genes [48]. E2F1 is an effective stimulant of apoptosis and hence keeps a regulation of cells entering into S phase. However, continuous activation of E2F1 may also cause the transcriptional activation of E2F7 (a transcriptional target of E2F1 and also exerts poor OS (Figure 5)). E2F7 is highly upregulated in case of NSCLC, which, in turn, exerts a negative feedback on E2F1, causing its transcriptional repression and hence represses the apoptosis function of E2F1 and aberrantly allow the cells to enter into S phase of cell cycle [50]. Moreover, genetic modification analysis of both HMGA2 and E2F7 has displayed an overall alteration frequency of 8.03% in LUAD and 3.93% in LUSC ( Figure 6). This is suggestive of the fact that epigenetic mechanisms could be an important factor in upregulation of these transcription factors. Overall, it can be speculated that genetic modifications could lead to an abrupt upregulation of HMGA2 and E2F7, which are involved in uncontrolled transition of the cells from G1 to S phase via CCNB1 and CDK1 mediated phosphorylation of Rb protein. pRb protein causes release of E2F1 proteins, which upon uncontrolled activation causes activation of E2F7 proteins, which acts as a novel target to mediate repression of E2F1-associated apoptosis and hence removes the regulatory powers of E2F1, leading to continued proliferation of NSCLC cells.
Furthermore, downregulation of miR-20b-5p worsens the OS of NSCLC patients, as demonstrated by Kaplan Meir Survival analysis ( Figure 5), which is suggestive of the tumor-suppressive functions of miR-20b-5p. A recent study has shown that 3 UTR of HMGA2 contains let-7 binding sites and chromosomal aberrations in HMGA2 may lead to loss of miRNA-mediated repression and hence upregulation of HMGA2 [49]. Similarly, our study has found that mir-20b-5p targets both HMGA2 and E2F7 (both having a good frequency of genetic alterations in NSCLC, particularly in LUAD) and genetic alterations in HMGA2 and E2F7 may alter its 3 UTR site, consequently inhibiting miR-20b-5p mediated repression of HMGA2 and E2F7. Moreover, studies have also reported that oxidative stress, which is an important aspect of tumor microenvironment, downregulates miR-20b-5p and E2F1 via overexpression of HMGA2 [51]. This can be explained as, oxidative damage causes activation of HMGA2 which participates in ATM/CHK-mediated DNA damage repair systems causing prolonged phosphorylation of CHK1, facilitating DNA repair, cell cycle delay in G2/M phase until the DNA is fully repaired and increases survival and chemoresistance of cancer cells. Thus, HMGA2 exerts oncogenic functions via E2F7-mediated repression of pro-apoptotic E2F1 and oxidative damage-mediated cell cycle delay in G2/M phase and development of chemoresistance. Therefore, upon overexpression of miR-20b-5p, the tumor-suppressive effects could be rescued back by regulating cell cycle and promoting apoptosis, which could prove to be an effective strategy of treatment of NSCLC.
In addition, studies have identified that overexpression of HMGA2 also affects direct NF-κβ targets, most of which are involved in cytokine signaling such as IL6, IL1R1, IL11, IL15, and LIF1. Genes in CXC chemokine cluster in chromosome band 4q13.3, such as CXCL6 are highly downregulated but CXCL12 is upregulated [49]. Our study has identified the downregulation of IL6 (2.20-fold), IL1R1 (1.73-fold), CXCR2 (2.20-fold), CXCL3 (1.77-fold) and an upregulation of CXCL13 (2.78-fold) and CXCL14 (2.17-fold) (Supplementary Table S1). Downregulation of these chemokines are associated with the escape from host immune response by reducing the attraction of neutrophils, leukocytes, and activation of adaptive immune response. Moreover, upregulation of CXCL13 and CXCL14 are associated with a reduction in HLA-DRA and MHC-Class II, because of which cancer-specific antigens are not presented by dendritic cells and hence cancer cells evade immunosurveillance [52]. Thus, HMGA2 also contribute in evading host immune response and continue cell proliferation, providing a survival advantage to cancer cells.
Overall, this study has identified certain novel molecules (miR-20b-5p, CCNB1, HMGA2, and E2F7), all of which could possess a great therapeutic and prognostic potential, which has not yet been identified in NSCLC. Studies have reported that these targets are commonly involved in the deregulation of cell cycle because of genetic modifications, and escape from immunosurveillance, which are predominant hallmarks of cancer. Our study has also indicated their associative roles in cell cycle deregulation and immunosurveillance, thus these genes, when further explored could prove to be milestones in lung cancer research and could be very useful in designing personalized therapies against cancer. Moreover, it is a proposed mechanism, suggested by the network obtained from bioinformatics analysis, however, extensive in vitro and in vivo research is further required to validate the above processes.

Conclusions
In conclusion, for NSCLC, all the hub genes-BIRC5, CCNB1, KIF11, KIF20A, and KIF4A, TFs-HMGA2 and E2F7 are associated with poor OS and advanced stages of clinical disease. They may serve as assuring prognostic predictors and therapeutic targets. The upregulation of these genes may facilitate the activation of cell cycle-associated pathways to take part in the development of NSCLC. However, upregulation of CCNB1, along with CDK1, forms a complex that participates in pRb pathway of cell cycle regulation, wherein HMGA2 and E2F7 are actively involved. Moreover, genetic modification status of HMGA2 and E2F7 determines the involvement of epigenetic modifications in affecting miR-20b-5p mediated repression of these TFs, sequentially resulting in their upregulation. Upregulation of HMGA2 participates in several mechanisms such as E2F7-mediated repression of pro-apoptotic effects of E2F1, cell cycle delay until the DNA damage is repaired, cancer cell survival, and development of chemoresistance, that confer survival-specific and evasion of immune surveillance (via downregulation of IL6, IL1R1 and upregulation of chemokines such as CXCL13 and CXCL14) advantage to cancer cells. Our findings contribute in providing an innovative comprehension into NSCLC via miR-20b-5p/CCNB1/HMGA2/E2F7. Nonetheless, cancer is an outcome of complex molecular processes, and hence extensive experimental corroborations are needed to validate the cell cycle and immune system-related signaling for NSCLC.
Supplementary Materials: The following are available online at http://www.mdpi.com/2079-7737/9/8/201/s1. Table S1: List of meta-DEGs; Table S2: List of significantly enriched GO terms for the identified PPI cluster genes; Table S3: List of significantly enriched pathways for the identified PPI cluster genes; Figure S1: The color indicates the module and the dots indicate the genes within that module. After raising the MM to β, it is highly correlated with k.in for all the modules.