A Novel Strategy for Identifying NSCLC MicroRNA Biomarkers and Their Mechanism Analysis Based on a Brand-New CeRNA-Hub-FFL Network

Finding reliable miRNA markers and revealing their potential mechanisms will play an important role in the diagnosis and treatment of NSCLC. Most existing computational methods for identifying miRNA biomarkers only consider the expression variation of miRNAs or rely heavily on training sets. These deficiencies lead to high false-positive rates. The independent regulatory model is an important complement to traditional models of co-regulation and is more impervious to the dataset. In addition, previous studies of miRNA mechanisms in the development of non-small cell lung cancer (NSCLC) have mostly focused on the post-transcriptional level and did not distinguish between NSCLC subtypes. For the above problems, we improved mainly in two areas: miRNA identification based on both the NOG network and biological functions of miRNA target genes; and the construction of a 4-node directed competitive regulatory network to illustrate the mechanisms. NSCLC was classified as lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) in this work. One miRNA biomarker of LUAD (miR-708-5p) and four of LUSC (miR-183-5p, miR-140-5p, miR-766-5p, and miR-766-3p) were obtained. They were validated using literature and external datasets. The ceRNA-hub-FFL involving transcription factors (TFs), microRNAs (miRNAs), mRNAs, and long non-coding RNAs (lncRNAs) was constructed. There were multiple interactions among these components within the net at the transcriptional, post-transcriptional, and protein levels. New regulations were revealed by the network. Meanwhile, the network revealed the reasons for the previous conflicting conclusions on the roles of CD44, ACTB, and ITGB1 in NSCLC, and demonstrated the necessity of typing studies on NSCLC. The novel miRNA markers screening method and the 4-node directed competitive ceRNA-hub-FFL network constructed in this work can provide new ideas for screening tumor markers and understanding tumor development mechanisms in depth.


Introduction
Lung cancer (LC) is one of the primary causes of cancer-related deaths worldwide [1]. Approximately 80-85% of LC is classified as non-small cell lung cancer (NSCLC), and the majority of NSCLC comprises two major histological subtypes: lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) [2]. Although significant progress has been made in the diagnosis and treatment strategies for LC, the 60-month overall survival (OS) and 5-year survival rates for LC are only 16% to 18%, because 70% to 80% of LC patients are already at an advanced stage when initially diagnosed [3]. With early diagnosis and treatment, the 5-year survival rate for LC can approach 45-65% [4]. Therefore, it becomes increasingly important to diagnose LC as early as possible.
The diagnosis of LC relies mainly on imaging, cytology, and biochemical examination. Due to the various limitations, there are problems with a high false-positive rate, low sensitivity, and specificity [5]. The use of microRNAs (miRNAs) in diagnosis and prognosis has several advantages: they can be detected in circulating plasma [6]; they have higher Figure 1. MiRNA-mRNA regulatory network combines the independent regulatory model with the biological function of target genes. MiRNA4 is more likely to be a marker than other miRNAs in the figure because its targets include independent genes, TF genes, and important disease-related genes.
Biological regulation is complex and multilayered. The regulatory mechanisms of miRNAs reported before are mostly based on the post-transcriptional level, such as the competition between miRNAs and long non-coding RNAs (lncRNAs) or the interaction between miRNAs and target genes [20], and the obtained conclusions would be very onesided. By controlling the transcriptional activity of genes at the transcriptional level by binding to cis-regulatory elements of genes [21], TFs can direct cell division, cell growth and death, as well as the migration and organization of cells during embryonic development. TFs are essential for a series of biological processes. The expression of protein-coding genes is regulated at the transcriptional level (TFs) and the post-transcriptional level (miRNAs) [22]. There is growing evidence for cooperation and crosstalk between miRNAs and TFs, primarily to buffer gene expression and/or tune signals [23].
In addition, lncRNAs are associated with many biological processes in a variety of diseases, such as cancer and cardiovascular disease [24,25]. LncRNAs can compete with mRNAs for the same miRNA response element (MRE) and alleviate miRNA repression on target genes (mRNAs). Thus, lncRNAs, serving as competing endogenous RNAs (ceR-NAs), lead to the down-regulation of these miRNAs in the cytoplasm and have undeniable effects on gene regulation and diseases [26].
In a word, both TFs and lncRNAs affect the development and progression of diseases and are important regulators that cannot be ignored. A complicated network, which combines miRNAs, mRNAs, TFs, and lncRNAs, would deeply reveal the mechanisms of transcriptional and post-transcriptional regulation, while it has rarely been considered before. Moreover, the protein-protein interaction (PPI) network can reveal insights into biological regulatory pathways at the protein level. Investigating underlying biological meaning at the transcript level, post-transcript level to protein level will be more systematic.
In this study, we identify reliable miRNA biomarkers for the diagnosis of the two major subtypes of NSCLC (LUAD and LUSC), respectively. An independent regulatory model including NOG and TFP values of miRNAs and the biological significance of the target genes would be taken into consideration. Furthermore, to understand the potential molecular mechanisms of miRNAs in NSCLC at the systemic level, a 4-node ceRNA-hub-FFL network involving TFs, lncRNAs, genes, and miRNAs would be constructed based on the obtained miRNA biomarkers. In the constructed network, PPI would be used to identify functionally important target genes.

Figure 1.
MiRNA-mRNA regulatory network combines the independent regulatory model with the biological function of target genes. MiRNA4 is more likely to be a marker than other miRNAs in the figure because its targets include independent genes, TF genes, and important disease-related genes.
In a word, both TFs and lncRNAs affect the development and progression of diseases and are important regulators that cannot be ignored. A complicated network, which combines miRNAs, mRNAs, TFs, and lncRNAs, would deeply reveal the mechanisms of transcriptional and post-transcriptional regulation, while it has rarely been considered before. Moreover, the protein-protein interaction (PPI) network can reveal insights into biological regulatory pathways at the protein level. Investigating underlying biological meaning at the transcript level, post-transcript level to protein level will be more systematic.
In this study, we identify reliable miRNA biomarkers for the diagnosis of the two major subtypes of NSCLC (LUAD and LUSC), respectively. An independent regulatory model including NOG and TFP values of miRNAs and the biological significance of the target genes would be taken into consideration. Furthermore, to understand the potential molecular mechanisms of miRNAs in NSCLC at the systemic level, a 4-node ceRNA-hub-FFL network involving TFs, lncRNAs, genes, and miRNAs would be constructed based on the obtained miRNA biomarkers. In the constructed network, PPI would be used to identify functionally important target genes.

Data Pre-Processing and Differential Expression Analysis Results
The data of LUAD and LUSC were downloaded from the TCGA database. After data pre-processing, there were 16,498 mRNAs and 523 mature miRNAs for each sample in LUAD, and 16,727 mRNAs and 531 mature miRNAs in LUSC.
Differential expression analysis showed that 1438 mRNAs and 111 miRNAs were up-regulated, and 1084 mRNAs as well as 74 miRNAs were down-regulated in LUAD (Figure 2a

Obtaining miRNA Biomarkers Based on the Independent Regulatory Model
A de-redundant operation was carried out to the data of experimentally validated databases and the computer-predicted databases, so 319,574 and 9405 miRNA-mRNA interaction pairs were left, respectively. The human miRNA-mRNA network was constructed using 325,729 miRNA-mRNA interaction pairs. Then, a disease-specific miRNA-mRNA network was extracted from it using differential expression mRNAs and miRNAs. NOG and TFP values of miRNAs were calculated based on the network. The distribution of NOG values conformed to the Power distribution. Finally, ten candidate miRNAs in LUAD (Table S1) and nine in LUSC were obtained (Table S2).

•
Obtaining the most relevant gene sets for LC using WGCNA A WGCNA network was built using mRNA expression data of paired normal and tumor samples of LUAD and LUSC, respectively, and highly cancer-related gene modules were obtained from it ( Figure 3). In LUAD, the best soft-thresholding power β = 7 was chosen for constructing the co-expression network based on the fit and connectivity. Eighteen modules were identified using the dynamic cutting algorithm. The correlations between the modules were calculated, and the modules with a correlation bigger than 0.75 were combined into one module, resulting in 13 modules. Normal and tumor samples in LUAD were used as clinical traits, and the strongest negative correlation was detected between the Turquoise module with the tumor samples (r = −0.95, p = 1 × 10 −57 ). Figure 3a-d shows that the expression profile of genes in the Turquoise module was strongly correlated with LUAD sample traits. Therefore, the gene set in the Turquoise module was selected. Similarly, in LUSC, 12 modules were identified finally, and the strongest negative correlation was detected between the Brown module with the tumor samples (r = −0.96, p = 4 × 10 −53 ).  Brown module was strongly  correlated with LUSC sample traits, so the gene set in the Brown module was selected. between the Turquoise module with the tumor samples (r = −0.95, p = 1 × 10 −57 ). Figure 3ad shows that the expression profile of genes in the Turquoise module was strongly correlated with LUAD sample traits. Therefore, the gene set in the Turquoise module was selected. Similarly, in LUSC, 12 modules were identified finally, and the strongest negative correlation was detected between the Brown module with the tumor samples (r = −0.96, p = 4 × 10 −53 ). Figure 3e-h shows that the expression profile of genes in the Brown module was strongly correlated with LUSC sample traits, so the gene set in the Brown module was selected.  (b,f) The average connectivity of various soft-thresholding powers, and the y-axis is a decreasing function of the soft-thresholding power β(x-axis). (c,g) The gene modules associated with the clinical features of LUAD/LUSC. Each module contains the corresponding correlation and p-value, and the correlation coefficient represents the correlation between the gene modules and the clinical features; the color shade represents the correlation size, red indicates a positive correlation, and blue indicates a negative correlation. (d,h) Scatter plots of the clinical trait between module membership (x-axis) and gene significance (y-axis) in the Turquoise module. Module membership (x-axis) refers to the correlation between the genes and the module, with larger values indicating the greater correlation between the genes and the module, and gene significance refers to the correlation between the genes and the trait, with larger values indicating the stronger correlation between the genes and the trait (LUAD: r = 0.97, p < 1× 10 −200 ; LUSC: r = 0.99, p < 1× 10 −200 ).

•
Screening LC-related genes based on databases of oncogenes and tumor suppressor genes In order to determine whether the above-obtained genes are associated with LC, genelung-related enrichment analysis was performed using DAVID. A total of 711 genes in LUAD and 1560 genes in LUSC were selected. In addition, further screening was carried out based on whether the genes are oncogenes or tumor suppressor genes. Finally, 100 and 198 oncogenes or tumor suppressor genes were obtained in LUAD and LUSC, respectively (Tables S3 and S4).

•
Identifying miRNA biomarkers based on biological significance of genes A strongly correlated miRNA-mRNA network was constructed by the above NSCLCrelated important genes. The miRNAs in the network were the candidate biomarkers that can regulate important LC-related genes. Finally, 13 candidate miRNAs in LUAD (Table S5) and 51 candidate miRNAs in LUSC (Table S6) were obtained.

The Final miRNA Biomarkers for LC
Considering both the independent regulation model and the biological significance of the target genes, two and five candidate biomarkers were obtained in LUAD and LUSC, respectively (Tables 1 and 2). Then, ROC curves and AUC were used to identify potential biomarkers (AUC > 0.85). The final miRNA biomarker was miR-708-5p in LUAD. There were four miRNAs (miR-766-5p, miR-766-3p, miR-140-5p, and miR-183-5p) in LUSC. The above five miRNAs not only had strong independent regulatory abilities, but their target genes were also biologically important. Those genes are all oncogenes or tumor suppressor genes.   Table 3 shows the amount of literature associated with the obtained miRNAs in Web of Science and PubMed databases. The relevance of each miRNA and LC was reported, although the reports are very few. However, it is evident that in most of the literature, subtypes are not distinguished. Validation of the ability to distinguish LUAD from normal: Three GEO datasets, GSE102286, GSE63805, and GSE36681, were used to validate the potential to distinguish LUAD from normal based on the expression and AUC of miR-708-5p. Because there were two different sample preparation methods (FF and FFPE) in the GSE36681 dataset, the dataset was bisected. In all three datasets, the expression of miR-708-5p in LUAD was significantly higher than in normal samples, which was consistent with our results (Figure 4). The AUC values of miR-708-5p in the four datasets were 0. 70 • Validation using external datasets Validation of the ability to distinguish LUAD from normal: Three GEO datasets, GSE102286, GSE63805, and GSE36681, were used to validate the potential to distinguish LUAD from normal based on the expression and AUC of miR-708-5p. Because there were two different sample preparation methods (FF and FFPE) in the GSE36681 dataset, the dataset was bisected. In all three datasets, the expression of miR-708-5p in LUAD was significantly higher than in normal samples, which was consistent with our results (Figure 4). The AUC values of miR-708-5p in the four datasets were 0.70, 0.79, 0.79 and 0.71, respectively. Validation of the ability to distinguish LUSC from normal: The GEO datasets GSE15008 and GSE74190 were used to validate the potential to distinguish LUSC from normal based on the expression and AUC of biomarkers. MiR-766-5p could not be validated because there is no miR-766-5p in the sequencing platform microarray. As to miR-140-5p, there was no significant difference in expression in GSE15008 ( Figure 5), but it was significantly downregulated in LC samples than in normal tissues in GSE74190 ( Figure 6). The expression levels of other biomarkers were significantly up-regulated in LC samples in the two datasets.
The expression levels of other biomarkers were significantly up-regulated in LC samples in the two datasets. The AUC values of miR-140-5p, miR-183-5p, and miR-140-5p were 0.52, 0.85, and 0.61 using GSE15008, respectively, and were 0.96, 0.99, and 0.63 in GSE74190, respectively ( Figures 5 and 6). There are no more datasets available for validation in the GEO database.  Validation of the ability to distinguish NSCLC from normal tissue using a TCGA integration dataset without distinction of subtypes: Validation was performed using paired and unpaired samples, respectively. MiR-766-3p was not validated because of too many missing values. In both unpaired and paired samples, miR-183-5p and miR-708-5p were significantly over-expressed, and miR-766-5p and miR-140-5p were significantly low-expressed (Figures 7 and 8). The AUC values of miR-183-5p, miR-708-5p, miR-766-5p, and      Evaluation of the ability of biomarkers to distinguish between LUAD and LUSC: There are some pathological differences between LUAD and LUSC. However, only a few published papers have explored miRNA biomarkers for stratifying LC subtypes. Thus, we carried out this study.
In terms of expression level, miR-183-5p was not significantly different between the two subtypes. As to miR-766-3p, it could not be compared because of too many missing values in LUAD, but it was detectable in LUSC, indicating that the expression level of miR-766-3p was higher in LUSC than in LUAD. The expression levels of other miRNAs were significantly different between the two subtypes ( Figure 9).
Evaluation of the ability of biomarkers to distinguish between LUAD and LUSC: There are some pathological differences between LUAD and LUSC. However, only a few published papers have explored miRNA biomarkers for stratifying LC subtypes. Thus, we carried out this study.
In terms of expression level, miR-183-5p was not significantly different between the two subtypes. As to miR-766-3p, it could not be compared because of too many missing values in LUAD, but it was detectable in LUSC, indicating that the expression level of miR-766-3p was higher in LUSC than in LUAD. The expression levels of other miRNAs were significantly different between the two subtypes ( Figure 9).

Construction of FFL Network
MiRNAs and TFs can co-regulate shared target genes in feed-forward loops (FFLs) [27] in at least three situations: (1) TFs regulate miRNA expression in their promoter regions [28], (2) TFs and miRNAs can regulate each other by forming feedback loops (FBLs), and (3) both miRNAs and TFs can co-regulate shared target genes and form feed-forward loops (FFLs). Previous studies have explored the underlying molecular mechanisms of diseases or cellular conditions through the key regulators (miRNAs and TFs) and their interactions in these networks [29].
The interaction pairs of the four regulatory factors, TFs, miRNAs, genes, and lncRNAs, at the transcriptional and post-transcriptional levels, were obtained (Table S7). Then, TCGA expression data was used to obtain the direction of regulation.

Construction of FFL Network
MiRNAs and TFs can co-regulate shared target genes in feed-forward loops (FFLs) [27] in at least three situations: (1) TFs regulate miRNA expression in their promoter regions [28], (2) TFs and miRNAs can regulate each other by forming feedback loops (FBLs), and (3) both miRNAs and TFs can co-regulate shared target genes and form feed-forward loops (FFLs). Previous studies have explored the underlying molecular mechanisms of diseases or cellular conditions through the key regulators (miRNAs and TFs) and their interactions in these networks [29].
The interaction pairs of the four regulatory factors, TFs, miRNAs, genes, and lncRNAs, at the transcriptional and post-transcriptional levels, were obtained (Table S7). Then, TCGA expression data was used to obtain the direction of regulation.

Obtaining hub-FFL Network and Further Extracting ceRNA-hub-FFL Network
To obtain the key subnets from the above FFL network, a PPI net was constructed ( Figure S2) to identify miRNA-regulated target genes whose proteins play important functions, and these key genes were used to extract the hub-FFL subnetworks from the FFL network ( Figure S3).
LncRNAs in the cytoplasm can also interact with miRNAs as competitive endogenous RNAs and participate in the regulation of target gene expression, so the subcellular localization of the lncRNAs in the hub-FFL network was analyzed using both the lncLocatorand and iLoc-lncRNA (Figure 10), and the intersection was obtained. Finally, 8 lncRNAs were determined, and they were used to extract the ceRNA-hub-FFL network from the hub-FFL network.

Analysis of Potential Molecular Mechanisms of Lung Cancer Based on ceRNA-hub-FFL Network
The ceRNA-hub-FFL subnetwork of each miRNA biomarker was extracted to explore the potential mechanism in depth.

Analysis of Potential Molecular Mechanisms of Lung Cancer Based on ceRNA-hub-FFL Network
The ceRNA-hub-FFL subnetwork of each miRNA biomarker was extracted to explore the potential mechanism in depth.

Mechanism Revealed by miR-708-5p Related ceRNA-hub-FFL Regulatory Subnetwork
The miR-708-5p related ceRNA-hub-FFL subnetwork is shown in Figure 12a. As it can be seen, at the transcriptional level, the two TF genes, MAZ and EZH2, act as activators of the miR-708-5p and repressors of the gene CD44 and the four lncRNAs. At the posttranscriptional level, miR-708-5p acts as a repressor of CD44, and the four lncRNAs can competitively combine with miR-708-5p, thus weakening its repressive effect on CD44.  According to our differential expression analysis and the ceRNA-hub-FFL subnetwork, it can be inferred that miR-708-5p promoted the development and migration of LUAD by up-regulating itself and further inhibiting CD44 (Figure 12b). The previous study suggested that the down-regulation of CD44 (or its variants) was associated with an increased invasive and metastatic capacity [30].
To validate our conclusion, the protein and mRNA expression data of CD44 in normal tissues versus in LUAD were downloaded from the CPTAC, UALCAN [31], and TIMER [32] databases, respectively, and the data confirmed the down-regulation of CD44 at the protein and mRNA levels (Figure 12c,d,f). All the above confirmed the reliability of our results.
However, there are opposite views on the role of CD44 in cancer. These studies found that miR-708-5p directly targeted CD44 in prostate cancer; that the reduced expression level of miR-708-5p resulted in the increased expression level of CD44 and AKT2, resulting in the initiation, development, and progression of prostate cancer [33]; and that deletion or decrease in CD44 inhibited cancer stem cell properties, induced cell cycle arrest and apoptosis [34].
To uncover the cause of the two contradictory opinions, we studied the two major subtypes of NSCLC, LUAD, and LUSC, separately. MRNA level data from the UALCAN and TIMER databases showed that CD44 was significantly down-regulated in LUAD samples compared to normal samples, but the difference was not significant in LUSC (Figure 12d-f). The mRNA data of CD44 and miRNA data of miR-708 were downloaded from the DepMap (https://depmap.org/portal/, accessed on 19 September 2022) database, and the outliers were removed using z-score (|z-score| < 2). Finally, 55 cell lines of LUAD and 17 of LUSC were obtained. The correlation between miR-708 and CD44 was calculated. The result showed that, in most cell lines (44/55) of LUAD, miR-708 had a negative correlation with CD44 (r = −0.5926) (Figure 13a), but no correlation in LUSC (Figure 13b). However, there were a few cell lines (11/55) where the correlation was just the opposite. Due to the extreme lack of data on CD44 mutation-positive NSCLC cell lines and the precise staging information of these cell lines, the possible reason for the different expression of CD44 between different cell lines in LUAD needs to be further revealed. The above results showed that the expression of CD44 was different in the two subtypes. If the two subtypes are not distinguished, completely opposite conclusions may occur depending on the sample sizes of the two subtypes. This further illustrates the necessity of exploring the two subtypes separately.

Mechanism Revealed by miR-183-5p Related ceRNA-hub-FFL Regulatory Subnetworks
There are three types of FFL in the miR-183-5p related ceRNA-hub-FFL subnetworks ( Figure 14). As seen from the type II and III networks (Figure 14d,e), mutual inhibition exists between TF ZEB1 and miR-183-5p at the transcriptional and post-transcriptional levels, and the two factors regulate ITGB1 gene expression simultaneously. In the TF-meditated FFL (I-FFL) subnetwork (Figure 14a), some TFs activate miR-183-5p and correspondingly repress the two lncRNAs, leading to the suppression of the two genes ITGB1 and ACTB. While other TFs play the opposite roles, they finally activate the two genes. According to our differential expression analysis and the ceRNA-hub-FFL subnetwork, it can be inferred that miR-183-5p was significantly up-regulated in LUSC (Figure 15a), which targeted ACTB and ultimately led to the down-regulation of ACTB and ITGB1 in LUSC.
ACTB gene encodes β-actin. β-actin is a highly conserved cytoskeletal protein that commonly expresses and is essential for cell migration, mitosis, intracellular transport, and maintenance [35]. It has been considered an endogenous house-keeping gene and reference gene in cells and tissues for many years [36], which leads to its role in cancer being ignored. However, the emerging evidence suggests that ACTB expresses unstably and plays a key role in a variety of human diseases, particularly cancer [37]. Previous studies showed that the mRNA level of ACTB was down-regulated in esophageal cancer, colon cancer, and LUSC compared to normal tissues [38,39]. The same is also true for ITGB1. Some studies indicated that down-regulation of ITGB1 triggered lung disease and even cancer, such as colon cancer [40] and breast cancer [41]. These may be the mechanisms of miR-183-5p regulating the development of lung cancer.
To further validate the above conclusions, we analyzed the data with different data- In the TF-meditated FFL (I-FFL) subnetwork (Figure 14a), some TFs activate miR-183-5p and correspondingly repress the two lncRNAs, leading to the suppression of the two genes ITGB1 and ACTB. While other TFs play the opposite roles, they finally activate the two genes. According to our differential expression analysis and the ceRNA-hub-FFL subnetwork, it can be inferred that miR-183-5p was significantly up-regulated in LUSC (Figure 15a), which targeted ACTB and ultimately led to the down-regulation of ACTB and ITGB1 in LUSC.
ACTB gene encodes β-actin. β-actin is a highly conserved cytoskeletal protein that commonly expresses and is essential for cell migration, mitosis, intracellular transport, and maintenance [35]. It has been considered an endogenous house-keeping gene and reference gene in cells and tissues for many years [36], which leads to its role in cancer being ignored. However, the emerging evidence suggests that ACTB expresses unstably and plays a key role in a variety of human diseases, particularly cancer [37]. Previous studies showed that the mRNA level of ACTB was down-regulated in esophageal cancer, colon cancer, and LUSC compared to normal tissues [38,39]. The same is also true for ITGB1. Some studies indicated that down-regulation of ITGB1 triggered lung disease and even cancer, such as colon cancer [40] and breast cancer [41]. These may be the mechanisms of miR-183-5p regulating the development of lung cancer. To further validate the above conclusions, we analyzed the data with different databases at different levels. Using the protein and mRNA expression data of ACTB and ITGB1 from the CPTAC, UALCAN, and TIMER databases, the results showed down-regulated ACTB and ITGB1 in LUSC tissues compared to normal tissues (Figure 15b,c,e,g,h). The mRNA data of ACTB and ITGB1 and miRNA data of miR-183 were downloaded from the DepMap database, and the outliers were removed using z-score (|z-score| < 2). Finally, 55 cell lines of LUAD and 17 of LUSC were obtained. The correlations between miR-183 and ACTB, miR-183 and ITGB1 were calculated, respectively. The results showed that, in all 17 cell lines of LUSC, miR-183 had a negative correlation with ACTB (r = −0.5477), and with ITGB1 (r = −0.5203), respectively, (Figure 16a,c), but no correlation in all 55 cell lines of LUAD (Figure 16b,d). All the above data proved the reliability of the conclusion we obtained.  However, some other studies found the opposite results, suggesting that ACTB up-regulated in some cancers, such as esophageal cancer (ES) and NSCLC (not ty compared to normal samples [42,43], and a study showed that up-regulated ACTB associated with poor prognosis in LUAD [39]. These studies did not distinguish bet LUAD and LUSC. Similarly, different voices had emerged on the role of ITGB1 in tum that is up-regulated ITGB1 promoted the development of LC [44]. Some studies found down-regulated ITGB1 inhibited NSCLC [45], and the samples with up-regulated I in LUAD had worse overall survival [46]. To discern the correctness of the above results, we analyzed the mRNA expre data from the UALCAN and TIMER databases. The results showed that ACTB wa significantly different in LUAD in both UCLCAN and TIMER (Figure 15d,g), and I However, some other studies found the opposite results, suggesting that ACTB was up-regulated in some cancers, such as esophageal cancer (ES) and NSCLC (not typed) compared to normal samples [42,43], and a study showed that up-regulated ACTB was associated with poor prognosis in LUAD [39]. These studies did not distinguish between LUAD and LUSC. Similarly, different voices had emerged on the role of ITGB1 in tumors, that is up-regulated ITGB1 promoted the development of LC [44]. Some studies found that down-regulated ITGB1 inhibited NSCLC [45], and the samples with up-regulated ITGB1 in LUAD had worse overall survival [46].
To discern the correctness of the above results, we analyzed the mRNA expression data from the UALCAN and TIMER databases. The results showed that ACTB was not significantly different in LUAD in both UCLCAN and TIMER (Figure 15d,g), and ITGB1 was not significantly different in LUAD in TIMER (Figure 15h). These results further illustrated the necessity of studies for NSCLC subtypes. Unfortunately, ACTB has been considered as an endogenous house-keeping gene in cells and tissues and as a reference gene for quantitative experiments for many years [47]. The belief that it expresses stably in cells has led to its neglected role in cancer. Our results brought to light the unstable expression of ACTB in NSCLC, making the role of ACTB as a cancer reference gene challenging. The mechanism of ACTB action in NSCLC also needs to be further elucidated. Similarly, no studies have been seen on miR-183-5p targeting ITGB1 and thus affecting the development of NSCLC.

Mechanism Revealed by miR-766-5p Related ceRNA-hub-FFL Regulatory Subnetworks
There are three types of FFL in the miR-766-5p related ceRNA-hub-FFL subnetworks ( Figure 17). In the TF-mediated (I-FFL) network, most TFs (right) have repressive effects on miR-766-5p, with activating effects on CCNB1 and related four lncRNAs, while the other seven TFs act in the opposite way. for quantitative experiments for many years [47]. The belief that it expresses stably in cells has led to its neglected role in cancer. Our results brought to light the unstable expression of ACTB in NSCLC, making the role of ACTB as a cancer reference gene challenging. The mechanism of ACTB action in NSCLC also needs to be further elucidated. Similarly, no studies have been seen on miR-183-5p targeting ITGB1 and thus affecting the development of NSCLC.

Mechanism Revealed by miR-766-5p Related ceRNA-hub-FFL Regulatory Subnetworks
There are three types of FFL in the miR-766-5p related ceRNA-hub-FFL subnetworks ( Figure 17). In the TF-mediated (I-FFL) network, most TFs (right) have repressive effects on miR-766-5p, with activating effects on CCNB1 and related four lncRNAs, while the other seven TFs act in the opposite way.  In the miR-766-5p-mediated FFL (II-FFL), TFs MAZ, HDAC2, and KDM5B have activating effects on CCNB1 and lncRNAs. However, miR-766-5p represses the three TFs at the post-transcriptional level. Eventually, these interactions enhance the repressive effect of miR-766-5p on CCNB1. According to our differential expression analysis and the ceRNA-hub-FFL subnetwork, it can be inferred that miR-766-5p was significantly downregulated in LUSC (Figure 18a), and the enhanced inhibitory effect of TFs on miR-766-5p led to the up-regulation of the regulated CCNB1. A previous study has confirmed that the protein encoded by CCNB1 is a regulatory protein involved in mitosis, which has been demonstrated to be associated with the development of LC [48]. Reduced expression of CCNB1 can induce G2/M cell cycle arrest and promote apoptosis [49]. This may be the mechanism of miR-183-5p regulating the development of lung cancer.
tivating effects on CCNB1 and lncRNAs. However, miR-766-5p represses the three TFs at the post-transcriptional level. Eventually, these interactions enhance the repressive effect of miR-766-5p on CCNB1. According to our differential expression analysis and the ceRNA-hub-FFL subnetwork, it can be inferred that miR-766-5p was significantly downregulated in LUSC (Figure 18a), and the enhanced inhibitory effect of TFs on miR-766-5p led to the up-regulation of the regulated CCNB1. A previous study has confirmed that the protein encoded by CCNB1 is a regulatory protein involved in mitosis, which has been demonstrated to be associated with the development of LC [48]. Reduced expression of CCNB1 can induce G2/M cell cycle arrest and promote apoptosis [49]. This may be the mechanism of miR-183-5p regulating the development of lung cancer.
To validate our conclusion, the protein and mRNA expression data of CCNB1 in normal tissues versus in LUSC and LUAD were analyzed based on the CPTAC, UALCAN, and TIMER databases, respectively, and the data confirmed the significantly up-regulated expression of CCNB1 (Figure 18b-e). All the above proved the reliability of our results.  To validate our conclusion, the protein and mRNA expression data of CCNB1 in normal tissues versus in LUSC and LUAD were analyzed based on the CPTAC, UALCAN, and TIMER databases, respectively, and the data confirmed the significantly up-regulated expression of CCNB1 (Figure 18b-e). All the above proved the reliability of our results.

Mechanisms Revealed by miR-140-5p Related ceRNA-hub-FFL Regulatory Subnetworks
There are three types of FFL in the MiR-140-5p-related ceRNA-hub-FFL subnetworks ( Figure 19). To further validate the above conclusions, we analyzed the data in different databases at different levels. At the proteomic and mRNA levels of LUSC and LUAD in CPTAC, UALCAN, and TIMER databases, YWHAG, YWHAQ, CCNB1, and BIRC5 were significantly up-regulated (Figures 20b-j, 18b-e and 21a-c). There is no CDC6 proteome data in the CPTAC database, but in UALCAN and TIMER databases at the mRNA level, In the miRNA-mediated FFL (II-FFL) and composite FFL (III-FFL) regulatory networks, TF TRIM28 and miR-140-5p inhibit each other at the transcriptional level and post-transcriptional level. They form a feedback loop that co-regulates the expression of the target genes CDC6, YWHAG, CCNB1, and BRIC5. TF MYBL2 participates in the activation of the four target genes at the transcript level also.
Most of the key regulators exist in TF-mediated FFL (I-FFL) (Figure 19a,d). Five genes, YWHAG, YWHAQ, CCNB1, CDC6, and BIRC5, are regulated by activation or repression of different factors. At the transcriptional level, the TFs ETS1 and GATA1 activate miR-140-5p and correspondingly repress the lncRNA SNHG1, TUG1, and the target gene YWHAG. However, the effects of TCF3, KDM5B, EZH2, TRIM28, and HDAC2 are the opposite. As for YWHAQ, it is almost the same. Six TFs TCF3, TRIM28, KDM5B, HDAC2, HDAC1, and EZH2 have activating effects on gene CCNB1 and lncRNAs TUG1 and SNHG1 at the transcriptional level, repressive effects on miR-140-5p. The other five TFs NFIC, JUND, ZEB1, ELF1, and ETS1 have the opposite effect. CCNB1 is an oncogene. It is possible that six TFs play a dominant role in the repression of miR-140-5p, and thus low-expressed miR-140-5p in LUSC weakens the repressive effect on oncogene CCNB1. The same thing is true for CDC6 and oncogene BIRC5. According to our differential expression analysis and the I-FFL, it could be inferred that miR-140-5p was significantly down-regulated in LUSC (Figure 20a), so down-regulated miR-140-5p led to a diminished inhibitory effect on YWHAG, YWHAQ, CCNB1, BIRC5, and CDC6.   YWHAG and YWHAQ are members of the 14-3-3 protein family. 14-3-3 proteins have been demonstrated to contribute to the regulation of key cellular processes such as signal transduction, cell cycle control, apoptosis, and malignant transformation [50], especially many cellular processes associated with cancer development [51], and over-expression of YWHAG has been observed in a variety of tumors [52]. It was reported that YWHAG could inhibit the death of apoptotic cells and promote cell migration in breast cancer [53] and have clinical prognostic significance as an oncogene in advanced NSCLC [54]. CCNB1 and BIRC5 are all oncogenes. BIRC5 is a member of the inhibitors of apoptosis (IAP) gene family, which encodes negative regulatory proteins that prevent apoptotic cell death. Unlike other IAPs, BIRC5 expresses intensely in most tumors, including LUAD and LUSC [55], and it could serve as a predictive biomarker for NSCLC, especially for LUAD [56], but does not express or only expresses at low levels in most normally differentiated tissues [57], and down-regulated BIRC5 was positively associated with NSCLC survival time [58]. Cell division cycle 6 (CDC6) is an important regulator of DNA replication, and there is evidence that silencing of CDC6 may lead to G1 phase arrest [59]. Over-expression of CDC6 has been detected in several types of cancer, and up-regulation of CDC6 is associated with poor prognosis in cancer patients [60]. A study has confirmed that the expression level of CDC6 is significantly elevated in NSCLC tumor tissues [61]. This may be the mechanism of miR-140-5p regulating the development of LC.
To further validate the above conclusions, we analyzed the data in different databases at different levels. At the proteomic and mRNA levels of LUSC and LUAD in CPTAC, UALCAN, and TIMER databases, YWHAG, YWHAQ, CCNB1, and BIRC5 were significantly up-regulated (Figures 18b-e, 20b-j and 21a-c). There is no CDC6 proteome data in the CPTAC database, but in UALCAN and TIMER databases at the mRNA level, CDC6 was up-regulated in LUSC and LUAD compared to normal samples (Figures 20k,l and 21d). These all proved the reliability of our findings.

Discussion
The biology network can deepen our understanding of the intrinsic mechanisms of disease. Based on the network, the complex interactions within the organism can be fully considered from a systematic perspective.
Most studies aiming to explore diagnostic and prognostic miRNA markers are based on the coordinated relationships in the regulatory networks. They look for coordinated relationships in the many-to-many relationships between miRNAs and genes, find diseaserelated miRNA modules and identify markers from them eventually [17]. Disease-related miRNAs in the caner-miRNA modules enhance the common action on the co-regulated genes, i.e., synergistic effects [62]. However, according to the view of systematic biology [63], biomarkers are important indicators reflecting changes between different biological states, so biomarker identification should consider their effects on system stability. Therefore, a different model-an independent regulation model of miRNAs has been proposed. Independent regulation relationships are fragile. If the protein coded by the miRNA target gene is a key node in the network, the abnormal expression of the miRNA with independently regulatory ability to this gene will lead to large changes in this "fragile" structure and affect the stability of the biological system. In biological networks, the independent regulation model is an important complement to a synergistic effect and is more impervious to the dataset [15]. There are some meaningful studies based on this model [16,64], but this model does not consider the biological functions of the target genes. We argue that if the miRNA target gene is highly correlated with the disease, then the miRNA will be more directly and definitively involved in the progression of this disease. Therefore, this miRNA is more likely to become a marker for the disease. The identification of markers using a low-cost and highly accurate method is only the first step. It is more important to gain insight into the potential mechanisms of these markers in disease, which will provide guidance for finding new markers, therapeutic targets, and treatment regimens.
The involvement of miRNAs in the pathogenesis of NSCLC is extremely complex. Most previous studies have analyzed their regulatory mechanisms based only on the post-transcriptional level [65]. This would lead to very one-sided conclusions and even produce erroneous results. The regulation of eukaryotic gene expression can occur at many different hierarchies, including the genetic level, transcriptional level, and posttranscriptional level. Each hierarchy involves multiple factors. At the transcriptional level, TFs as important regulatory molecules can directly control the timing, location, and intensity of gene expression. They control the transcription of DNA into mRNA by combining with specific DNA sequences, and they can also bind to RNA produced by gene transcription and then control the transcription, localization, and stability of RNA. At the same time, many TFs are regulated by factors in the signal transduction pathway. LncRNAs are involved in regulating protein-coding genes at multi-levels, including epigenetic regulation, transcriptional regulation, and post-transcriptional regulation. LncRNAs and miRNAs can interact with each other by competing with some shared RNAs to participate in the expression regulation of target genes in the cytoplasm [66].
The above shows the importance and complexity of TFs and lncRNAs in biological regulatory networks. In short, TFs can regulate the expression of mRNA, miRNA, and lncRNA genes at the transcriptional level, and miRNAs can regulate mRNAs (TF mRNAs and non-TF mRNAs) and lncRNAs at the post-transcriptional level. Such a relationship between TFs and miRNAs can connect the transcriptional level and the post-transcriptional level [67]. Based on this bridging role of TFs, we constructed a 4-node directed competitive ceRNA-hub-FFL network based on miRNA markers in NSCLC for the first time, and the regulatory directions were determined in the network based on the correlation between expressions. Based on the constructed ceRNA-hub-FFL network, the mechanisms of the obtained markers were deeply explored, and we indeed got some novel conclusions and depicted a more comprehensive regulatory mechanism. The constructed network revealed for the first time that miR-708-5p affects the development of LUAD by regulating CD44, miR-183-5p affects the development of LUSC through regulating ITGB1 and ACTB, and miR-766-5p affects the development of LUSC through regulating CCNB1. These conclusions were validated using multiple databases. In addition, the constructed ceRNA-hub-FFL network provided a more comprehensive landscape of the different roles of lncRNAs and TFs in miRNA regulation of gene expression. Meanwhile, through the ceRNA-hub-FFL network, we clarified that the expression levels of CD44, ACTB, and ITGB1 are significantly different in LUAD and in LUSC. This is because multiple factors play different regulatory roles in gene expression, and the final results depend on the synergistic effect of various regulatory factors. The necessity of subtyping studies on NSCLC becomes evident.
The network can give us a more comprehensive and systematic understanding of the underlying molecular mechanisms of NSCLC. The constructed 4-node network can be deeply mined at the level of key subnetwork modules, FFL modules, and relations among TFs, miRNAs, genes, and lncRNAs, thus, the inherent laws of NSCLC can be explored at the systemic level. It is expected to find new regulatory pathways and rules from multiple angles and multiple layers, and it is also expected to identify potential oncogenes and drug targets of NSCLC [68,69].
However, another non-coding RNA, circular RNAs (circRNAs), can also act as a sponge to compete with miRNAs to bind mRNAs and indirectly regulate the expression of target genes [70]. A previous study has demonstrated the regulatory function of circR-NAs in tumor progression [71]. Therefore, our study is still deficient. The independent regulation model only reflects the regulatory relationship at the miRNA-mRNA level, and it would be more reasonable and valuable if the regulatory effects of lncRNAs and circRNAs on miRNAs and mRNAs are further considered, as well as to further examine the generality of this method. Furthermore, many factors in organisms are temporally and spatially regulated, so a multi-node multi-level dynamic network will reflect the mechanisms more accurately and fully.

•
Obtaining candidate biomarkers based on miRNA independent regulation model Differential miRNAs and mRNAs were used to extract disease-specific miRNA-mRNA networks from the human miRNA-mRNA network, and the NOG and TFP values were calculated based on the net. MiRNAs with significantly high NOG and TFP values were screened out with Wilcoxon signed rank test (threshold of NOG and TFP: p < 0.05). Because miRNAs with higher NOG values are more likely to be biomarkers [15], the median NOG values of the obtained miRNAs were calculated further, and miRNAs with NOG values bigger than the median were selected as candidate biomarkers.

Screening miRNA Biomarkers Based on the Biological Significance of Target Genes
A weighted gene co-expression network analysis (WGCNA) was built, and the key target gene set was obtained from it. Then, key LC-related genes, oncogenes, and tumor suppressor genes were screened out from these target genes enriched in lung tissue. The miRNA-mRNA target pairs of these key lung cancer-related genes were identified from the interactions obtained experimentally. The Pearson correlation coefficients between the target pairs were calculated, and the interaction pairs with high correlation were selected as the strongly correlated pairs. Finally, miRNAs in strongly correlated pairs were regarded as candidate biomarkers, which regulate key LC-related genes.

•
Obtaining the gene set most relevant to LC using WGCNA To obtain the gene modules highly relevant to cancer, mRNA expression data of paired normal and tumor samples of LUAD and LUSC was obtained from the TCGA database, respectively. The top 75% of variant genes were selected by a robust method of median absolute deviation (MAD) [79]. A WGCNA was constructed and Module Eigengene (ME) dissimilarities were calculated using the R package. The association between ME and LC sample phenotypes was assessed by Pearson correlation values. The p-value < 0.05 is significant. The module that was highly correlated with the LC sample phenotypes was selected as the gene module most relevant to LC, and it would be used for subsequent analysis.

•
Screening LC-related genes based on databases of oncogenes and tumor suppressor genes To screen out the genes expressed in lung tissue from the gene set obtained above, enrichment analysis was performed using DAVID [80] (https://david.ncifcrf.gov/, accessed on 20 June 2021). The lung-enriched genes were further screened in the oncogene database TSGene [81] and tumor suppressor genes database on Gene [82]. Finally, oncogenes or tumor suppressor genes that express in lung tissue and are highly associated with LC were obtained.

•
Construction of the strong relationship pairs between LC-related genes and miRNAs To identify the miRNAs that regulate the above-obtained oncogenes or tumor suppressor genes, strong interactions between miRNAs and target genes must be found. The miRNA-mRNA pairs from the four experimentally validated databases (in "Construction of human miRNA-mRNA regulatory network") were selected and Pearson correlation coefficients between the target pairs were calculated based on TCGA expression data. The threshold of strong relationship pairs: r < −0.3, p-value < 0.05.

The Final miRNA Biomarkers for LC
Candidate miRNA biomarkers were obtained from the intersection of the two methods. Then the final biomarkers were identified by the ROC curve (AUC > 0.85).

•
Validation based on the existing literature The biomarkers were submitted to the PubMed database. Literature screening criteria for a certain miRNA: "miR-XXX" [All Fields] AND "humans" [MeSH Terms]. Literature screening criteria for whether miRNA is associated with cancer: "Neoplasms" [MeSH Terms] AND "miR-XXX" [All Fields] AND "humans" [MeSH Terms]. Literature screening criteria for the association of a miRNA with LC: "Lung Neoplasms" [MeSH Terms] AND "miR-XXX" [All Fields] AND "humans" [MeSH Terms]. The literature related to NSCLC was searched in the Web of Science database. Literature screening criteria: TS = (("LUAD" OR "LUSC" OR "NSCLC" OR "non-small cell lung cancer" OR "non-small-cell lung cancer" OR "lung squamous cell carcinomas" OR "lung adenocarcinoma" OR "non-small cell lung carcinomas") AND ("miR-XX" OR "miR XX" OR "microRNA XX" OR "microRNA-XX")). The screening covered literature published up to 16 September 2021.

•
Validation based on external datasets The ability to distinguish LUAD/LUSC from normal was validated using the expression and ROC of external GEO datasets GSE102286, GSE63805, GSE36681, and GSE74190, GSE15008, respectively. The ability to distinguish NSCLC from normal tissue of the biomarkers was verified using the unpaired and paired data of normal and tumor samples from mixed subtypes in the TCGA database. The ability to distinguish LUAD from LUSC was tested using the expression data and ROC of LUAD and LUSC tumor samples in the TCGA database. p-value < 0.05 was considered statistically significant. First, a TF dataset based on existing databases was established: TFs with a quality rating reaching C and above were downloaded from HOCOMOCO V11 database [83]; TFs were downloaded from HumanTFDB (hust.edu.cn, accessed on 9 October 2021), a sub-database of AnimalTFDB 3.0 [84]; TFs were extracted from the following databases: TRRUST [85], ENCODE [86], TRANSFAC [87], ChIP-X Enrichment Analysis [88], Trans-miR [89], and HTRIdb [90]. Finally, the above TFs were merged as the TF dataset for this study.
The miRNA-TF relationship pairs were obtained based on the TF dataset constructed above. Then, the interaction pairs between the miRNA biomarkers and genes were extracted from the human miRNA-mRNA network constructed in "Construction of human miRNA-mRNA regulatory network"; miRNA-lncRNA relationship pairs were obtained from the intersection of starBase 3.0 database [91] and lncBase database [92]. The interaction pairs of TFs and targets, including TF-miRNA, TF-gene, and TF-lncRNA, were extracted from ENCODE, CHIP-X, HTRIdb, TRANSFAC, and TRRUST databases. Furthermore, TF-miRNA pairs were additionally extracted from puTmiR database [93].
Correlations and interaction directions between all interaction pairs were calculated based on TCGA expression data, and significantly correlated interaction pairs were retained. For miRNA target pairs: Pearson correlation coefficient r ≤ −0.15, p < 0.05, for TFs target pairs: Pearson correlation coefficient |r| ≥ 0.3, p < 0.05.
To reduce false positives, the intersection of the relationship pairs obtained from the databases and the expression was chosen as the final relationship pair.

Obtaining hub-FFL Network
Cytohubba's "degree" and MCC methods were used to extract the top 5% of n as the key nodes to construct the FFL key subnetworks [94]. Further, considering th portance of the genes co-regulated by TFs and miRNAs, the protein products of miR regulated genes were extracted to construct a PPI network. The genes of the top 10 nodes extracted from the PPI network were used to extract the hub-FFL network from FFL key subnetworks.

Obtaining hub-FFL Network
Cytohubba's "degree" and MCC methods were used to extract the top 5% of nodes as the key nodes to construct the FFL key subnetworks [94]. Further, considering the importance of the genes co-regulated by TFs and miRNAs, the protein products of miRNAregulated genes were extracted to construct a PPI network. The genes of the top 10% of nodes extracted from the PPI network were used to extract the hub-FFL network from the FFL key subnetworks.

Conclusions
A novel miRNA markers screening method and a 4-node directed competitive ceRNAhub-FFL network were constructed in this work. Finally, miRNA biomarkers of LUAD and LUSC were obtained and validated, respectively, and the new target genes regulated by these biomarkers were revealed to be associated with LUAD and LUSC based on the constructed ceRNA-hub-FFL network. Moreover, we clarified that the expression levels of these target genes are significantly different in LUAD and in LUSC.