Identification of Candidate lncRNA and Pseudogene Biomarkers Associated with Carbon-Nanotube-Induced Malignant Transformation of Lung Cells and Prediction of Potential Preventive Drugs

Mounting evidence has linked carbon nanotube (CNT) exposure with malignant transformation of lungs. Long non-coding RNAs (lncRNAs) and pseudogenes are important regulators to mediate the pathogenesis of diseases, representing potential biomarkers for surveillance of lung carcinogenesis in workers exposed to CNTs and possible targets to develop preventive strategies. The aim of this study was to screen crucial lncRNAs and pseudogenes and predict preventive drugs. GSE41178 (small airway epithelial cells exposed to single- or multi-walled CNTs or dispersant control) and GSE56104 (lung epithelial cells exposed to single-walled CNTs or dispersant control) datasets were downloaded from the Gene Expression Omnibus database. Weighted correlation network analysis was performed for these two datasets, and the turquoise module was preserved and associated with CNT-induced malignant phenotypes. In total, 24 lncRNAs and 112 pseudogenes in this module were identified as differentially expressed in CNT-exposed cells compared with controls. Four lncRNAs (MEG3, ARHGAP5-AS1, LINC00174 and PVT1) and five pseudogenes (MT1JP, MT1L, RPL23AP64, ZNF826P and TMEM198B) were predicted to function by competing endogenous RNA (MEG3/RPL23AP64-hsa-miR-942-5p-CPEB2/PHF21A/BAMBI; ZNF826P-hsa-miR-23a-3p-SYNGAP1, TMEM198B-hsa-miR-15b-5p-SYNGAP1/CLU; PVT1-hsa-miR-423-5p-PSME3) or co-expression (MEG3/MT1L/ZNF826P/MT1JP-ATM; ARHGAP5-AS1-TMED10, LINC00174-NEDD4L, ARHGAP5-AS1/PVT1-NIP7; MT1L/MT1JP-SYNGAP1; MT1L/MT1JP-CLU) mechanisms. The expression levels and prognosis of all genes in the above interaction pairs were validated using lung cancer patient samples. The receiver operating characteristic curve analysis showed the combination of four lncRNAs, five pseudogenes or lncRNAs + pseudogenes were all effective for predicting lung cancer (accuracy >0.8). The comparative toxicogenomics database suggested schizandrin A, folic acid, zinc or gamma-linolenic acid may be preventive drugs by reversing the expression levels of lncRNAs or pseudogenes. In conclusion, this study highlights lncRNAs and pseudogenes as candidate diagnostic biomarkers and drug targets for CNT-induced lung cancer.


Introduction
Carbon nanotubes (CNTs), which are structurally composed of nano-sized hollow molecular cylinders rolled up from single (single-walled CNTs, SWCNTs) or multiple layers (multi-walled CNTs, MWCNTs) of graphene sheets, have become widely utilized in industrial and biomedical fields due to their unique properties [1][2][3][4][5]. However, their wide range of applications and mass production may lead to a concern that these CNTs can potentially enter the human body (especially for occupational workers) and cause adverse

Weighted Correlation Network Analysis of lncRNAs and Pseudogenes
The weighted correlation network analysis (WGCNA) package in R (v1.61; https:// cran.r-project.org/web/packages/WGCNA/index.html; accessed on 7 September 2021) [23] was employed to screen crucial lncRNAs and pseudogenes associated with CNT-induced phenotypic traits. The data of all overlapped lncRNAs and pseudogenes between GSE41178 and GSE56104 datasets were used to build the co-expression network. GSE41178 was set as the training dataset, and GSE56104 was set as the testing dataset. The filtering principle of the optimal soft threshold power (β) was to make the network have the scale-free topology characteristics. The weighted adjacency matrix was transformed into a topological overlap matrix to construct a dendrogram. The dynamic tree cut method was applied to detect gene modules with a minModuleSize of 50 and a cut height of 0.995. The preservation of identified modules was analyzed using the modulePreservation function, with a Z summary score > 2-10 defined as moderately preserved. The module eigengenes (ME) were calculated to represent the overall expression level of each module. The correlation between ME and the disease status was investigated in each module. If the correlation coefficient, r, was >0.5 and the p-value was <0.05, modules were considered crucial. Furthermore, the gene significance (GS) was measured to represent the correlation between the expression levels of genes in each module and each trait; module membership (MM) was measured to correlate the ME with gene expression values to determine the importance of a gene in a given module. LncRNAs and pseudogenes with |GS| > 0.5 and |MM| > 0.5 in crucial modules were considered hub genes.

Validation of the Expression Levels and Prognosis of DE-lncRNAs and DE-Pseudogenes in Lung Cancer
To evaluate whether these hub DE-lncRNAs and DE-pseudogenes were associated with initiation of lung cancer, we validated their expression levels using Gene Expression Profiling Interactive Analysis (GEPIA, http://gepia.cancer-pku.cn; accessed on 12 September 2021), a newly developed interactive web server for analyzing the RNA sequencing expression data of 9736 tumors and 8587 normal samples from the Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression (GTEx) project [25]. Lung adenocarcinoma (LUAD) and squamous cell carcinoma (LUSC) samples were selected as the disease group; whereas match TCGA normal and GTEx data or match TCGA normal data were selected as the control group. |log 2 FC| > 0.5 and p-value < 0.05 were set as the cut-off point.
To evaluate whether these hub DE-lncRNAs and DE-pseudogenes were associated with prognostic outcomes of lung cancer patients, the Kaplan-Meier plotting (http://www. kmplot.com/analysis/index.php?p=background; accessed on 12 September 2021) [26] was performed. Kaplan-Meier plots allow for assessment of the effects of DE-lncRNAs and DE-pseudogenes (included in mRNA data) on survival using the GeneChip and RNA-seq data of GEO, the European Genome-Phenome Archive and TCGA databases. The overall survival (OS) and recurrence-free survival (RFS) was assessed by RNA-seq data analysis, whereas OS, first-progression survival (FPS) and post-progression survival (PPS) were assessed in GeneChip data analysis. Patients were divided into two groups by "autoselect best cut-off". Hazard ratios (HRs) with 95% confidence intervals (CIs) and log-rank p-values were automatically calculated. A log-rank test p-value < 0.05 was considered statistically significant. HR < 1 indicated a better survival rate for the high-expression group; HR > 1 indicated a poor survival rate for the high-expression group.

Construction of lncRNA-and Pseudogene-Related ceRNA Networks
The DE-lncRNAs and DE-pseudogenes with validated expression and survival associations in lung cancer data were used to construct the ceRNA networks. The interactions between DE-lncRNAs/DE-pseudogenes and miRNAs were predicted using the starBase database (v2.0; http://starbase.sysu.edu.cn/; accessed on 26 September 2021) [27] with the cut-off point of Ago CLIP-seq Data ≥ 3. Then, miRNAs that interacted with lncRNAs/pseudogenes were used to predict their target mRNAs by the starBase database, with the cut-off point of predicting program ≥ 3 and CLIP data > 3. The final lncRNA/pseudogene-miRNA-mRNA interaction pairs were confirmed after comparing the predicted mRNAs and DE-mRNAs. Only DE-mRNAs that had expression trends simi-lar to those of lncRNAs or pseudogenes were retained. The ceRNA network was visualized in Cytoscape (v3.6.1; www.cytoscape.org/; accessed on 26 September 2021).

Construction of lncRNA-and Pseudogene-mRNA Co-Expression Networks
The DE-lncRNAs and DE-pseudogenes with validated expression and survival associations in lung cancer data were used to construct the co-expression networks. The Pearson correlation coefficient (PCC) was calculated between DE-lncRNAs/DE-pseudogenes and DE-mRNAs. Meaningful correlation pairs were defined as PCC > 0.9 and p-value < 0.05. The co-expression network was visualized using Cytoscape.

Construction of Protein-Protein Interaction (PPI) Networks
To further screen hub DE-mRNAs from ceRNA and co-expression networks, we respectively imported DE-mRNAs of lncRNA-and pseudogene-related networks into the Search Tool for the Retrieval of Interacting Genes (STRING; v10.0; http://stringdb.org/; accessed on 26 September 2021) database [28] to obtain the PPI pairs. PPI pairs with a combined score > 0.4 were considered significant. Furthermore, seven topological characteristics of each protein in the PPI network were calculated using the CytoNCA plugin in Cytoscape software (http://apps.cytoscape.org/apps/cytonca; accessed on 26 September 2021) [29], including degree, betweenness, closeness, eigenvector, local average connectivity, information and subgragh centrality. The proteins ranked in the top 60 of each topological characteristic were suggested to be hub DE-mRNAs.

Validation of the Expression Levels and Survival Associations of DE-mRNAs in Lung Cancer
The mRNA expression levels of hub DE-mRNAs were confirmed using the GEPIA database as lncRNAs and pseudogenes. The protein expression levels of hub DE-mRNAs were validated using UALCAN (http://ualcan.path.uab.edu/index.html; accessed on 26 September 2021) (in which proteomic datasets of LUAD and normal controls from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) were provided; p-value < 0.05 was set as the significant threshold value) and the Human Protein Atlas (HPA; v20.1, https: //www.proteinatlas.org; accessed on 27 September 2021) (in which immunohistochemistry results for LUAD and LUSC tissues were available). The associations of hub DE-mRNAs with OS, RFS, FPS and PPS were analyzed using the Kaplan-Meier plotter based on the sequencing and chip data.

Validation of the Expression Levels and Survival Associations of miRNAs in Lung Cancer
The expression levels of miRNAs associated with hub DE-lncRNAs and DE-mRNAs were confirmed using the UALCAN database (http://ualcan.path.uab.edu/index.html; accessed on 6 October 2021) based on the TCGA data of LUAD and LUSC patient samples. Furthermore, Ventura et al. performed an miRNA expression profile analysis of A549 cells exposed to MWCNTs (MWCNT-7, Mitsui & Company, Ibaraki, Japan) and controls [30]. The differentially expressed miRNAs identified in the study of Ventura et al. were also compared with ours to confirm the expression levels of crucial miRNAs induced by CNTs. The associations of miRNAs with OS and RFS were identified using the Kaplan-Meier plotter based on the sequencing data.

Function Enrichment Analysis
To understand potential functions of mRNAs regulated by DE-lncRNAs and DEpseudogenes, the online Database for Annotation, Visualization and Integrated Discovery (DAVID) (v6.8; http://david.abcc.ncifcrf.gov; accessed on 12 October 2021) [31] was searched, after which gene ontology (GO) biological process, molecular function terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways were obtained at a significance level of p-value < 0.05. The "Goplot" package in R was used to draw the chord plot for hub DE-mRNAs.

Diagnostic Analysis of Hub lncRNAs and Pseudogenes
The mRNA sequencing data of LUAD/LUSC patients and normal controls were downloaded from the TCGA database. The receiver operating characteristic (ROC) curve analysis was performed using the MedCalc program (v9.3; MedCalc, Mariakerke, Belgium) to assess the diagnostic value of hub lncRNAs and pseudogenes in distinguishing lung cancer from normal controls. The area under the ROC curve (AUC), sensitivity and specificity were calculated. The genes with AUC > 0.5 were considered to have diagnostic significance.

Prediction of Small-Molecule Drugs That Regulated Hub lncRNAs and Pseudogenes
To predict potential drugs that may prevent and treat CNT-induced lung cancer, the Comparative Toxicogenomics Database (CTD, http://ctdbase.org; accessed on 12 October 2021) [32] was searched to obtain the chemical interactions of hub lncRNAs and pseudogenes associated with CNT exposure. Only chemical drugs that reversed the expression levels of lncRNAs and pseudogenes and involved in cancer initiation and development were screened (that is, if the lncRNA was upregulated, we selected the drugs that decreased its expression levels). The lncRNA/pseudogene-chemical interaction network was visualized in Cytoscape.

Identification of Key Modules Related to CNT-Induced Malignant Transformation of Lung Cells
After Entrez Gene annotation, the GSE41178 dataset was found to include 995 lncRNAs (726) and pseudogenes (269), whereas the GSE56104 dataset included 1051 lncRNAs (305) and pseudogenes (746). A total of 812 lncRNAs and pseudogenes were shared between the GSE41178 and GSE56104 datasets. Additionally, the expression levels (r = 0.87, p < 1 × 10 200 ) and connectivity (r = 0.13, p = 2 × 10 4 were significantly positively correlated between these two datasets. Thus, these shared 812 lncRNAs and pseudogenes were used for WGCNA analysis. A β value of 13 was selected to satisfy the requirements of the scale-free network distribution ( Figure 1A,B). By calculating the scale-free topology fitting index, the value of R square was confirmed to reach 0.93 ( Figure 1C,D). After a dynamic tree cut analysis, four modules were discovered using the GSE41178 training dataset (Figure 2A). These modules were also generated after analysis with the GSE56104 testing dataset ( Figure 2B). Among them, only the turquoise module was moderately preserved (Z summary score = 4.21; Figure 2C). The module-trait heatmap revealed that the eigengenes of the turquoise module were strongly correlated with disease status (r = −0.94, p = 1 × 10 8 ; Figure 2D). The correlation between GS for disease status and MM in the turquoise module was also significant (r = 0.83, p = 4.61 × 10 94 ; Figure 2E). These findings indicated that 66 lncRNAs and 299 pseudogenes in this module were highly correlated with CNT-induced malignant transformation of lung cells. Among them, 57 lncRNAs and 267 pseudogenes had |GS| > 0.5 and |MM| > 0.5. Thus, they were considered to be hub lncRNAs and pseudogenes and chosen for further analyses.

Identification of Differential Genes in CNT-Induced Malignant Transformation of Lung Cells
Using theLIMMA method, a total of 103 lncRNAs (33 upregulated, 70 downregulated), 289 pseudogenes (86 upregulated, 203 downregulated) and 5136 mRNAs (2961 upregulated, 2175 downregulated) were identified to be differentially expressed between SWCNT-exposed and control-treated SAECs. A total of 99 lncRNAs (27 upregulated Figure S1A,B). Furthermore, we also compared these DE-lncRNAs and DE-pseudogenes with hub lncRNAs and pseudogenes identified in WGCNA analysis. The results showed that 24 lncRNAs and 112 pseudogenes were overlapped ( Figure S1C). These findings suggest that these 24 DE-lncRNAs and 112 DE-pseudogenes may be particularly important lncRNAs and pseudogenes associated with CNT-induced malignant transformation of lung cells.

Validation of the Expression Levels and Prognosis of lncRNAs and Pseudogenes in Tissues of Lung Cancer Patients
Under the threshold of log 2 FC > 0.5 and p-value < 0.5, the GEPIA database confirmed that PVT1 was significantly upregulated in both LUAD and LUSC samples compared with controls; ARHGAP5-AS1, RBM12B-AS1, LOC644656 and SNHG17 were only more highly expressed in LUSC samples compared to controls; MEG3, LINC00174, WDFY3-AS2 and EIF3J-DT were significantly downregulated in both LUAD and LUSC samples compared with controls; LINC00863 was only had lower expression in LUSC samples compared to controls (Figure 4). The expression trend was in line with the results of these lncRNAs identified in the GSE41178 dataset (Table 1). Kaplan-Meier plotter analysis demonstrated that patients with high expression levels of PVT1, ARHGAP5-AS1 and LOC644656 possessed a shorter survival rate (HR >1), whereas high expression levels of MEG3, LINC00174, WDFY3-AS2 and LINC00863 were associated with excellent prognostic outcomes (HR< 1) ( Figure 5; Table S1). The prognostic results of these lncRNAs were in accordance with our expectation according to their expression trends. Likewise, GEPIA and the Kaplan-Meier plotter database were used to confirm the expression levels and prognosis of pseudogenes, respectively. As a result, MT1JP, MT1L, RPL23AP64 and TMEM198B were found to be significantly downregulated in both LUAD and LUSC samples compared with controls; ZNF826P was downregulated and EP400P1 was upregulated in LUSC samples compared with controls ( Figure 4), which was in line with the results obtained in the GSE41178 dataset (Table 1). Among these six pseudogenes, five (HR <1 in all pseudogenes) were significantly associated with prognostic outcomes of lung cancer patients ( Figure 6). These findings suggest that these seven lncRNAs and five pseudogenes may be potentially important genes associated with CNT-induced initiation and progression of lung cancer, which were used for the following ceRNA and co-expression network analyses.

Validation of the Expression Levels and Prognosis of lncRNAs and Pseudogenes in Tissues of Lung Cancer Patients
Under the threshold of log2FC > 0.5 and p-value < 0.5, the GEPIA database confirmed that PVT1 was significantly upregulated in both LUAD and LUSC samples compared with controls; ARHGAP5-AS1, RBM12B-AS1, LOC644656 and SNHG17 were only more highly expressed in LUSC samples compared to controls; MEG3, LINC00174, WDFY3-AS2 and EIF3J-DT were significantly downregulated in both LUAD and LUSC samples compared with controls; LINC00863 was only had lower expression in LUSC samples compared to controls ( Figure 4). The expression trend was in line with the results of these lncRNAs identified in the GSE41178 dataset (Table 1). Kaplan-Meier plotter analysis demonstrated that patients with high expression levels of PVT1, ARHGAP5-AS1 and LOC644656 possessed a shorter survival rate (HR >1), whereas high expression levels of MEG3, LINC00174, WDFY3-AS2 and LINC00863 were associated with excellent prognostic outcomes (HR< 1) ( Figure 5; Table S1). The prognostic results of these lncRNAs were in accordance with our expectation according to their expression trends. Likewise, GEPIA and the Kaplan-Meier plotter database were used to confirm the expression levels and prognosis of pseudogenes, respectively. As a result, MT1JP, MT1L, RPL23AP64 and TMEM198B were found to be significantly downregulated in both LUAD and LUSC samples compared with controls; ZNF826P was downregulated and EP400P1 was upregulated in LUSC samples compared with controls ( Figure 4), which was in line with the results obtained in the GSE41178 dataset (Table 1). Among these six pseudogenes, five (HR <1 in all pseudogenes) were significantly associated with prognostic outcomes of lung cancer patients ( Figure 6). These findings suggest that these seven lncRNAs and five pseudogenes may be potentially important genes associated with CNT-induced initiation and progression of lung cancer, which were used for the following ceRNA and co-expression network analyses. Heatmap of the expression levels of all differentially expressed lncRNAs, pseudogenes and mRNAs between CNT-exposed and control SAECs. (A) The difference between SWCNT-exposed and control SAECs; (B) the difference between MWCNT-exposed and control SAECs. SWCNT, single-walled carbon nanotube; MWCNT, multiple-walled carbon nanotube; red, high expression; blue, low expression.

Establishment of lncRNA-and Pseudogene-Based ceRNA Regulatory Networks
The starBase database predicted that five lncRNAs (LINC00174, LINC00863, MEG3, ARHGAP5-AS1 and PVT1) could interact with 41 miRNAs, and among these 41 miRNAs, 32 could target 3429 mRNAs. Integration of these lncRNA-mRNA and miRNA-mRNA interaction pairs and selection of the intersection of predicted mRNAs with DE-mRNAs showed that upregulated PVT1 could interact with 15 miRNAs to lead to the upregulation of 376 DE-mRNAs, whereas downregulated LINC00174, LINC00863 and MEG3 could interact with 15 miRNAs to result in the downregulation of 88 DE-mRNAs. Thus, these four DE-lncRNAs-30, miRNAs-464 and DE-mRNAs were used to establish an lncRNArelated ceRNA network (Table S2).

Establishment of PPI Networks for lncRNA-and Pseudogene-Related Target Genes
To screen crucial target mRNAs regulated by lncRNAs and pseudogenes, we attempted to explore the interactions of mRNAs in the ceRNA and co-expression networks by using the STRING database. As a result, 2527 PPI pairs with a combined score > 0.4, were obtained for target mRNAs of lncRNAs, which were constructed by 685 genes, whereas 286 genes and 513 PPI pairs were predicted for target mRNAs of pseudogenes (Table S4). The hub genes were selected according to the top 60 genes ranked in seven topological characteristics, which led to 112 genes screened from lncRNA-( Table 2) and pseudogene-related (Table 3) PPI pairs, respectively.

Function Enrichment Analysis
All genes in PPI networks of lncRNAs and pseudogenes were uploaded into th DAVID database to predict their potential functions. A total of 125 GO biological process 43 molecular function terms and 23 KEGG pathways were enriched (Table S6). Th

Diagnostic Values of Hub lncRNAs and Pseudogenes
ROC analysis was performed on these four lncRNAs (MEG3, ARHGAP5-AS1, LINC00174 and PVT1) and five pseudogenes (MT1JP, MT1L, RPL23AP64, ZNF826P and TMEM198B) to explore their diagnostic values. The results showed that the AUCs of all single lncRNAs and pseudogenes were >0.5, indicating they had diagnostic significance in distinguishing lung cancer from normal controls. PVT1 and MT1JP may be especially effective because their AUCs were larger than 0.85 in the analysis of LUAD and LUSC samples ( Figure 8A,B). Furthermore, the combination of all lncRNAs, pseudogenes and lncRNAs + pseudogenes was shown to further increase the AUC, sensitivity or specificity. Thus, this gene combination should be considered as a promising diagnostic biomarker ( Figure 8A,B; Table 4). LUAD, lung adenocarcinoma; LUSC, lung squamous cell carcinoma; AUC, the area under the receiver operating characteristic curve.

Prediction of Small-Molecule Drugs That Regulated Hub lncRNAs and Pseudogenes
CTD analysis identified a series of drugs that reversed the expression levels of hub lncRNAs and pseudogenes in cancer ( Figure 8C). Among them, schizandrin A (targeted MEG3), folic acid (targeted MEG3 and TMEM198B), zinc (targeted MT1JP and MT1L) and gamma-linolenic acid (targeted MT1L) may be candidate drugs to prevent the development of lung cancer for CNT occupational workers because they were components of traditional Chinese medicine or OTC health products.

Prediction of Small-Molecule Drugs That Regulated Hub lncRNAs and Pseudogenes
CTD analysis identified a series of drugs that reversed the expression levels of hub lncRNAs and pseudogenes in cancer ( Figure 8C). Among them, schizandrin A (targeted MEG3), folic acid (targeted MEG3 and TMEM198B), zinc (targeted MT1JP and MT1L) and gamma-linolenic acid (targeted MT1L) may be candidate drugs to prevent the development of lung cancer for CNT occupational workers because they were components of traditional Chinese medicine or OTC health products.

Discussion
In recent decades, mounting evidence has supported the idea that environmental pollutants (such as PM2.5 [33][34][35], nickel [36] and cadmium [37,38]) promote the malignant transformation of lung epithelial cells by changing the expression levels of ncRNAs. Dysregulation of ncRNA expression levels was also revealed to be a novel paradigm for interpretation of the toxicology of various nanoparticles [39][40][41]. However, to date, there has been no study exploring the molecular mechanisms associated with CNT-induced lung carcinogenesis from the perspective of lncRNAs and pseudogenes. In this study, we, for the first time, attempted to screen potential lncRNAs and pseudogenes associated with CNT-induced lung carcinogenesis based on a comprehensive bioinformatics analysis of microarray data of lung cells directly exposed to CNTs without other influencing factors (GSE41178 and GSE561040) and sequencing (TCGA), proteomics (CPTAC) and immunohistochemical data (HPA) of lung cancer patients. Similar to the studies of PM2.5 [35], nickel [36] and cadmium [37] exposure, we also identified that the upregulation of PVT1 and downregulation of MEG3 contributed to the acquisition of tumorigenic phenotypes of human SAECs after CNT exposure. An increase in the expression level of lncRNA ARHGAP5-AS1 and decreases in the expression levels of lncRNA LINC00174, pseudogene MT1JP, MT1L, RPL23AP64, ZNF826P and TMEM198B were newly identified mechanisms in our study to explain the carcinogenic effects of CNTs. Nevertheless, some of our newly identified lncRNAs and pseudogenes had been demonstrated to mediate cancer development and progression. For example, Zhu et al. identified that ARHGAP5-AS1 was upregulated in chemo-resistant gastric cancer cells, and its knockdown reversed the chemoresistance. A high expression level of ARHGAP5-AS1 was associated with a poor prognosis for gastric cancer patients [42]. The expression level of MT1JP was shown to be significantly lower in gastric cancer [43], hepatocellular carcinoma [44] and lung cancer [45] tissues (especially for lymph node metastasis and advanced-stage cancer) than that in adjacent normal tissues. The survival time was significantly lengthened in cancer patients with a higher expression level of MT1JP [44]. Functionally, overexpression of MT1JP was found to inhibit cell proliferation, migration and invasion, promote cell apoptosis in vitro and slow down tumor growth and metastasis in vivo [43,44]. Ding et al. reported low MT1L expression in bladder cancer [46]. These findings indirectly indicate the importance of our identified lncRNAs and pseudogenes for CNT-induced lung cancer.
LncRNAs and pseudogenes were shown to be involved in various cell processes by regulating the expression levels of mRNAs via ceRNA [34,35] or co-expression [36,37,39,47] mechanisms. Thus, in our study, we constructed ceRNA and co-expression networks to identify the interaction pairs of hub lncRNAs and pseudogenes. As a result, we found that MEG3 and RPL23AP64 acted as a ceRNA to positively regulate the expression levels of CPEB2, PHF21A and BAMBI by sponging miR-942-5p, and PVT1, ZNF826P, RPL23AP64 and TMEM198B regulated the expression levels of PSME3, SYNGAP1, SGK3 and CLU expression by absorbing miR-423-5p, miR-23a-3p and miR-15b-5p, respectively. Furthermore, MEG3 and ZNF826P directly modulated the transcription of ATM and SGK3; MT1L could co-express with ATM, SYNGAP1, CLU and SGK3; ARHGAP5-AS1 could co-express with TMED10 and NIP7; LINC00174 could co-express with NEDD4L; RPL23AP64 could co-express with SGK3; PVT1 could co-express with NIP7; and MT1JP could co-express with SGK3. Among these regulatory relationship pairs, only the binding of PVT1 and miR-423-5p was verified by dual-luciferase reporter, RNA immunoprecipitation and RNA pull-down assays [48]. The expression level of PVT1 was significantly and negatively correlated with miR-423-5p in cancer tissues [48]. Thus, other mechanisms may be the first evidence provided by our study. However, the expression levels and roles of related miRNAs or mRNAs had been reported previously in lung cancer or other cancers. For example, Dong et al. found that miR-942-5p was significantly upregulated in human LUAD tissues and cell lines [49]. Enforced expression of miR-942-5p significantly enhanced the growth [49], migration and invasion capacities of non-small cell lung cancer (NSCLC) cells [50]. GSE24279 dataset analysis and PCR experiments confirmed that miR-23a-3p was upregulated in pancreatic cancer compared with controls [51]. Analysis of NSCLC tissue data in TCGA database showed a higher expression level of miR-23a was significantly associated with a shorter OS rate [52]. Serum miR-15b-5p was detected to be significantly upregulated in NSCLCs. Multivariate logistic regression analysis revealed that miR-15b-5p expression was an independent diagnostic factor for the identification of patients with NSCLCs from controls [53]. Knockdown of miR-15b-5p restrained growth and invasiveness and induced apoptosis of breast and prostate cancer cells [54,55]. Knockout of CPEB2 was shown to promote oncogenic properties of MCF10A and MCF7 cells, including increased proliferation, migration, invasion, EMT and stem-like cell phenotypes. Injection of CPEB2-knockout MCF10A cells in mice resulted in the formation of subcutaneous tumors and spontaneous lung metastases [56]. Kaplan-Meier analyses of GSE4412 and GSE4271 datasets demonstrated that downregulation of PHF21A was closely associated with a decreased OS rate among patients with glioma. A multivariate analysis further confirmed that the expression level of PHF21A was an independent prognostic factor [57]. BAMBI is a negative regulator of the TGF-β signaling pathway. Downregulation of BAMBI was observed to promote TGF-β-induced EMT, migration and invasion of NSCLCs in vitro, along with increased tumor burdens and tumor growth in vivo [58]. Overexpression of BAMBI possessed antitumor potentialities [58,59]. Immunohistochemistry experiments showed that PSME3 was highly expressed in pancreatic cancer cells [60]. RNA interferencemediated silencing of PSME3 suppressed cancer cell proliferation, invasion and migration; induced cell apoptosis and cell cycle arrest at the G2/M phase; and enhanced radiosensitivity [61]. SYNGAP1 (also known as RASA5) was frequently methylated in multiple carcinomas to induce a decrease in its expression levels. Knockdown of RASA5 enhanced Ras signaling and then promoted tumor cell growth, migration and invasion via activation of EMT [62]. NSCLC patients with clusterin-positive tumors had a significantly longer survival period relative to those with clusterin-negative tumors (63% vs. 42%) [63]. Consistent with the tissues, the mRNA level of CLU was lower in lung cancer cell lines. Knockdown of CLU was verified to significantly promote the growth of lung cancer cells in vitro and in vivo [64]. A decreased expression level of ATM was associated with inferior OS and higher prostate cancer-specific mortalities [65]. Activation of the ATM/checkpoint kinase 2/p53-dependent signaling pathway was shown to inhibit the chemoresistance of HGC-27 cells to oxaliplatin [66]. The cell viability was significantly higher, but the apoptotic rate was significantly lower by transfection with TMED10 into a papillary thyroid cancer cell line than those in the control group [67]. NEDD4L was downregulated in NSCLCs, which was correlated with lymph node invasion, advanced stage and poor prognosis. Overexpression of NEDD4L significantly suppressed cell proliferation, migration and invasion abilities, whereas knockdown of NEDD4L enhanced the tumor metastasis of NSCLC cells [68]. NIP7 was significantly upregulated in the TCGA glioblastoma dataset [69]. In line with these studies, we also confirmed that CPEB2, PHF21A, BAMBI, ATM, SYNGAP1, CLU, NEDD4L and miR-423-5p were downregulated, whereas TMED10, PSME3, NIP7, miR-15b-5p and miR-23a-3p were upregulated in CNT-exposed lung cells and lung cancer samples. All of them (including miR-942) were associated with survival of lung cancer patients. Experimental studies revealed that SGK3 was an oncogenic gene [70,71]; however, TCGA analysis of several cancer data indicated SGK3 was downregulated compared with controls (including lung cancer, which was also confirmed in our prognosis and HPA analyses) ( Figure S8). Thus, further experiments are necessary to confirm the expression levels and roles of SGK3 in CNT-related lung cancer. Furthermore, co-expression relationships between MT1JP and SYNGAP1/BAMBI/NEDD4L/ATM/CPEB2/CLU/PHF21A were present (all r > 0.6 and p < 0.01; Table S4), indicating MT1JP also functioned in lung cancer by influencing these genes.
To confirm whether our identified lncRNAs and pseudogenes were potential biomarkers for early diagnosis of CNT occupational workers with the risk to develop lung cancer, we performed ROC curve analysis based on the TCGA data of lung cancer. Our results showed the diagnostic accuracy of a four-lncRNA signature, a five-pseudogene signature or a nine-lncRNA-pseudogene signature were all higher than 80% (even approximate to 1), which was obviously higher than those of the 35-mRNA signature previously identified by Guo et al. (accuracy of 62-80%) [20] and essentially similar to a four-gene signature identified in our previous study [72]. Thus, we consider that our lncRNA and pseudogene signatures may be promising biomarkers for medical surveillance of lung cancer risk in occupational CNT-exposed workers.
To provide underlying preventive drugs, we mined the CTD database and selected some chemical molecules that reversed the expression levels of lncRNAs and pseudogenes. Our results showed that using supplements of Schisandra chinensis (containing a schizandrin A component), folic acid, zinc and gamma-linolenic Acid may be a potential approach to reduce the risk of developing lung cancer for populations with long-term CNT exposure. Our hypothesis can be indirectly validated because there studies have demonstrated the antitumor effects of these drugs. For example, Zhu et al. found schizandrin A treatment reduced viability, inhibited proliferation and induced cell cycle arrest and apoptosis of NSCLC cells [73]. Bae reported that higher foliate levels can decrease lung cancer risk in men (odds ratios = 0.82) and former smokers (odds ratios = 0.7) [74]. A meta-analysis performed by Charoenngam et al. showed that individuals with higher zinc intake had a significantly decreased risk of lung cancer, with a pooled risk ratio of 0.68 [75]. Dietary supplementation with gamma-linolenic acid was shown to inhibit the growth of a human lung carcinoma implanted in nude mice [76]. In vitro experiments also confirmed that gamma-linolenic acid was effective in inhibiting the migration and invasion of NSCLC cells [77]. This research has some limitations. First, this study preliminarily screened lncR-NAs and pseudogenes associated with CNT-induced lung carcinogenesis based on two microarray data of human lung epithelial cells and the sequencing data of lung cancer patient tissues. Their expression and downstream function mechanisms (target genes identified by ceRNA or co-expression analysis) still needed further confirmation by more high-throughput datasets and wet experiments (including PCR, Western blot, overexpression or silencing of genes, dual-luciferase reporter assay, proliferation, apoptosis, invasion and metastasis analyses) in in vitro (different cell lines) and in vivo lung cancer models induced by different types of CNTs, especially for those first reported in our study, such as RPL23AP64, ZNF826P and TMEM198B. Second, CNT-induced lung cancer may be a long-term process. It may be difficult to collect samples from clinical populations. Thus, mice models with low-dose, chronic, long-term exposure to CNTs should be first established to observe the risk of development of lung cancer and validate the diagnostic values of our identified lncRNAs and pseudogenes. Furthermore, dietary supplementation of schizandrin A, folic acid, zinc or gamma-linolenic acid were also assigned for model mice to reveal their possible protective roles in the development of lung cancer in CNT occupational workers. However, a repository of biological samples from CNT-exposed workers needs to be created in the future by collaboration with other experts, which may be more beneficial to monitor biologically relevant changes of humans over time and to verify the effects of molecular biomarkers and drugs [78].

Conclusions
In the present study, we identified four lncRNAs (MEG3, ARHGAP5-AS1, LINC00174 and PVT1) and five pseudogenes (MT1JP, MT1L, RPL23AP64, ZNF826P and TMEM198B) that function as a ceRNA for miRNA-mRNA interactions or co-express with mRNAs as candidate biomarkers for surveillance of lung malignant transformation in occupational CNT-exposed workers. Dietary supplementation of schizandrin A, folic acid, zinc or gamma-linolenic acid may be a potential preventive strategy for CNT-induced lung cancer.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/ijerph19052936/s1, Figure S1: Venn diagram to show the overlapped genes. (A) commonly upregulated DE-lncRNAs, DE-pseudogenes and DE-mRNAs between SWCNT and MWCNT; (B) commonly downregulated lncRNAs, pseudogenes and mRNAs between SWCNT and MWCNT; (C) the overlap between module lncRNAs/pseudogenes and DE-lncRNAs/DE-pseudogenes. DE-lncRNAs, differentially expressed lncRNAs; DE-pseudogenes, differentially expressed pseudogenes; DE-mRNAs, differentially expressed mRNAs. Figure S2: GEPIA analysis to validate the expression of mRNAs in lung cancer. N = 50 or 59 indicates the control is match TCGA normal data; N = 338 or 347 indicates the control is match TCGA normal and GTEx data; asterisk (*) indicates the statistical significance at the threshold of |log 2 FC| > 0.5 and p-value < 0.05. Figure S3: UALCAN analysis to validate the expression of mRNAs in lung adenocarcinoma. Proteomic data from Clinical Proteomic Tumor Analysis Consortium (CPTAC) are used. Figure S4: HPA immunohistochemical staining results to validate the expression of mRNAs in lung cancer. LUAD, lung adenocarcinoma; LUSC, and squamous cell carcinomas. Figure Figure S6: Validation of the expression and prognosis of miR-942, miR-23a, miR-376b and miR-15b. A-D: the expression of miRNAs; E-H, the prognosis of miRNAs. LUAD, lung adenocarcinoma; LUSC, and squamous cell carcinomas. Figure S7: Validation of the expression and prognosis of miR-423 and its target mRNA PSME3. A: the prognosis of miR-423; B: the expression of PSME3; C: the prognosis of PSME3 for overall survival of TCGA LUAD samples; D: the prognosis of PSME3 for recurrence-free survival of TCGA LUAD samples; E: the prognosis of PSME3 for recurrence-free survival of TCGA LUSC samples. LUAD, lung adenocarcinoma; LUSC, and squamous cell carcinomas; TCGA, The Cancer Genome Atlas. Figure S8: SGK3 downregulated in several cancers. Asterisk indicates the statistical significance at the threshold of |log 2 FC| > 0.5 and p-value < 0.05. CESC, cervical squamous cell carcinoma and endocervical adenocarcinoma; OV, Ovarian serous cystadenocarcinoma; UCEC, uterine corpus endometrial carcinoma; UCS, uterine carcinosarcoma. Table S1: Survival analysis for lncRNAs. Table S2: The ceRNA interaction pairs for lncRNAs and pseudogenes. Table S3: The co-expression interaction pairs for lncRNAs and pseudogenes. Table S4: The protein-protein interaction pairs for lncRNAs and pseudogenes. Table S5: Survival analysis for mRNAs. Table S6: Function enrichment results for all genes in the protein-protein interaction networks of lncRNAs and pseudogenes.