Article

Identification of TRPC6 as a Novel Diagnostic Biomarker of PM-Induced Chronic Obstructive Pulmonary Disease Using Machine Learning Models

1
Department of Food Science and Biotechnology, College of BioNano Technology, Gachon University, 1342 Seongnam-daero, Sujeong-gu, Seongnam-si 13120, Republic of Korea
2
Department of Computer Engineering, College of Information Technology, Gachon University, 1342 Seongnam-daero, Sujeong-gu, Seongnam-si 13120, Republic of Korea
*
Author to whom correspondence should be addressed.
Genes 2023, 14(2), 284; https://doi.org/10.3390/genes14020284
Submission received: 6 December 2022 / Revised: 18 January 2023 / Accepted: 19 January 2023 / Published: 21 January 2023
(This article belongs to the Collection Feature Papers in Bioinformatics)

Abstract

Chronic obstructive pulmonary disease (COPD) was the third most prevalent cause of mortality worldwide in 2010; it results from a progressive and fatal deterioration of lung function caused by cigarette smoking and particulate matter (PM). Therefore, it is important to identify molecular biomarkers that can diagnose the COPD phenotype and guide therapy. To identify potential novel biomarkers of COPD, we first obtained the gene expression dataset GSE151052, which comprises COPD and normal lung tissue, from the NCBI Gene Expression Omnibus (GEO). A total of 250 differentially expressed genes (DEGs) were investigated and analyzed using GEO2R, gene ontology (GO) functional annotation, and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. The GEO2R analysis revealed that TRPC6 was the sixth most highly expressed gene in patients with COPD. The GO analysis indicated that the upregulated DEGs were mainly concentrated in the plasma membrane, transcription, and DNA binding. The KEGG pathway analysis indicated that the upregulated DEGs were mainly involved in pathways related to cancer and axon guidance. TRPC6, one of the most abundant genes among the top 10 differentially expressed total RNAs (fold change ≥ 1.5) between the COPD and normal groups, was selected as a novel COPD biomarker based on the results of the GEO dataset and analysis using machine learning models. The upregulation of TRPC6 was verified by quantitative reverse transcription polymerase chain reaction in PM-stimulated RAW264.7 cells, which mimicked COPD conditions, compared to untreated RAW264.7 cells. In conclusion, our study suggests that TRPC6 can be regarded as a potential novel biomarker for COPD pathogenesis.

1. Introduction

Chronic obstructive pulmonary disease (COPD) is a major lung disease and was the third leading cause of death worldwide in 2010 [1]. It involves an abnormal inflammatory response caused by exposure to particulate matter (PM), including toxic particles and gases. Patients with COPD experience chronic cough, chronic bronchitis, and accelerated lung dysfunction. Many studies have reported a significant increase in inflammatory immune cells in the small airways of patients with COPD; these cells release damaging enzymes and inflammatory factors, leading to lung damage [2]. Neutrophils play major roles in COPD by producing neutrophil elastase (NE) and myeloperoxidase (MPO) [3]. Macrophages exacerbate COPD by releasing excess proinflammatory cytokines, matrix metalloproteinases (MMP), and reactive oxygen species (ROS) [4].
Examining the sputum and bronchoalveolar lavage fluid (BALF) of COPD patients for clinical applications is difficult. Hence, clinical practice mainly relies on detecting serum inflammatory markers in COPD patients and correlating them with the severity of COPD. Thomsen et al. found that C-reactive protein (CRP), fibrinogen, and leukocyte population could be important COPD biomarkers associated with increased exacerbation [5]. However, the reliability of the experimental results in that study was limited because the samples were unbalanced and too few.
To clearly distinguish the severity of COPD, many scientists have attempted to use machine learning algorithms for clinical decision making. Nunavath et al. investigated feed-forward neural networks (FFNN) for COPD classification and long short-term memory (LSTM) for the early prediction of exacerbation degrees and subsequent triage in patients with COPD [6]. However, the data were obtained from a home environment, which was likely to be affected by multiple factors, reducing data quality. Tang et al. suggested a four-layer deep learning model that optimizes a specifically configured recurrent neural network to address temporal variations in COPD progression [7]. The proposed model was difficult to interpret owing to its complexity. Almagro et al. utilized the Charlson index and a questionnaire to investigate comorbidities and short-term prognosis in hospitalized patients with COPD with exacerbation [8]. That study did not include the role of inflammation in the diagnosis of COPD.
In this study, we aimed to identify novel potential diagnostic biomarkers of COPD through machine learning models and a database analysis of the largest publicly available repository of mRNA expression in COPD, collected by the Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/ accessed on 10 January 2022), and to examine their expression in macrophages under PM-induced COPD conditions (Figure 1).

2. Materials and Methods

2.1. Microarray Data Acquisition

GEO (http://www.ncbi.nlm.nih.gov/geo accessed on 10 January 2022) provides genomics data, including high-throughput microarray and gene expression data, to the public. One gene expression dataset, GSE151052, was obtained from GEO (platform GPL17556, [HuGene-1_0-st] Affymetrix Human Gene 1.0 ST Array). The probes were converted into the corresponding gene symbols according to the annotation information of the platform. The GSE151052 dataset contained the total RNAs of 117 samples, including 77 samples from lung tissues of COPD patients and 40 samples from controls.

2.2. Identification of Differentially Expressed Genes (DEGs)

GEO2R (http://www.ncbi.nlm.nih.gov/geo/geo2r/ accessed on 10 January 2022) is a web tool based on the limma package of the R language that identifies differentially expressed genes (DEGs) by comparing two or more groups of samples [9]. We used this tool to conduct comparisons on the GSE151052 data. We first checked the overall characteristics of the value distributions; median-centered values usually indicate that the data are normalized. If the data were not normalized, the force normalization option was used to apply quantile normalization to the expression data, forcing all selected samples to have an identical value distribution. We then assigned samples from COPD patients and normal controls to the “case group” and “control group”, respectively. DEGs between patients with COPD and controls were selected using |log(fold change, FC)| > 1 and t-tests with p < 0.05.
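The selection rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the GEO2R implementation: it assumes the per-gene log fold change and p-value have already been computed (for example, exported from GEO2R), and the names `GeneStat` and `select_degs`, as well as the example values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GeneStat:
    gene: str        # gene symbol or probe ID
    log_fc: float    # log fold change, COPD vs. control
    p_value: float   # p-value of the per-gene test


def select_degs(stats, log_fc_cutoff=1.0, p_cutoff=0.05):
    """Split genes passing |logFC| > cutoff and p < cutoff into up/down lists."""
    up, down = [], []
    for s in stats:
        # Skip genes that fail either threshold.
        if s.p_value >= p_cutoff or abs(s.log_fc) <= log_fc_cutoff:
            continue
        (up if s.log_fc > 0 else down).append(s.gene)
    return up, down


# Made-up values for illustration only; they are not taken from GSE151052.
example = [
    GeneStat("GENE_A", 1.8, 0.001),    # passes, upregulated
    GeneStat("GENE_B", -1.3, 0.020),   # passes, downregulated
    GeneStat("GENE_C", 0.4, 0.0005),   # fails the fold-change threshold
    GeneStat("GENE_D", 2.1, 0.200),    # fails the p-value threshold
]
up, down = select_degs(example)
print("up:", up, "down:", down)
```

Both thresholds are applied jointly, so a gene with a large fold change but a non-significant p-value is excluded, and vice versa.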

2.3. GO Enrichment and KEGG Pathway Analysis

The Database for Annotation, Visualization, and Integrated Discovery (DAVID) v6.8 tool was used to interpret the functional roles of genes based on genome studies [10]. The Gene Ontology database (GO; http://www.geneontology.org accessed on 15 February 2022) provides structured ontologies, or vocabularies, that describe the characteristics of genes and gene products [11]. The Kyoto Encyclopedia of Genes and Genomes database (KEGG; http://www.genome.jp/kegg/ accessed on 15 February 2022) provides information on biological systems from genomic, systemic functional, and chemical points of view [12]. We analyzed the GSE151052 dataset using DAVID software, following Han et al. [13]. In the first step, we entered the gene list into the search box, selected the identifier “ENSEMBL_GENE_ID”, chose the list type “Gene List”, and submitted the list. In the second step, we selected “Homo sapiens” to limit annotations and selected “List 1”. In the third step, we chose the four parameters “GOTERM-BP-DIRECT”, “GOTERM-CC-DIRECT”, “GOTERM-MF-DIRECT”, and “KEGG-PATHWAY” in the annotation summary results. The “Functional Annotation Chart” was used to obtain the GO and KEGG pathway analysis results.
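To give a sense of what an enrichment score measures, the following sketch computes a plain hypergeometric tail probability for one annotation term. It is a simplified stand-in: DAVID reports a modified Fisher's exact test (the EASE score), which is more conservative than the value computed here. The function name and the example counts are assumptions for illustration.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """Probability of seeing at least k annotated genes in a list of n genes.

    k: genes in the submitted list that carry the term
    n: size of the submitted gene list
    K: background genes that carry the term
    N: size of the background
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the hypergeometric probabilities for k, k+1, ..., upper hits.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Toy example: 10 background genes, 4 annotated, list of 3 genes, 2 annotated.
p = hypergeom_enrichment_p(k=2, n=3, K=4, N=10)
print(round(p, 4))
```

A small p-value indicates that the term appears in the gene list more often than expected by chance given the background.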

2.4. Cell Culture

The RAW 264.7 cells were obtained from the Korean Cell Line Bank (KCLB, Seoul, Republic of Korea). Cells were cultured in Dulbecco’s modified Eagle’s medium (DMEM, Welgene, Daegu, Republic of Korea) supplemented with 10% fetal bovine serum (Welgene) and 1% penicillin and streptomycin (Welgene). They were grown in a 75 cm² cell culture flask at 37.5 °C in humidified 5% CO₂ incubators.

2.5. Extraction of RNAs and qRT-PCR

Following previous studies [14,15], total RNA was extracted from RAW 264.7 cells using TRIzol reagent (Invitrogen, Carlsbad, CA, USA) and reverse transcribed using the ReverTra Ace qPCR RT kit (Toyobo Biologics Inc., Osaka, Japan). The PCR program was implemented as follows: initial denaturation at 94 °C for 2 min, followed by 30 cycles of denaturation at 94 °C for 20 s, annealing at 62.2 °C for 10 s, and extension at 72 °C for 45 s, with a final extension at 72 °C for 5 min. Polymerase chain reaction was performed at 94 °C for 2 min, 94 °C for 30 s, 55 °C for 30 s, and 68 °C for 1 min for 30 cycles. The TRPC6 primer sequences were as follows: forward 5′-GAA CTT AGC AAT GAG CTG GC-3′ and reverse 5′-CAG AGG TCC AAG AGA CCA AC-3′. The levels of TRPC6 mRNA were normalized to GAPDH mRNA.
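Normalization to a reference gene can be illustrated with a short sketch. It assumes that each sample yields one signal value for TRPC6 and one for GAPDH (how those signals are quantified is not specified here), and it expresses treated samples as fold change over the mean of the controls. The function names and all numbers are hypothetical and are not measurements from this study.

```python
def relative_expression(target_signal, reference_signal):
    """Divide each target-gene signal by the reference-gene signal of the same sample."""
    return [t / r for t, r in zip(target_signal, reference_signal)]


def fold_change_vs_control(treated_ratios, control_ratios):
    """Express treated samples relative to the mean normalized control level."""
    control_mean = sum(control_ratios) / len(control_ratios)
    return [r / control_mean for r in treated_ratios]


# Hypothetical signal values (arbitrary units), for illustration only.
control = relative_expression([100.0, 110.0, 90.0], [200.0, 200.0, 200.0])
treated = relative_expression([300.0, 330.0], [200.0, 220.0])
fold = fold_change_vs_control(treated, control)
print([round(f, 2) for f in fold])
```

Dividing by the reference gene first removes differences in input amount between samples, so the fold change reflects the target gene only.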

2.6. Statistical Analysis

The data were expressed as means ± standard deviation (SD) and analyzed by one-way ANOVA/Duncan’s t-test. These analyses were performed using the SPSS software program, version 12 (SPSS Inc., Chicago, IL, USA).

2.7. Machine Learning with Decision Tree

Machine learning is a technique for programming computers to optimize a performance criterion using data or past experience [16]. It provides possible solutions for detecting the information hidden in massive and complex data [17]. Machine learning is therefore an appropriate way to analyze microarray data, which contain expression measurements of thousands of genes, and to select the relevant genes from the microarray [17,18].
Many machine learning models cannot readily explain their results, whereas decision tree models make it easy to identify the criteria used in classification problems [14]. Therefore, we used decision tree [15] algorithms in this study. Decision tree algorithms allow the criteria for classifying COPD to be identified efficiently. When a decision tree algorithm is applied to a dataset, it generates tree-based classification criteria [18]. In other words, when we apply this algorithm to microarray data, it creates a tree-based COPD classification criterion based on the gene expression levels contained in the data. This makes it possible to check visually which genes strongly influence COPD classification and which expression level serves as the criterion for disease classification. Figure 2 shows a brief schema of a decision tree.
The decision tree algorithm is also well suited to classifying microarray data because it outputs easy-to-understand results without generating complex rules that are difficult to analyze from a medical point of view. It does not require a complicated parameter-tuning process [19,20]. It is also a suitable method for biological data analysis in that its results can be treated as valuable information for further analysis [21]. Because of these features, decision tree-based algorithms can be used as effective methodologies for microarray-based data analysis, such as direct disease classification [19,21] or target gene screening [20,22].
Among them, the J48 algorithm, a representative decision tree algorithm, is an implementation of the C4.5 algorithm (revision 8) by Ross Quinlan [23]. The C4.5 algorithm is a decision tree algorithm and an improvement of the ID3 algorithm. Like ID3, C4.5 uses formulas based on information theory to evaluate the goodness of a test, selecting the test that extracts the maximum amount of information from a set of cases under the constraint that only one attribute is tested [24]. C4.5 offers several improvements over ID3: it accepts continuous data and unknown values as input and supports attributes with different weights. Furthermore, it prunes the tree after its creation, using pessimistic error prediction and subtree raising, which simplifies the tree by deleting a node, replacing it with its subtree, and redistributing the instances according to the classification criteria [25].
C4.5 is a greedy technique that follows a top-down, recursive, divide-and-conquer approach [26]. The algorithm selects the test that best separates the classes, splits the cases into child nodes, and recursively invokes itself on each child node [23]. Figure 3 shows its pseudocode.
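The split selection at the heart of this procedure can be sketched as follows. This is a simplified illustration of choosing a threshold on one continuous attribute by gain ratio, the information-theoretic criterion associated with C4.5; it omits details of the full algorithm such as handling of unknown values, additional corrections for continuous attributes, and pruning. The function names and the expression values are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary test `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # the test does not actually split the cases
    n = len(labels)
    w_left, w_right = len(left) / n, len(right) / n
    # Information gain: entropy before the split minus weighted entropy after.
    gain = entropy(labels) - (w_left * entropy(left) + w_right * entropy(right))
    # Split information penalizes very unbalanced partitions.
    split_info = -(w_left * log2(w_left) + w_right * log2(w_right))
    return gain / split_info


def best_threshold(values, labels):
    """Return the observed value that maximizes the gain ratio, or None."""
    candidates = sorted(set(values))[:-1]  # the largest value cannot split
    if not candidates:
        return None
    return max(candidates, key=lambda t: gain_ratio(values, labels, t))


# Hypothetical expression values of one gene; not data from the study.
expr = [2.1, 2.4, 2.2, 5.0, 5.3, 4.9]
cls = ["control", "control", "control", "COPD", "COPD", "COPD"]
t = best_threshold(expr, cls)
print(t, gain_ratio(expr, cls, t))
```

A full tree builder would evaluate this criterion for every attribute, split on the best one, and repeat the procedure on each child node until the cases in a node belong to a single class or no useful test remains.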

3. Results

3.1. Identification of TRPC6 as a Potential Biomarker for COPD Using Machine Learning Models and GEO2R

3.1.1. Analysis Using Machine Learning Models

We applied machine learning to obtain a classifier capable of identifying COPD from the microarray data and to identify important genes for COPD classification. The GSE151052 dataset used in this study comprises microarray profiles of 77 COPD samples and 40 control samples extracted from the lungs of patients with COPD and control group donors. Each sample contained expression information on 19,718 genes. A decision tree, which is a type of machine learning model, can be used to classify microarray profiling data into the two groups (COPD and control).
In this study, we generated decision tree models for classifying COPD gene expression data and investigated the genes that were crucial to classification by analyzing the generated decision tree structures. We used the J48, DecisionStump, and REPTree models [27] implemented in WEKA [28]. Owing to the small amount of data, the results were verified using 10-fold cross validation [23]. We evaluated each classifier using the accuracy and F1 scores derived from a confusion matrix [29], as shown in Table 1.
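The fold construction behind 10-fold cross validation can be sketched as below. This is a generic, stratified variant written for illustration, not WEKA's code; the function names are hypothetical, and the only values taken from the text are the class sizes (77 COPD and 40 control samples).

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign sample indices to k folds while keeping class proportions similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    position = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin over the folds.
        for idx in indices:
            folds[position % k].append(idx)
            position += 1
    return folds


def train_test_splits(folds):
    """Yield (train_indices, test_indices); each fold is held out exactly once."""
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, f in enumerate(folds) if j != i for idx in f]
        yield train_idx, test_idx


labels = ["COPD"] * 77 + ["control"] * 40
folds = stratified_folds(labels, k=10)
splits = list(train_test_splits(folds))
print([len(f) for f in folds])
```

Each sample is used for testing exactly once, so the reported accuracy is computed over all 117 samples even though every model is trained on only about nine tenths of them.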
The accuracy (Acc) and F1 score ($F_1$) are calculated as follows:
$$\mathrm{Acc} = \frac{TP + TN}{TP + FP + TN + FN}, \qquad F_1 = \frac{2PR}{P + R}, \qquad P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}$$
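As a worked example of these formulas, the sketch below computes accuracy, precision (P), recall (R), and the F1 score from the four cells of a confusion matrix. The counts are hypothetical and are not the values reported in the study's tables; the function name is likewise an assumption.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, recall, F1) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return accuracy, precision, recall, f1


# Hypothetical counts for illustration only.
acc, p, r, f1 = classification_metrics(tp=70, fp=5, tn=35, fn=7)
print(f"Acc={acc:.3f} P={p:.3f} R={r:.3f} F1={f1:.3f}")
```

Because the F1 score is the harmonic mean of precision and recall, it stays low when either of the two is low, which makes it a useful complement to accuracy on unbalanced groups.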
Figure 4 and Figure 5 show the tree structures obtained by training the three decision tree models (J48, DecisionStump, and REPTree) and their performance; all three models classified the data using only the expression value of gene ID 7225_at. In other words, while building trees to classify the disease, the algorithms concentrated on 7225_at among all genes in the microarray data. Moreover, these classifiers, using only 7225_at, performed well on the COPD microarray profile dataset, with an accuracy of up to 0.991.
By design, a decision tree is a machine learning model for classifying groups, and it cannot directly calculate the validation rate of each gene or rank the genes. However, it is possible to infer the importance of genes in classification by analyzing the structure of the tree optimized for classification. Figure 4 shows that all three decision tree models optimized for the classification of COPD contained only a single gene, TRPC6 (7225_at). This result is unusual because machine learning on conventional microarray or RNA-sequencing data generally does not classify a disease using only one gene. Figure 6 and Figure 7 show the results of the same experiment with the GSE57148 [30] data: the J48 classifier produced a complex tree and performed poorly overall, and the three classifiers focused on different genes. These results differ from those of the original experiment, in which the classifiers focused on only one gene. Therefore, this comparison suggests that the results from the original experiment deserve attention.
The statistical analysis results (Table 2) were ranked by LogFC, and this ranking is not absolute: re-ranking by p-value would give a different order. By contrast, the decision trees provided strong evidence that COPD can be classified using TRPC6 alone. All three decision tree models trained on dataset GSE151052 contained only TRPC6 (Figure 5), and their classification accuracies were very high, at over 98%. In practice, it is very rare for samples to be classified with high accuracy based on only one gene in a decision tree. For comparison, the J48 decision tree trained on another dataset, GSE57148 (Figure 6), which does not contain TRPC6, included many genes. We therefore appraised TRPC6 as a strong candidate biomarker, and the results of the decision trees were considered to have priority over those of the statistical analysis in this study.
Typically, a decision tree algorithm applies pruning to limit overfitting during training. Pruning yields a compact tree, which in turn shows which genes the algorithm relies on [30]. In the presented experiments, all three algorithms produced classifiers that rely on 7225_at, which corresponds to TRPC6, and the accuracy of each classifier was high. This result shows that the algorithms could build accurate classifiers with TRPC6 alone. It implies that TRPC6 is very important for COPD classification using machine learning and further suggests that TRPC6 can be considered a candidate biomarker of COPD pathogenesis.

3.1.2. GEO2R

After the analysis of differentially expressed RNA in the GSE151052 (n = 117) dataset, a total of 250 DEGs were identified, of which 15 genes were upregulated and 12 genes were downregulated in patients with COPD compared to the normal group (Table 2). The machine learning results indicated that TRPC6 was the most highly expressed gene among the top 10 upregulated genes in patients with COPD compared to the control. The GEO2R analysis showed that RTKN2 was the most abundant among the top 10 upregulated genes, and TRPC6 was the sixth most abundant gene among the top 10 upregulated genes in patients with COPD compared to the control.

3.2. GO Term Enrichment and KEGG Pathway Analysis

To identify the significant functions of the DEGs between the COPD and normal groups, DAVID software was used. The top five enriched GO terms are shown in Table 3 and Table 4. In the biological process (BP) enrichment analysis, the upregulated genes were significantly involved in transcription, DNA-templated (GO:0006351), transmembrane transport (GO:0055085), post-embryonic development (GO:0009791), ossification (GO:0001503), and covalent chromatin modification (GO:0016569). In addition, the TRPC6 gene was included in manganese ion transport (GO:0006828).
For the cell component (CC) enrichment analysis, upregulated genes in patients with COPD were enriched in the plasma membrane (GO:0005886), intracellular (GO:0005622), nuclear chromatin (GO:0000785), sarcolemma (GO:0042383), and receptor complex (GO:0043235). TRPC6 was localized in the plasma membrane (GO:0005886).
For the molecular function (MF), upregulated genes were mainly involved in DNA binding (GO:0003677), transcription factor activity, sequence-specific DNA binding (GO:0003700), calcium ion binding (GO:0005509), chromatin binding (GO:0003682), and integrin binding (GO:0005178). The TRPC6 gene was involved in inositol 1,4,5 trisphosphate binding (GO:0070679) and store-operated calcium channel activity (GO:0015279).
The KEGG pathway analysis showed that the upregulated DEGs were mainly enriched in axon guidance (map04360), serotonergic synapse (map04726), and pathways in cancer (map05200). The downregulated DEGs were mainly enriched in the biosynthesis of antibiotics (map00998), metabolic pathways (map01100), biosynthesis of amino acids (map01230), carbon metabolism (map01200), and aminoacyl-tRNA biosynthesis (map00970). However, TRPC6, which is upregulated in patients with COPD, was not included in the three pathways of upregulated DEGs.

3.3. Validating the Expression and Diagnostic Value of TRPC6 in an In Vitro Model of COPD Using PCR Analysis

Recently, many groups have reported that particulate air pollution, including PM10, is a major risk factor for COPD [31]. It stimulates immune cells in the lungs, such as alveolar macrophages [22]. To validate TRPC6, which was identified as a novel biomarker of COPD by the GEO2R and machine learning analyses, we investigated the level of TRPC6 mRNA expression in PM-stimulated RAW 264.7 macrophages. The level of TRPC6 mRNA expression was significantly upregulated in the PM-stimulated RAW 264.7 macrophages compared to the control (p < 0.05) (Figure 8).

4. Discussion

COPD is a major disease with steeply increasing morbidity and mortality rates worldwide. According to COPD studies, the levels of IL-6, IL-8, tumor necrosis factor (TNF)-α, CRP, and fibrinogen, as well as the leukocyte population, could be considered COPD biomarkers [12,32]. Many studies have implied that the large surge in COPD incidence is due to air pollution, such as PM. In vitro and in vivo studies have shown that PM, including PM10, can induce pulmonary inflammation, impair lung function, cause emphysematous changes, and induce the release of proinflammatory cytokines (e.g., TNF-α and IL-1) and reactive oxygen radicals by alveolar macrophages in patients with COPD [31]. These literature-based biomarkers of COPD have been studied for decades. However, they cannot reflect with certainty the severity of COPD pathophysiology in current clinical practice. To identify new biomarkers in PM-induced COPD, we used GEO2R and machine learning methods. In our study, the gene expression data of GSE151052 were downloaded to identify novel biomarkers that were differentially expressed in the lungs of patients with COPD versus healthy people. Our study indicated that TRPC6 was differentially expressed between the COPD and normal groups.
In this study, an unusual result was obtained from the analysis of COPD microarray data using decision tree machine learning models. A decision tree trained for a classification problem usually has many nodes and a depth of three or more. However, in this case, a simple binary tree structure with only three nodes and a depth of two was obtained, in which the COPD and control groups were classified with more than 99% accuracy. In this decision tree, only one gene (TRPC6) was used for the classification. Hence, the effect of TRPC6 on the classification of COPD was considerably greater than that of other genes. This finding differs from the results of the analysis using GEO2R, in which TRPC6 was ranked sixth. In addition, TRPC6 has not been identified as a major biomarker for COPD in previous studies. Thus, biomarker candidates that are not prominent in statistical analyses such as GEO2R may be identified by some machine learning methods.
To investigate the predicted biological functions and signaling pathways of the DEGs in patients with COPD, we performed GO and KEGG pathway analyses. The GO analysis indicated that upregulated DEGs mainly participated in the BP and MF, whereas downregulated DEGs mainly took part in the BP. The GO analysis showed that the predicted targets of COPD were mainly enriched for transcription (DNA-templated), plasma membrane, and DNA binding. TRPC6 is localized in the plasma membrane.
TRPC6, a Ca2+-permeable, oxidative stress-sensitive cation channel located in the plasma membrane, is widely expressed in various tissues. TRPC-dependent increases in Ca2+ in pulmonary cells induce the activation of inflammatory signaling molecules (e.g., ERK1/2, p38, and JNK), which increases the levels of the inflammatory factors IL-6 and IL-8 in COPD [33]. TRPC6 is expressed in the lungs, including bronchial epithelial cells, alveolar macrophages, and the pulmonary vasculature [34]. Finney-Hayward et al. reported that the level of TRPC6 mRNA in alveolar macrophages from patients is significantly higher than that in healthy controls [35]. Therefore, TRPC6 identified from GEO2R and machine learning analysis could be a novel biomarker for the pathogenesis of COPD.
Studies have demonstrated that macrophages are the major cell type in COPD [36]. Macrophages are innate effector cells for pulmonary host defense against pathogens and inhaled particles such as PM. The number of macrophages was significantly increased (5- to 10-fold) in the airways, bronchial tubes, and BALF of patients with COPD [37,38]. In addition, a positive correlation was shown between the number of macrophages in the airways and COPD severity [39]. Recently, many groups have reported that particulate air pollution, including PM10, is a major risk factor for COPD by stimulating alveolar macrophages [40]. Using machine learning methods and GEO2R, we found that TRPC6 is significantly increased in patients with COPD. To determine whether TRPC6 is upregulated in COPD, we investigated the level of TRPC6 mRNA expression in PM-stimulated RAW 264.7 macrophages, which mimic COPD conditions. The experimental verification showed that the level of TRPC6 mRNA expression in the PM-stimulated RAW 264.7 cells was increased in a concentration-dependent manner. This result suggests that TRPC6 is significantly expressed in the pathogenesis of COPD.

5. Conclusions

Our study suggests that TRPC6 can be regarded as a potential novel biomarker for COPD pathogenesis. All three machine learning algorithms (J48, DecisionStump, and REPTree) suggested that TRPC6 plays a crucial role in COPD classification. The mRNA expression of TRPC6 is significantly increased in PM-stimulated RAW264.7 cells, which mimic COPD. Deriving biomarker candidates using machine learning and microarray data may also be effective for diseases other than COPD. Research on diverse gene expression data is left for future work.

Author Contributions

Conceptualization, H.-J.P. and Y.-R.Y.; methodology, H.-J.P., K.-R.D., Y.-R.Y. and J.-H.L.; software, Y.-R.Y. and J.-H.L.; validation, H.-J.P., K.-R.D., Y.-R.Y. and J.-H.L.; investigation, H.-J.P., K.-R.D., Y.-R.Y. and J.-H.L.; resources, H.-J.P., K.-R.D., Y.-R.Y. and J.-H.L.; data curation, H.-J.P., K.-R.D., Y.-R.Y. and J.-H.L.; writing—original draft preparation, H.-J.P., K.-R.D., Y.-R.Y. and J.-H.L.; writing—review and editing, H.-J.P. and J.-H.L.; visualization, K.-R.D. and Y.-R.Y.; supervision, H.-J.P. and J.-H.L.; project administration, H.-J.P. and J.-H.L.; funding acquisition, H.-J.P. and J.-H.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Gachon University, grant number 202002720001.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

This work was supported by the Gachon University (NO. 202002720001).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. López-Campos, J.L.; Tan, W.; Soriano, J.B. Global burden of COPD. Respirology 2016, 21, 14–23. [Google Scholar] [CrossRef] [PubMed]
  2. Polosukhin, V.V.; Richmond, B.W.; Du, R.-H.; Cates, J.M.; Wu, P.; Nian, H.; Massion, P.P.; Ware, L.B.; Lee, J.W.; Kononov, A.V. Secretory IgA deficiency in individual small airways is associated with persistent inflammation and remodeling. Am. J. Respir. Crit. Care Med. 2017, 195, 1010–1021. [Google Scholar] [CrossRef] [PubMed]
  3. Hogg, J.C.; Chu, F.; Utokaparch, S.; Woods, R.; Elliott, W.M.; Buzatu, L.; Cherniack, R.M.; Rogers, R.M.; Sciurba, F.C.; Coxson, H.O. The nature of small-airway obstruction in chronic obstructive pulmonary disease. New Engl. J. Med. 2004, 350, 2645–2653. [Google Scholar] [CrossRef] [PubMed]
  4. Barnes, P.J. Cellular and molecular mechanisms of chronic obstructive pulmonary disease. Clin. Chest Med. 2014, 35, 71–86. [Google Scholar] [CrossRef]
  5. Thomsen, R.W.; Lange, P.; Hellquist, B.; Frausing, E.; Bartels, P.D.; Krog, B.R.; Hansen, A.-M.S.; Buck, D.; Bunk, A.E. Validity and underrecording of diagnosis of COPD in the Danish National Patient Registry. Respir. Med. 2011, 105, 1063–1068. [Google Scholar] [CrossRef] [Green Version]
  6. Nunavath, V.; Goodwin, M.; Fidje, J.T.; Moe, C.E. Deep neural networks for prediction of exacerbations of patients with chronic obstructive pulmonary disease. In Proceedings of the International Conference on Engineering Applications of Neural Networks, Bristol, UK, 3–5 September 2018; pp. 217–228. [Google Scholar]
  7. Tang, C.Y.; Taylor, N.F.; Blackstock, F.C. Chest physiotherapy for patients admitted to hospital with an acute exacerbation of chronic obstructive pulmonary disease (COPD): A systematic review. Physiotherapy 2010, 96, 1–13. [Google Scholar] [CrossRef]
  8. Almagro, P.; Calbo, E.; de Echaguïen, A.O.; Barreiro, B.; Quintana, S.; Heredia, J.L.; Garau, J. Mortality after hospitalization for COPD. Chest 2002, 121, 1441–1448. [Google Scholar] [CrossRef]
  9. Team, R.C. R: A language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. Available online: http://www.R-project.org/2013 (accessed on 9 November 2022).
  10. Huang, D.W.; Sherman, B.T.; Tan, Q.; Kir, J.; Liu, D.; Bryant, D.; Guo, Y.; Stephens, R.; Baseler, M.W.; Lane, H.C. DAVID Bioinformatics Resources: Expanded annotation database and novel algorithms to better extract biology from large gene lists. Nucleic Acids Res. 2007, 35, W169–W175. [Google Scholar] [CrossRef] [Green Version]
  11. Carbon, S.; Ireland, A.; Mungall, C.J.; Shu, S.; Marshall, B.; Lewis, S.; Hub, A.; Group, W.P.W. AmiGO: Online access to ontology and annotation data. Bioinformatics 2009, 25, 288–289. [Google Scholar] [CrossRef] [Green Version]
  12. Faner, R.; Tal-Singer, R.; Riley, J.H.; Celli, B.; Vestbo, J.; MacNee, W.; Bakke, P.; Calverley, P.M.; Coxson, H.; Crim, C. Lessons from ECLIPSE: A review of COPD biomarkers. Thorax 2014, 69, 666–672.
  13. Han, J.; Wan, M.; Ma, Z.; Hu, C.; Yi, H. Prediction of targets of curculigoside A in osteoporosis and rheumatoid arthritis using network pharmacology and experimental verification. Drug Des. Dev. Ther. 2020, 14, 5235.
  14. Kwon, H.-K.; Song, M.-J.; Lee, H.-J.; Park, T.-S.; Kim, M.I.; Park, H.-J. Pediococcus pentosaceus-fermented Cordyceps militaris inhibits inflammatory reactions and alleviates contact dermatitis. Int. J. Mol. Sci. 2018, 19, 3504.
  15. Dhong, K.-R.; Kwon, H.-K.; Park, H.-J. Immunostimulatory Activity of Cordyceps militaris Fermented with Pediococcus pentosaceus SC11 Isolated from a Salted Small Octopus in Cyclophosphamide-Induced Immunocompromised Mice and Its Inhibitory Activity against SARS-CoV 3CL Protease. Microorganisms 2022, 10, 2321.
  16. Alpaydin, E. Introduction to Machine Learning. MIT Press: Cambridge, MA, USA, 2020.
  17. Mahendran, N.; Durai Raj Vincent, P.; Srinivasan, K.; Chang, C.-Y. Machine learning based computational gene selection models: A survey, performance evaluation, open issues, and future research directions. Front. Genet. 2020, 11, 603808.
  18. Selvaraj, S.; Natarajan, J. Microarray Data Analysis and Mining Tools. Bioinformation 2011, 6, 95–99.
  19. Czajkowski, M.; Grześ, M.; Kretowski, M. Multi-test decision tree and its application to microarray data classification. Artif. Intell. Med. 2014, 61, 35–44.
  20. Tsai, M.H.; Wang, H.C.; Lee, G.W.; Lin, Y.C.; Chiu, S.H.M. A decision tree based classifier to analyze human ovarian cancer cDNA microarray datasets. J. Med. Syst. 2016, 40, 21.
  21. Jurczuk, K.; Czajkowski, M.; Kretowski, M. Evolutionary induction of a decision tree for large-scale data: A GPU-based approach. Soft Comput. 2017, 21, 7363–7379.
  22. Vlahos, R.; Bozinovski, S. Role of alveolar macrophages in chronic obstructive pulmonary disease. Front. Immunol. 2014, 5, 435.
  23. Refaeilzadeh, P.; Tang, L.; Liu, H. Cross-validation. Encycl. Database Syst. 2009, 5, 532–538.
  24. Salzberg, S.L. Book Review: C4.5: Programs for Machine Learning by J. Ross Quinlan. Mach. Learn. 1993, 16, 235–240.
  25. Hssina, B.; Merbouha, A.; Ezzikouri, H.; Erritali, M. A Comparative Study of Decision Tree ID3 and C4.5. Int. J. Adv. Comput. Sci. Appl. 2014, 4, 13–19.
  26. Aljawarneh, S.; Yassein, M.B.; Aljundi, M. An Enhanced J48 Classification Algorithm for the Anomaly Intrusion Detection Systems. Clust. Comput. 2019, 22, 10549–10565.
  27. Frank, E.; Hall, M.A.; Witten, I.H. Data Mining: Practical Machine Learning Tools and Techniques, 4th ed.; Morgan Kaufmann: Burlington, MA, USA, 2016.
  28. Frank, E.; Hall, M.A.; Pal, C.J. The WEKA Workbench. Online Appendix for “Data Mining: Practical Machine Learning Tools and Techniques”, 4th ed.; Morgan Kaufmann: Burlington, MA, USA, 2016.
  29. Heydarian, M.; Doyle, T.E.; Samavi, R. MLCM: Multi-label confusion matrix. IEEE Access 2022, 10, 19083–19095.
  30. Kim, W.J.; Lim, J.H.; Lee, J.S.; Lee, S.-D.; Kim, J.H.; Oh, Y.-M. Comprehensive Analysis of Transcriptome Sequencing Data in the Lung Tissues of COPD Subjects. Int. J. Genom. 2015, 2015, 1–9.
  31. Zhao, J.; Li, M.; Wang, Z.; Chen, J.; Zhao, J.; Xu, Y.; Wei, X.; Wang, J.; Xie, J. Role of PM2.5 in the development and progression of COPD and its mechanisms. Respir. Res. 2019, 20, 1–13.
  32. Bradford, E.; Jacobson, S.; Varasteh, J.; Comellas, A.P.; Woodruff, P.; O’Neal, W.; DeMeo, D.L.; Li, X.; Kim, V.; Cho, M. The value of blood cytokines and chemokines in assessing COPD. Respir. Res. 2017, 18, 1–11.
  33. Chen, Q.; Zhou, Y.; Zhou, L.; Fu, Z.; Yang, C.; Zhao, L.; Li, S.; Chen, Y.; Wu, Y.; Ling, Z. TRPC6-dependent Ca2+ signaling mediates airway inflammation in response to oxidative stress via ERK pathway. Cell Death Dis. 2020, 11, 1–16.
  34. Abramowitz, J.; Birnbaumer, L. Physiology and pathophysiology of canonical transient receptor potential channels. FASEB J. 2009, 23, 297–328.
  35. Finney-Hayward, T.K.; Popa, M.O.; Bahra, P.; Li, S.; Poll, C.T.; Gosling, M.; Nicholson, A.G.; Russell, R.E.; Kon, O.M.; Jarai, G. Expression of transient receptor potential C6 channels in human lung macrophages. Am. J. Respir. Cell Mol. Biol. 2010, 43, 296–304.
  36. Cosio, M.G.; Guerassimov, A. Chronic obstructive pulmonary disease: Inflammation of small airways and lung parenchyma. Am. J. Respir. Crit. Care Med. 1999, 160, S21–S25.
  37. Keating, V.; Collins, P.; Scott, D.; Barnes, P. Differences in interleukin-8 and tumour necrosis factor-α in induced sputum from patients with chronic obstructive pulmonary disease or asthma. Am. J. Respir. Crit. Care Med. 1996, 153, 4.
  38. Ficker, J.; Dertinger, S.; Siegfried, W.; König, H.; Pentz, M.; Sailer, D.; Katalinic, A.; Hahn, E. Obstructive sleep apnoea and diabetes mellitus: The role of cardiovascular autonomic neuropathy. Eur. Respir. J. 1998, 11, 14–19.
  39. Boulet, L.-P.; Milot, J.; Boutet, M.; St Georges, F.; Laviolette, M. Airway inflammation in nonasthmatic subjects with chronic cough. Am. J. Respir. Crit. Care Med. 1994, 149, 482–489.
  40. Ling, S.H.; van Eeden, S.F. Particulate matter air pollution exposure: Role in the development and exacerbation of chronic obstructive pulmonary disease. Int. J. Chronic Obstr. Pulm. Dis. 2009, 4, 233.
Figure 1. Overview of the experimental workflow.
Figure 2. A brief schema of a decision tree.
Figure 3. Pseudocode of the C4.5 algorithm [26].
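The pseudocode in Figure 3 is not reproduced here. As a rough illustration of the splitting criterion that C4.5 is generally described as using, the sketch below computes the gain ratio of a single numeric threshold split. The function names and the toy expression values are hypothetical, and the sketch omits refinements found in full C4.5 implementations such as J48 (pruning, missing-value handling, and threshold-selection corrections).

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold` vs. `value > threshold`.

    Illustrative only: not the authors' code and not the WEKA implementation.
    """
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty carries no information
    n = len(labels)
    w_left, w_right = len(left) / n, len(right) / n
    # Information gain: parent entropy minus the weighted child entropies.
    info_gain = entropy(labels) - (w_left * entropy(left) + w_right * entropy(right))
    # Split information penalises very unbalanced partitions.
    split_info = -(w_left * math.log2(w_left) + w_right * math.log2(w_right))
    return info_gain / split_info


# Hypothetical expression values for one gene and the matching class labels.
expression = [1.0, 2.0, 3.0, 4.0]
classes = ["control", "control", "COPD", "COPD"]
print(gain_ratio(expression, classes, threshold=2.0))
```

A threshold that separates the two classes perfectly, as in this toy example, yields the maximum gain ratio for a balanced split.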
Figure 4. Chart visualizing the performance of the three decision tree models and their structures.
Figure 5. Confusion matrix for the results shown in Figure 4. In the machine learning experiment, the COPD and control groups of the dataset were separated using only the expression level of TRPC6, regardless of the values of the other genes, and the best model (J48) achieved an accuracy above 99%. The classifiers generated by the decision tree algorithm therefore distinguished COPD patients from controls with a simple criterion and high accuracy.
Figure 6. Chart visualizing the performance of the three decision tree models on the GSE57148 dataset [30] and their structures.
Figure 7. Confusion matrices for the results shown in Figure 6.
Figure 8. Determination of TRPC6 mRNA expression in RAW 264.7 cells. Data are expressed as the mean ± standard deviation (SD) of three independent experiments (n ≥ 3). The mRNA band intensities were converted to numerical data using Image-Studio software (LI-COR, Lincoln, NE, USA). Group means were compared using one-way ANOVA followed by Dunnett’s t-test (* p < 0.05 vs. untreated control).
Table 1. Confusion matrix.

| Prediction \ Actual | COPD | Control |
|---|---|---|
| COPD | True Positive (TP) | False Positive (FP) |
| Control | False Negative (FN) | True Negative (TN) |
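The four cells of Table 1 are enough to derive the usual summary metrics of a binary classifier. The sketch below uses the standard textbook definitions of accuracy, precision, recall, and F1; the function name and the example counts are hypothetical and are not taken from the study's results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp, fp, fn, tn follow the layout of Table 1 (COPD is the positive class).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = classification_metrics(tp=8, fp=2, fn=1, tn=9)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, accuracy is (8 + 9) / 20 and precision is 8 / (8 + 2); substituting the counts from an actual confusion matrix gives the corresponding values for that model.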
Table 2. List of differentially expressed total RNAs using GEO2R.

Upregulated genes (top 10)

| No. | ID | Gene | Adjusted p Value | p Value | logFC |
|---|---|---|---|---|---|
| 1 | 55282_at | LRRC36 | 7.54 × 10^−23 | 2.45 × 10^−25 | 1.377 |
| 2 | 344148_at | NCKAPS | 1.87 × 10^−27 | 1.71 × 10^−30 | 1.251 |
| 3 | 2487_at | FRZB | 6.54 × 10^−21 | 3.52 × 10^−23 | 1.250 |
| 4 | 6092_at | ROBO2 | 9.24 × 10^−25 | 1.64 × 10^−27 | 1.159 |
| 5 | 6387_at | CXCL12 | 4.81 × 10^−19 | 4.20 × 10^−21 | 1.129 |
| 6 | 653_at | BMP5 | 6.71 × 10^−20 | 4.46 × 10^−22 | 1.128 |
| 7 | 2669_at | GEM | 3.15 × 10^−19 | 2.58 × 10^−21 | 1.116 |
| 8 | 7225_at | TRPC6 | 1.15 × 10^−30 | 4.08 × 10^−34 | 1.110 |
| 9 | 84251_at | SGIP1 | 3.26 × 10^−27 | 3.31 × 10^−30 | 1.062 |
| 10 | 114905_at | C1QTNF7 | 2.09 × 10^−30 | 8.48 × 10^−34 | 1.062 |

Downregulated genes

| No. | ID | Gene | Adjusted p Value | p Value | logFC |
|---|---|---|---|---|---|
| 1 | 9332_at | CD163 | 2.33 × 10^−24 | 4.73 × 10^−27 | −2.271 |
| 2 | 6036_at | RNASE2 | 3.00 × 10^−26 | 3.65 × 10^−29 | −1.567 |
| 3 | 29968_at | PSAT1 | 1.46 × 10^−24 | 2.60 × 10^−27 | −1.34 |
| 4 | 4830_at | NME1 | 1.05 × 10^−22 | 3.71 × 10^−25 | −1.318 |
| 5 | 1646_at | AKR1C2 | 8.76 × 10^−20 | 6.22 × 10^−22 | −1.221 |
| 6 | 1510_at | CTSE | 3.28 × 10^−18 | 3.90 × 10^−20 | −1.208 |
| 7 | 195814_at | SDR16C5 | 3.13 × 10^−20 | 1.90 × 10^−22 | −1.049 |
| 8 | 123_at | PLIN2 | 5.33 × 10^−24 | 1.32 × 10^−26 | −1.039 |
| 9 | 3855_at | KRT7 | 1.01 × 10^−22 | 3.47 × 10^−25 | −1.035 |
| 10 | 6472_at | SHMT2 | 2.58 × 10^−24 | 5.37 × 10^−27 | −1.019 |
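To relate the logFC column of Table 2 to the fold-change cutoff of 1.5 mentioned in the abstract, the short sketch below converts a few table entries to linear fold changes. It assumes that logFC is a base-2 logarithm, which is how GEO2R (via limma) normally reports it; the helper name `fold_change` is hypothetical.

```python
def fold_change(log2_fc):
    """Convert a log2 fold change to a linear fold change (assumes base 2)."""
    return 2.0 ** log2_fc


# logFC values copied from Table 2 (gene symbol -> logFC).
table2_logfc = {"LRRC36": 1.377, "TRPC6": 1.110, "CD163": -2.271}

for gene, logfc in table2_logfc.items():
    print(f"{gene}: logFC = {logfc:+.3f}, fold change = {fold_change(logfc):.2f}")
```

Under this assumption, a logFC of 1.110 for TRPC6 corresponds to a little more than a twofold increase, which is above the 1.5-fold cutoff, while negative logFC values map to fold changes below 1.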
Table 3. Top five gene ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways enriched for upregulated DEGs.

| Category | Term | Count | % | p-Value | Benjamini |
|---|---|---|---|---|---|
| GOTERM_BP_DIRECT | Transcription, DNA-templated | 14 | 15.9 | 8.40 × 10^−02 | 1.00 × 10^0 |
| GOTERM_BP_DIRECT | Transmembrane transport | 4 | 4.5 | 9.30 × 10^−02 | 1.00 × 10^0 |
| GOTERM_BP_DIRECT | Postembryonic development | 3 | 3.4 | 4.10 × 10^−02 | 1.00 × 10^0 |
| GOTERM_BP_DIRECT | Ossification | 3 | 3.4 | 4.90 × 10^−02 | 1.00 × 10^0 |
| GOTERM_BP_DIRECT | Covalent chromatin modification | 3 | 3.4 | 8.90 × 10^−02 | 1.00 × 10^0 |
| GOTERM_CC_DIRECT | Plasma membrane | 29 | 33 | 3.50 × 10^−03 | 2.60 × 10^−01 |
| GOTERM_CC_DIRECT | Intracellular | 11 | 12.5 | 4.90 × 10^−02 | 1.00 × 10^0 |
| GOTERM_CC_DIRECT | Nuclear chromatin | 6 | 6.8 | 1.30 × 10^−03 | 1.90 × 10^−01 |
| GOTERM_CC_DIRECT | Sarcolemma | 3 | 3.4 | 4.90 × 10^−02 | 1.00 × 10^0 |
| GOTERM_CC_DIRECT | Receptor complex | 3 | 3.4 | 9.80 × 10^−02 | 1.00 × 10^0 |
| GOTERM_MF_DIRECT | DNA binding | 14 | 15.9 | 3.20 × 10^−02 | 1.00 × 10^0 |
| GOTERM_MF_DIRECT | Transcription factor activity, sequence-specific DNA binding | 9 | 10.2 | 6.30 × 10^−02 | 1.00 × 10^0 |
| GOTERM_MF_DIRECT | Calcium ion binding | 8 | 9.1 | 4.00 × 10^−02 | 1.00 × 10^0 |
| GOTERM_MF_DIRECT | Chromatin binding | 7 | 8 | 7.80 × 10^−03 | 9.90 × 10^−01 |
| GOTERM_MF_DIRECT | Integrin binding | 4 | 4.5 | 1.10 × 10^−02 | 9.90 × 10^−01 |
| KEGG_PATHWAY | Axon guidance | 5 | 5.7 | 2.10 × 10^−03 | 2.30 × 10^−01 |
| KEGG_PATHWAY | Serotonergic synapse | 3 | 3.4 | 8.40 × 10^−02 | 1.00 × 10^0 |
| KEGG_PATHWAY | Pathways in cancer | 5 | 5.7 | 8.90 × 10^−02 | 1.00 × 10^0 |
Table 4. Top five gene ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways enriched for downregulated DEGs.

| Category | Term | Count | % | p-Value | Benjamini |
|---|---|---|---|---|---|
| GOTERM_BP_DIRECT | Oxidation–reduction process | 15 | 10.3 | 3.20 × 10^−04 | 6.70 × 10^−02 |
| GOTERM_BP_DIRECT | tRNA aminoacylation for protein translation | 8 | 5.5 | 2.90 × 10^−08 | 2.30 × 10^−05 |
| GOTERM_BP_DIRECT | Cell–cell adhesion | 8 | 5.5 | 6.60 × 10^−03 | 7.50 × 10^−01 |
| GOTERM_BP_DIRECT | IRE1-mediated unfolded protein response | 7 | 4.8 | 8.00 × 10^−06 | 3.20 × 10^−03 |
| GOTERM_BP_DIRECT | Response to nutrient | 6 | 4.1 | 3.30 × 10^−04 | 6.70 × 10^−02 |
| GOTERM_CC_DIRECT | Extracellular exosome | 51 | 34.9 | 2.00 × 10^−09 | 4.40 × 10^−07 |
| GOTERM_CC_DIRECT | Cytoplasm | 53 | 36.3 | 1.60 × 10^−02 | 2.40 × 10^−01 |
| GOTERM_CC_DIRECT | Cytosol | 50 | 34.2 | 1.20 × 10^−06 | 1.30 × 10^−04 |
| GOTERM_CC_DIRECT | Membrane | 32 | 21.9 | 4.80 × 10^−04 | 1.70 × 10^−02 |
| GOTERM_CC_DIRECT | Mitochondrion | 27 | 18.5 | 7.70 × 10^−06 | 5.50 × 10^−04 |
| GOTERM_MF_DIRECT | NADP binding | 6 | 4.1 | 7.60 × 10^−06 | 2.50 × 10^−03 |
| GOTERM_MF_DIRECT | Poly(A) RNA binding | 21 | 14.4 | 5.30 × 10^−04 | 7.60 × 10^−02 |
| GOTERM_MF_DIRECT | ATP binding | 21 | 14.4 | 1.30 × 10^−02 | 5.50 × 10^−01 |
| GOTERM_MF_DIRECT | Cadherin binding involved in cell–cell adhesion | 8 | 5.5 | 8.10 × 10^−03 | 3.90 × 10^−01 |
| GOTERM_MF_DIRECT | Protein kinase binding | 7 | 4.8 | 7.70 × 10^−02 | 1.00 × 10^0 |
| KEGG_PATHWAY | Biosynthesis of antibiotics | 16 | 11 | 1.20 × 10^−08 | 1.50 × 10^−06 |
| KEGG_PATHWAY | Metabolic pathways | 33 | 22.6 | 1.50 × 10^−06 | 6.10 × 10^−05 |
| KEGG_PATHWAY | Biosynthesis of amino acids | 9 | 6.2 | 1.40 × 10^−06 | 6.10 × 10^−05 |
| KEGG_PATHWAY | Carbon metabolism | 9 | 6.2 | 4.10 × 10^−05 | 1.20 × 10^−03 |
| KEGG_PATHWAY | Aminoacyl–tRNA biosynthesis | 7 | 4.8 | 9.90 × 10^−05 | 2.40 × 10^−03 |
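The last column of Tables 3 and 4 reports Benjamini-adjusted p-values. As a generic illustration of how a Benjamini–Hochberg adjustment works, the sketch below applies the standard step-up procedure to a small list of made-up p-values. It is not DAVID's code, and the table values cannot be reproduced from the rows shown, because the adjustment depends on every term tested and not only on the top five listed.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and a cap of 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values for four hypothetical terms.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

Each raw p-value is scaled by the number of tests divided by its rank, so terms with modest raw p-values can end up with adjusted values at or near 1, as seen for many rows in Table 3.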
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Dhong, K.-R.; Lee, J.-H.; Yoon, Y.-R.; Park, H.-J. Identification of TRPC6 as a Novel Diagnostic Biomarker of PM-Induced Chronic Obstructive Pulmonary Disease Using Machine Learning Models. Genes 2023, 14, 284. https://doi.org/10.3390/genes14020284