Developing Sustainable Classification of Diseases via Deep Learning and Semi-Supervised Learning
Abstract
1. Introduction
2. Literature Review
3. Methods
3.1. Semi-Supervised Learning with Deep Forest
3.2. Self-Training
4. Datasets
4.1. Lung Dataset
4.2. Breast Dataset
4.3. Prostate Dataset
5. Results
Discussion
6. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Dataset | Disease Type | No. of Samples | No. of Genes | Microarray Platform | Class |
---|---|---|---|---|---|
1 | Lung | 187 | 22,215 | Affymetrix Human Genome U133A Array | Normal/Tumour |
2 | Breast | 310 | 54,677 | Affymetrix Human Genome U133 Plus 2.0 Array | Normal/Tumour |
3 | Prostate | 102 | 12,600 | Hybridization to U95Av2 arrays | Normal/Tumour |
Dataset | Disease Type | Labelled Samples | Unlabelled Samples | Testing Samples | No. of Genes |
---|---|---|---|---|---|
1 | Lung | 65 | 65 | 57 | 22,215 |
2 | Breast | 109 | 109 | 92 | 54,677 |
3 | Prostate | 36 | 36 | 30 | 12,600 |
Dataset | Model | Accuracy | AUC | Recall | Precision | F1-Score |
---|---|---|---|---|---|---|
Lung cancer | LR | 0.5926 | 0.5885 | 0.4074 | 0.6471 | 0.5000 |
Lung cancer | SVM | 0.6481 | 0.6406 | 0.4815 | 0.7222 | 0.5778 |
Lung cancer | RF | 0.5926 | 0.6036 | 0.8148 | 0.5641 | 0.6667 |
Lung cancer | DNNs | 0.6023 | 0.6173 | 0.5555 | 0.9259 | 0.6944 |
Lung cancer | DF | 0.6618 | 0.6708 | 0.7037 | 0.6333 | 0.6667 |
Lung cancer | DSST | 0.7389 | 0.7209 | 0.7778 | 0.7000 | 0.7368 |
Breast cancer | LR | 0.7128 | 0.7091 | 0.8909 | 0.7000 | 0.7840 |
Breast cancer | SVM | 0.5957 | 0.5921 | 0.9818 | 0.5934 | 0.7397 |
Breast cancer | RF | 0.7447 | 0.7245 | 0.9818 | 0.7013 | 0.8182 |
Breast cancer | DNNs | 0.7021 | 0.7170 | 0.7636 | 0.7368 | 0.7500 |
Breast cancer | DF | 0.7766 | 0.7702 | 0.8182 | 0.8036 | 0.8108 |
Breast cancer | DSST | 0.8085 | 0.8093 | 0.8545 | 0.8246 | 0.8393 |
Prostate cancer | LR | 0.6333 | 0.6429 | 0.6667 | 0.6250 | 0.6452 |
Prostate cancer | SVM | 0.5862 | 0.6381 | 0.6667 | 0.5882 | 0.6250 |
Prostate cancer | RF | 0.6552 | 0.6762 | 0.7333 | 0.6471 | 0.6875 |
Prostate cancer | DNNs | 0.6333 | 0.6286 | 0.7333 | 0.6111 | 0.6667 |
Prostate cancer | DF | 0.6897 | 0.7238 | 0.8000 | 0.6667 | 0.7273 |
Prostate cancer | DSST | 0.7931 | 0.7857 | 0.7333 | 0.8462 | 0.7857 |
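
The F1-Score column in the results table above appears consistent with the standard definition of F1 as the harmonic mean of precision and recall. The following minimal sketch recomputes it for the three DSST rows. The helper name `f1_score` and the list `dsst_rows` are illustrative and not taken from the paper. Small differences are expected because the tabulated precision and recall are already rounded to four decimals.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (dataset, precision, recall, reported F1) copied from the DSST rows of the table
dsst_rows = [
    ("Lung cancer", 0.7000, 0.7778, 0.7368),
    ("Breast cancer", 0.8246, 0.8545, 0.8393),
    ("Prostate cancer", 0.8462, 0.7333, 0.7857),
]

for name, precision, recall, reported in dsst_rows:
    computed = f1_score(precision, recall)
    # Compare the recomputed value with the tabulated one
    print(f"{name}: computed F1 = {computed:.4f}, reported = {reported:.4f}")
```

The same check can be applied to any other row by appending its precision, recall, and reported F1 to `dsst_rows`.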
Gene Name | Gene Symbol | Stable Score | p-Value |
---|---|---|---|
USP6 N-terminal like | (USP6NL) | 1 | <0.01 |
acyl-CoA oxidase 2 | (ACOX2) | 0.98 | <0.01 |
agouti related neuropeptide | (AGRP) | 0.53 | <0.01 |
HECT, UBA and WWE domain containing 1, E3 ubiquitin protein ligase | (HUWE1) | 0.99 | <0.01 |
calcium/calmodulin dependent protein kinase II beta | (CAMK2B) | 1 | <0.01 |
tripartite motif containing 5 | (TRIM5) | 1 | <0.01 |
Janus kinase 3 | (JAK3) | 1 | <0.01 |
sperm antigen with calponin homology and coiled-coil domains 1 like | (SPECC1L) | 0.96 | <0.01 |
echinoderm microtubule associated protein like 3 | (EML3) | 1 | <0.01 |
glycosylphosphatidylinositol anchor attachment protein 1 homolog (yeast) pseudogene | (LOC100288570) | 1 | <0.01 |
Gene Name | Gene Symbol | Stable Score | p-Value |
---|---|---|---|
LIM homeobox transcription factor 1 alpha | (LMX1A) | 0.96 | <0.01 |
tRNA methyltransferase 44 homolog (S. cerevisiae) | (TRMT44) | 0.95 | <0.01 |
NLR family pyrin domain containing 1 | (NLRP1) | 0.69 | <0.01 |
ret finger protein like 2 | (RFPL2) | 1 | <0.01 |
C-C motif chemokine ligand 16 | (CCL16) | 0.92 | <0.01 |
opioid receptor mu 1 | (OPRM1) | 0.7 | <0.01 |
ubiquitin conjugating enzyme E2 H | (UBE2H) | 0.81 | <0.01 |
potassium calcium-activated channel subfamily N member 3 | (KCNN3) | 0.98 | <0.01 |
haemoglobin subunit mu | (HBM) | 1 | <0.01 |
E2F transcription factor 4 | (E2F4) | 1 | <0.01 |
Gene Name | Gene Symbol | Stable Score | p-Value |
---|---|---|---|
fms related tyrosine kinase 1 | (FLT1) | 0 | <0.01 |
tumour protein p63 | (TP63) | 0.56 | <0.01 |
UDP glucuronosyltransferase family 1 member A10 | (UGT1A10) | 0.73 | <0.01 |
T-box 5 | (TBX5) | 1 | <0.01 |
potassium voltage-gated channel subfamily A member 5 | (KCNA5) | 1 | <0.01 |
myosin XVI | (MYO16) | 0.83 | <0.01 |
inhibitor of DNA binding 1, HLH protein | (ID1) | 0.86 | <0.01 |
cathepsin G | (CTSG) | 1 | <0.01 |
X-box binding protein 1 | (XBP1) | 0.95 | <0.01 |
fibroblast growth factor receptor 1 | (FGFR1) | 1 | <0.01 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Yin, C.; Chen, Z. Developing Sustainable Classification of Diseases via Deep Learning and Semi-Supervised Learning. Healthcare 2020, 8, 291. https://doi.org/10.3390/healthcare8030291