pcPromoter-CNN: A CNN-Based Prediction and Classification of Promoters
Abstract
1. Introduction
- Selection and creation of a benchmark dataset
- Numerical expression of the dataset and its DNA sequences
- Proposal of a CNN-based prediction architecture
- Performance evaluation of the predictor using cross-validation
- Development of a web server to provide public access to the predictor
2. Benchmark Dataset
3. Numerical Expression of DNA Sequence
A (1, 0, 0, 0)
T (0, 1, 0, 0)
C (0, 0, 1, 0)
G (0, 0, 0, 1)
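
As a minimal sketch of this numerical expression, the snippet below maps each nucleotide to one of the four unit vectors listed in this section, with A taken as (1, 0, 0, 0), the one unit vector not assigned to T, C, or G. The function name `one_hot_encode` and the error handling for unexpected symbols are illustrative assumptions, not part of the original work.

```python
# One-hot vectors as listed in this section; A is assumed to be the
# remaining unit vector (1, 0, 0, 0).
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "T": (0, 1, 0, 0),
    "C": (0, 0, 1, 0),
    "G": (0, 0, 0, 1),
}


def one_hot_encode(sequence):
    """Return an L x 4 list of rows, one row per nucleotide."""
    rows = []
    for base in sequence.upper():
        if base not in ONE_HOT:
            # Hypothetical policy: reject anything other than A, T, C, G.
            raise ValueError(f"unexpected nucleotide: {base!r}")
        rows.append(ONE_HOT[base])
    return rows


encoded = one_hot_encode("TACG")
for base, row in zip("TACG", encoded):
    print(base, row)
```

A sequence of length L therefore becomes an L x 4 matrix, which is the kind of input a one-dimensional convolution layer can consume directly.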
4. Proposed Methodology
4.1. Model Setup
4.2. Proposed CNN Architecture
5. Results and Discussion
5.1. Evaluation Metrics
5.2. Performance Evaluation
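
The contribution list names cross-validation as the evaluation protocol. A minimal sketch of a stratified k-fold split is given below, applied to the 2860 promoter and 2860 non-promoter sequences of the benchmark dataset. The number of folds (k = 5), the random seed, and the helper name `stratified_k_fold` are assumptions for illustration; the extracted text does not state them.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs with balanced classes."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Deal the shuffled indices of each class round-robin into k folds.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for f, fold in enumerate(folds) if f != i for idx in fold
        )
        yield train_idx, test_idx


# Benchmark class sizes from the dataset table: 1 = promoter, 0 = non-promoter.
labels = [1] * 2860 + [0] * 2860
splits = list(stratified_k_fold(labels, k=5))
for fold, (train_idx, test_idx) in enumerate(splits):
    print(fold, len(train_idx), len(test_idx))
```

Each sequence appears in exactly one test fold, so metrics averaged over the folds use every benchmark sample once for testing.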
6. Webserver
7. Conclusions
Author Contributions
Funding
Conflicts of Interest
Classes | Benchmark Dataset | Independent Test Dataset |
---|---|---|
Promoter | 2860 | 256 |
Non-Promoter | 2860 | 0 |
σ70 | 1694 | 199 |
σ54 | 94 | 0 |
σ38 | 163 | 10 |
σ32 | 291 | 13 |
σ28 | 134 | 4 |
σ24 | 484 | 30 |
Binary Classifier | Positive Class | Negative Class |
---|---|---|
ProMN | Promoter | Non-Promoter |
Sigma70 | 70 | 54, 38, 32, 28, 24 |
Sigma24 | 24 | 54, 38, 32, 28 |
Sigma28 | 28 | 54, 38, 32 |
Sigma38 | 38 | 54, 32 |
Sigma32 | 32 | 54 |
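
The shrinking negative classes in the table above suggest that the binary classifiers are applied one after another, in the row order ProMN, Sigma70, Sigma24, Sigma28, Sigma38, Sigma32. The sketch below illustrates that reading; it is an assumption about the control flow, not the authors' implementation. The predictors are passed in as plain callables, the toy predicates in the usage example are hypothetical stand-ins for trained models, and assigning σ54 to sequences rejected by every stage is inferred from σ54 never appearing as a positive class.

```python
# Stage order follows the rows of the binary-classifier table.
STAGE_LABELS = ["sigma70", "sigma24", "sigma28", "sigma38", "sigma32"]


def classify(sequence, is_promoter, stages):
    """Run the promoter check, then each sigma stage until one accepts.

    is_promoter: callable returning True for promoter sequences (ProMN).
    stages: list of (label, predict) pairs in cascade order.
    """
    if not is_promoter(sequence):
        return "non-promoter"
    for label, predict in stages:
        if predict(sequence):
            return label
    # Assumption: a promoter rejected by every stage is labeled sigma54.
    return "sigma54"


# Toy stand-ins for trained models: they only inspect a tag in the string.
toy_is_promoter = lambda s: s.startswith("P")
toy_stages = [(label, lambda s, tag=label: tag in s) for label in STAGE_LABELS]

print(classify("P:sigma28", toy_is_promoter, toy_stages))
print(classify("P:unknown", toy_is_promoter, toy_stages))
print(classify("N:sigma70", toy_is_promoter, toy_stages))
```

One consequence of such a design is that an error at an early stage cannot be corrected later, so the first classifiers in the chain carry the most weight.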
Parameter | Range |
---|---|
Number of Convolution Filters | 8, 16, 32, 64, 128 |
Convolution Kernel Size | 3, 5, 7, 9, 11 |
Pooling Layer Kernel Size | 2, 4 |
Dropout Ratio | 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45 |
Dense Layer Neurons | 8, 16, 32, 64 |
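
To give a sense of the size of this search space, the sketch below enumerates every combination of the listed values. Whether the authors searched the grid exhaustively, and whether each value was tuned per layer or shared across layers, is not stated in the extracted text; the dictionary keys are hypothetical names chosen for this example.

```python
from itertools import product

# Values copied from the hyperparameter table; key names are illustrative.
SEARCH_SPACE = {
    "conv_filters": [8, 16, 32, 64, 128],
    "conv_kernel_size": [3, 5, 7, 9, 11],
    "pool_size": [2, 4],
    "dropout": [0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45],
    "dense_units": [8, 16, 32, 64],
}


def iter_configs(space):
    """Yield one dict per combination of hyperparameter values."""
    keys = list(space)
    for values in product(*(space[key] for key in keys)):
        yield dict(zip(keys, values))


configs = list(iter_configs(SEARCH_SPACE))
print(len(configs))  # 5 * 5 * 2 * 7 * 4 = 1400
print(configs[0])
```

With a single shared setting per hyperparameter the grid has 1400 points, which is small enough to enumerate but large enough that each point would typically be scored by cross-validation rather than by eye.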
Methods | Sn (%) | Sp (%) | Acc (%) | MCC |
---|---|---|---|---|
iPromoter-2L | 79.2 | 84.2 | 81.7 | 0.637 |
MULTiPly | 87.3 | 86.6 | 86.9 | 0.739 |
iPromoter-BnCNN | 88.3 | 88.0 | 88.2 | 0.763 |
pcPromoter-CNN | 89.84 | 90.38 | 90.11 | 0.802 |
Metrics | σ24 Pc | σ24 Bn | σ24 Mu | σ28 Pc | σ28 Bn | σ28 Mu | σ32 Pc | σ32 Bn | σ32 Mu | σ38 Pc | σ38 Bn | σ38 Mu | σ70 Pc | σ70 Bn | σ70 Mu |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Acc (%) | 96.8 | 93.8 | 91.2 | 98.9 | 96.1 | 95.9 | 97.9 | 90.6 | 85.7 | 96.5 | 91.6 | 85.3 | 92.1 | 87.3 | 84.9 |
Sn (%) | 88.5 | 93.3 | 88.8 | 84.6 | 97.8 | 95.9 | 87.7 | 91.7 | 82.2 | 87.2 | 94.9 | 83.3 | 94.9 | 91.0 | 90.4 |
Sp (%) | 98.5 | 94.1 | 92.9 | 99.6 | 93.6 | 91.3 | 99.0 | 89.8 | 88.4 | 98.9 | 89.3 | 86.7 | 87.9 | 82.2 | 76.9 |
MCC | 0.885 | 0.873 | 0.818 | 0.875 | 0.918 | 0.876 | 0.881 | 0.90 | 0.708 | 0.882 | 0.833 | 0.699 | 0.836 | 0.737 | 0.668 |
Pc: pcPromoter-CNN; Bn: iPromoter-BnCNN; Mu: MULTiPly.
Parameter | Promoter Pc | Promoter Bn | Promoter Mu | σ24 Pc | σ24 Bn | σ24 Mu | σ28 Pc | σ28 Bn | σ28 Mu | σ32 Pc | σ32 Bn | σ32 Mu | σ38 Pc | σ38 Bn | σ38 Mu | σ70 Pc | σ70 Bn | σ70 Mu |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
TP | 236 | 245 | 238 | 24 | 28 | 19 | 2 | 1 | 0 | 12 | 10 | 5 | 6 | 3 | 4 | 180 | 179 | 180 |
FN | 20 | 11 | 18 | 6 | 2 | 11 | 2 | 3 | 4 | 1 | 3 | 8 | 4 | 7 | 6 | 19 | 20 | 19 |
Pc: pcPromoter-CNN; Bn: iPromoter-BnCNN; Mu: MULTiPly.
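
Because this table lists only true positives and false negatives, the one metric it supports directly is sensitivity, TP / (TP + FN). The short sketch below derives it for the Pc columns, reading Pc as pcPromoter-CNN (an inference from the method names in the comparison table). It also checks that the per-sigma counts add up to the 256 promoters of the independent test set.

```python
# (TP, FN) pairs copied from the Pc columns of the table above.
PC_COUNTS = {
    "Promoter": (236, 20),
    "σ24": (24, 6),
    "σ28": (2, 2),
    "σ32": (12, 1),
    "σ38": (6, 4),
    "σ70": (180, 19),
}

# Sensitivity = TP / (TP + FN) for each class.
sensitivity = {name: tp / (tp + fn) for name, (tp, fn) in PC_COUNTS.items()}

# Sanity check: the sigma classes together should cover all test promoters.
sigma_total = sum(
    tp + fn for name, (tp, fn) in PC_COUNTS.items() if name != "Promoter"
)

for name, value in sensitivity.items():
    print(f"{name}: {value:.2%}")
print("sigma samples:", sigma_total)
```

The small σ28 and σ38 test sets (4 and 10 sequences) mean that a single misclassified sequence shifts their sensitivity by 25 and 10 percentage points respectively, so those figures should be read with caution.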
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Shujaat, M.; Wahab, A.; Tayara, H.; Chong, K.T. pcPromoter-CNN: A CNN-Based Prediction and Classification of Promoters. Genes 2020, 11, 1529. https://doi.org/10.3390/genes11121529