iProm-Sigma54: A CNN Base Prediction Tool for σ54 Promoters
Abstract
1. Introduction
- Most previous studies predicted σ70 promoter sequences. Classifying predicted promoter sequences as σ54 is uncommon.
- Not every study has produced a user-friendly, publicly accessible web server, which makes the published methods difficult for most experimental scientists to use in practice.
- The above-mentioned studies produce a notable number of false-positive predictions because their datasets are imbalanced.
- Advances in high-throughput whole-genome sequencing and the integration of verified promoter sequences have led to databases such as “Pro54DB” [38], a database of σ54 promoters. Because such databases play a vital role in the development of computational tools, a computational model for identifying σ54 promoters is now needed.
2. Benchmark Dataset
3. Feature Encoding Scheme
4. Proposed Methodology
CNN Architecture
5. Results and Discussion
5.1. Evaluation Metrics
5.2. Results and Comparison
6. Webserver
- The server accepts input in two ways: direct sequence entry, or upload of a file containing up to one thousand sequences in FASTA format. An uploaded file must have the “.fa” extension.
- Set the threshold value, which ranges from 0 to 1.
- Click on the “submit Sequence” button to obtain the prediction results.
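
The input rules listed above can be illustrated with a small validation routine. This is only a sketch of how such checks might look, not the web server's actual code: the function names, the restriction to the bases A, C, G and T, and the case-insensitive extension check are assumptions made for illustration.

```python
from pathlib import Path

MAX_SEQUENCES = 1000          # upload limit stated for the web server
ALLOWED_EXTENSION = ".fa"     # required extension for uploaded files
VALID_BASES = set("ACGT")     # assumption: only unambiguous DNA bases are accepted


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                records.append((header, "".join(chunks).upper()))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks).upper()))
    return records


def validate_submission(filename, text, threshold):
    """Return the parsed records, or raise ValueError if a rule is violated."""
    if Path(filename).suffix.lower() != ALLOWED_EXTENSION:
        raise ValueError("uploaded file must end in .fa")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie between 0 and 1")
    records = parse_fasta(text)
    if not records:
        raise ValueError("no FASTA records found")
    if len(records) > MAX_SEQUENCES:
        raise ValueError("at most 1000 sequences are allowed per file")
    for header, seq in records:
        if not seq or set(seq) - VALID_BASES:
            raise ValueError(f"record '{header}' is empty or has non-ACGT characters")
    return records


example = ">seq1\nACGTACGT\nTTGGCA\n>seq2\nGGCATTAA\n"
print(validate_submission("query.fa", example, threshold=0.5))
```

Running the checks before prediction lets a malformed upload be rejected with a specific message instead of failing inside the model.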
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Ortiz-Merino, R.A.; Kuanyshev, N.; Byrne, K.P.; Varela, J.A.; Morrissey, J.P.; Porro, D.; Wolfe, K.H.; Branduardi, P. Transcriptional response to lactic acid stress in the hybrid yeast Zygosaccharomyces parabailii. Appl. Environ. Microbiol. 2018, 84, e02294-17.
- Barrios, H.; Valderrama, B.; Morett, E. Compilation and analysis of σ54-dependent promoter sequences. Nucleic Acids Res. 1999, 27, 4305–4313.
- Wigneshweraraj, S.; Bose, D.; Burrows, P.C.; Joly, N.; Schumacher, J.; Rappas, M.; Pape, T.; Zhang, X.; Stockley, P.; Severinov, K.; et al. Modus operandi of the bacterial RNA polymerase containing the σ54 promoter-specificity factor. Mol. Microbiol. 2008, 68, 538–546.
- Kustu, S.; Santero, E.; Keener, J.; Popham, D.; Weiss, D. Expression of sigma 54 (ntrA)-dependent genes is probably united by a common mechanism. Microbiol. Rev. 1989, 53, 367–376.
- Gardan, R.; Rapoport, G.; Débarbouillé, M. Expression of the rocDEF Operon Involved in Arginine Catabolism in Bacillus subtilis. J. Mol. Biol. 1995, 249, 843–856.
- Zielinski, N.A.; Maharaj, R.; Roychoudhury, S.; Danganan, C.; Hendrickson, W.; Chakrabarty, A. Alginate synthesis in Pseudomonas aeruginosa: Environmental regulation of the algC promoter. J. Bacteriol. 1992, 174, 7680–7688.
- Matsumine, H.; Yamamura, Y.; Hattori, N.; Kobayashi, T.; Kitada, T.; Yoritaka, A.; Mizuno, Y. A microdeletion of D6S305 in a family of autosomal recessive juvenile parkinsonism (PARK2). Genomics 1998, 49, 143–146.
- Touzain, F.; Schbath, S.; Debled-Rennesson, I.; Aigle, B.; Kucherov, G.; Leblond, P. SIGffRid: A tool to search for sigma factor binding sites in bacterial genomes using comparative approach and biologically driven statistics. BMC Bioinform. 2008, 9, 73.
- Kim, J.W.; Zeller, K.I.; Wang, Y.; Jegga, A.G.; Aronow, B.J.; O’Donnell, K.A.; Dang, C.V. Evaluation of myc E-box phylogenetic footprints in glycolytic genes by chromatin immunoprecipitation assays. Mol. Cell. Biol. 2004, 24, 5923–5936.
- Dahl, J.A.; Collas, P. A rapid micro chromatin immunoprecipitation assay (ChIP). Nat. Protoc. 2008, 3, 1032–1045.
- Lin, H.; Deng, E.Z.; Ding, H.; Chen, W.; Chou, K.C. iPro54-PseKNC: A sequence-based predictor for identifying sigma-54 promoters in prokaryote with pseudo k-tuple nucleotide composition. Nucleic Acids Res. 2014, 42, 12961–12972.
- Prestridge, D.S. Predicting Pol II promoter sequences using transcription factor binding sites. J. Mol. Biol. 1995, 249, 923–932.
- Knudsen, S. Promoter2.0: For the recognition of PolII promoter sequences. Bioinformatics 1999, 15, 356–361.
- Down, T.A.; Hubbard, T.J. Computational detection and location of transcription start sites in mammalian genomic DNA. Genome Res. 2002, 12, 458–461.
- Hutchinson, G. The prediction of vertebrate promoter regions using differential hexamer frequency analysis. Bioinformatics 1996, 12, 391–398.
- Scherf, M.; Klingenhoff, A.; Werner, T. Highly specific localization of promoter regions in large genomic sequences by PromoterInspector: A novel context analysis approach. J. Mol. Biol. 2000, 297, 599–606.
- Ohler, U.; Harbeck, S.; Niemann, H.; Reese, M.G. Interpolated markov chains for eukaryotic promoter recognition. Bioinformatics 1999, 15, 362–369.
- Ioshikhes, I.P.; Zhang, M.Q. Large-scale human promoter mapping using CpG islands. Nat. Genet. 2000, 26, 61–63.
- Davuluri, R.V.; Grosse, I.; Zhang, M.Q. Computational identification of promoters and first exons in the human genome. Nat. Genet. 2001, 29, 412–417.
- Ponger, L.; Mouchiroud, D. CpGProD: Identifying CpG islands associated with transcription start sites in large genomic mammalian sequences. Bioinformatics 2002, 18, 631–633.
- Yang, Y.; Zhang, R.; Singh, S.; Ma, J. Exploiting sequence-based features for predicting enhancer–promoter interactions. Bioinformatics 2017, 33, i252–i260.
- Bharanikumar, R.; Premkumar, K.A.R.; Palaniappan, A. PromoterPredict: Sequence-based modelling of Escherichia coli σ70 promoter strength yields logarithmic dependence between promoter strength and sequence. PeerJ 2018, 6, e5862.
- Kanhere, A.; Bansal, M. A novel method for prokaryotic promoter prediction based on DNA stability. BMC Bioinform. 2005, 6, 1.
- Khan, A.; Ilyas, T.; Umraiz, M.; Mannan, Z.I.; Kim, H. Ced-net: Crops and weeds segmentation for smart farming using a small cascaded encoder-decoder architecture. Electronics 2020, 9, 1602.
- Shah, A.A.; Malik, H.A.M.; Mohammad, A.; Khan, Y.D.; Alourani, A. Machine learning techniques for identification of carcinogenic mutations, which cause breast adenocarcinoma. Sci. Rep. 2022, 12, 11738.
- Shujaat, M.; Aslam, N.; Noreen, I.; Ehsan, M.K.; Qureshi, M.A.; Ali, A.; Naz, N.; Qadeer, I. Intelligent and Integrated Framework for Exudate Detection in Retinal Fundus Images. Intell. Autom. Soft Comput. 2021, 30, 663–672.
- Zeng, L.; Liu, Y.; Yu, Z.G.; Liu, Y. iEnhancer-DLRA: Identification of enhancers and their strengths by a self-attention fusion strategy for local and global features. Briefings Funct. Genom. 2022, 21, 399–407.
- Lin, H.; Liang, Z.Y.; Tang, H.; Chen, W. Identifying sigma70 promoters with novel pseudo nucleotide composition. IEEE/ACM Trans. Comput. Biol. Bioinform. 2017, 16, 1316–1321.
- Song, K. Recognition of prokaryotic promoters based on a novel variable-window Z-curve method. Nucleic Acids Res. 2012, 40, 963–971.
- Rahman, M.S.; Aktar, U.; Jani, M.R.; Shatabda, S. iPromoter-FSEn: Identification of bacterial σ70 promoter sequences using feature subspace based ensemble classifier. Genomics 2019, 111, 1160–1166.
- He, W.; Jia, C.; Duan, Y.; Zou, Q. 70ProPred: A predictor for discovering sigma70 promoters based on combining multiple features. BMC Syst. Biol. 2018, 12, 44.
- Coppens, L.; Lavigne, R. SAPPHIRE: A neural network based classifier for σ70 promoter prediction in Pseudomonas. BMC Bioinform. 2020, 21, 415.
- Liu, B.; Li, K. iPromoter-2L2.0: Identifying promoters and their types by combining smoothing cutting window algorithm and sequence-based features. Mol. Ther.-Nucleic Acids 2019, 18, 80–87.
- Zhang, M.; Li, F.; Marquez-Lago, T.T.; Leier, A.; Fan, C.; Kwoh, C.K.; Chou, K.C.; Song, J.; Jia, C. MULTiPly: A novel multi-layer predictor for discovering general and specific types of promoters. Bioinformatics 2019, 35, 2957–2965.
- Amin, R.; Rahman, C.R.; Ahmed, S.; Sifat, M.H.R.; Liton, M.N.K.; Rahman, M.M.; Khan, M.Z.H.; Shatabda, S. iPromoter-BnCNN: A novel branched CNN-based predictor for identifying and classifying sigma promoters. Bioinformatics 2020, 36, 4869–4875.
- Shujaat, M.; Wahab, A.; Tayara, H.; Chong, K.T. pcPromoter-CNN: A CNN-based prediction and classification of promoters. Genes 2020, 11, 1529.
- Hernández, D.; Jara, N.; Araya, M.; Durán, R.E.; Buil-Aranda, C. PromoterLCNN: A Light CNN-Based Promoter Prediction and Classification Model. Genes 2022, 13, 1126.
- Liang, Z.Y.; Lai, H.Y.; Yang, H.; Zhang, C.J.; Yang, H.; Wei, H.H.; Chen, X.X.; Zhao, Y.W.; Su, Z.D.; Li, W.C.; et al. Pro54DB: A database for experimentally verified sigma-54 promoters. Bioinformatics 2017, 33, 467–469.
- Fu, L.; Niu, B.; Zhu, Z.; Wu, S.; Li, W. CD-HIT: Accelerated for clustering the next-generation sequencing data. Bioinformatics 2012, 28, 3150–3152.
- Alam, W.; Tayara, H.; Chong, K.T. i4mC-Deep: An intelligent predictor of n4-methylcytosine sites using a deep learning approach with chemical properties. Genes 2021, 12, 1117.
- Kim, J.; Shujaat, M.; Tayara, H. iProm-Zea: A two-layer model to identify plant promoters and their types using convolutional neural network. Genomics 2022, 114, 110384.
- Shujaat, M.; Jin, J.S.; Tayara, H.; Chong, K.T. iProm-phage: A two-layer model to identify phage promoters and their types using a convolutional neural network. Front. Microbiol. 2022, 13, 1061122.
- Oubounyt, M.; Louadi, Z.; Tayara, H.; Chong, K.T. DeePromoter: Robust promoter predictor using deep learning. Front. Genet. 2019, 10, 286.
- Ilyas, T.; Khan, A.; Umraiz, M.; Kim, H. Seek: A framework of superpixel learning with cnn features for unsupervised segmentation. Electronics 2020, 9, 383.
- Rashid, R.; Akram, M.U.; Hassan, T. Fully convolutional neural network for lungs segmentation from chest X-rays. In Proceedings of the International Conference Image Analysis and Recognition, Póvoa de Varzim, Portugal, 27–29 June 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 71–80.
- Shah, A.A.; Alturise, F.; Alkhalifah, T.; Khan, Y.D. Deep Learning Approaches for Detection of Breast Adenocarcinoma Causing Carcinogenic Mutations. Int. J. Mol. Sci. 2022, 23, 11539.
- Chipofya, M.; Tayara, H.; Chong, K.T. Drug Therapeutic-Use Class Prediction and Repurposing Using Graph Convolutional Networks. Pharmaceutics 2021, 13, 1906.
- Chipofya, M.; Tayara, H.; Chong, K.T. Deep probabilistic learning model for prediction of ionic liquids toxicity. Int. J. Mol. Sci. 2022, 23, 5258.
- Chantsalnyam, T.; Lim, D.Y.; Tayara, H.; Chong, K.T. ncRDeep: Non-coding RNA classification with convolutional neural network. Comput. Biol. Chem. 2020, 88, 107364.
- Nazari, I.; Tayara, H.; Chong, K.T. Branch point selection in RNA splicing using deep learning. IEEE Access 2018, 7, 1800–1807.
Class | Benchmark Dataset | Test Dataset | Sequence Length |
---|---|---|---|
Promoter | 168 | 42 | 81 bp |
Non-Promoter | 2288 | 500 | 81 bp |
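
The class counts in the table above make the imbalance concern concrete. The following sketch, using only those counts, computes the negative-to-positive ratio and the score of a trivial baseline that labels every sequence as non-promoter. The helper names are hypothetical, and the baseline is an illustration rather than a model evaluated in this study.

```python
# Counts copied from the dataset table above.
benchmark = {"promoter": 168, "non_promoter": 2288}
test = {"promoter": 42, "non_promoter": 500}


def imbalance_ratio(counts):
    """Number of non-promoter sequences per promoter sequence."""
    return counts["non_promoter"] / counts["promoter"]


def majority_baseline(counts):
    """Accuracy and sensitivity of a model that always predicts 'non-promoter'."""
    total = counts["promoter"] + counts["non_promoter"]
    accuracy = counts["non_promoter"] / total   # every negative is correct
    sensitivity = 0.0                           # no promoter is ever detected
    return accuracy, sensitivity


for name, counts in (("benchmark", benchmark), ("test", test)):
    acc, sn = majority_baseline(counts)
    print(f"{name}: ratio={imbalance_ratio(counts):.2f}, "
          f"baseline Acc={acc:.4f}, Sn={sn:.1f}")
```

Because the trivial baseline already exceeds 90% accuracy on both sets while detecting no promoters, accuracy alone says little here; sensitivity and MCC are more informative on data this skewed.
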
Parameter | Range |
---|---|
Number of Conv1D Layers | [2, 3, 4, 5] |
Number of Filters in Conv1D | [8, 12, 16, 22, 32, 42, 64, 128] |
Kernel Size in Conv1D | [2, 3, 4, 5, 6, 7, 8, 10, 12, 14] |
Max-pooling Pool Size | [2, 4, 6] |
Max-pooling Stride Length | [2, 4] |
Dropout Rate | [0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5] |
Neurons in Dense Layer | [8, 16, 32, 64, 80, 100] |
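
To give a sense of the size of this search space, the ranges in the table above can be written as a dictionary and enumerated. This is a sketch under two assumptions that the table does not state: that every convolutional layer shares one setting per parameter, and that candidates are drawn either exhaustively or at random. The key names are hypothetical.

```python
import itertools
import math
import random

# Ranges copied from the hyperparameter table above; key names are hypothetical.
SEARCH_SPACE = {
    "num_conv_layers": [2, 3, 4, 5],
    "filters": [8, 12, 16, 22, 32, 42, 64, 128],
    "kernel_size": [2, 3, 4, 5, 6, 7, 8, 10, 12, 14],
    "pool_size": [2, 4, 6],
    "pool_stride": [2, 4],
    "dropout": [0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5],
    "dense_units": [8, 16, 32, 64, 80, 100],
}


def grid_size(space):
    """Number of configurations in the full Cartesian grid."""
    return math.prod(len(values) for values in space.values())


def iter_grid(space):
    """Yield every configuration of the grid as a dictionary."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def sample_configs(space, n, seed=0):
    """Draw n configurations uniformly at random (with replacement)."""
    rng = random.Random(seed)
    return [{k: rng.choice(v) for k, v in space.items()} for _ in range(n)]


print(grid_size(SEARCH_SPACE))
for config in sample_configs(SEARCH_SPACE, 3):
    print(config)
```

Under the shared-setting assumption the full grid contains 80,640 configurations, which is why random sampling of a subset is often a practical alternative to exhaustive enumeration.
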
Model | Acc (%) | Sn (%) | Sp (%) | MCC |
---|---|---|---|---|
iPromoter-2L | 94.04 | 53.19 | 99.57 | 0.65 |
iPro54-PseKNC | 78.57 | 86.96 | 70.19 | 0.58 |
iPromoter-BnCNN | 99.3 | 74.4 | 99.8 | 0.78 |
PromoterLCNN | 99.4 | 68.0 | 99.9 | 0.80 |
iProm-Sigma54 | 95.45 | 96.53 | 90.64 | 0.858 |
Model | Acc (%) | Sn (%) | Sp (%) | MCC |
---|---|---|---|---|
iPromoter-2L | 81.23 | 92.27 | 63.57 | 0.483 |
iPro54-PseKNC | 78.52 | 97.56 | 76.95 | 0.436 |
iPromoter-BnCNN | 92.98 | 57.14 | 95.23 | 0.516 |
iProm-Sigma54 | 98.40 | 95.12 | 97.19 | 0.9113 |
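
The four columns reported in the comparison tables (Acc, Sn, Sp and MCC) can be computed from confusion-matrix counts. The sketch below uses the standard textbook definitions, which are assumed rather than quoted from this paper. The function name is hypothetical and the example counts are made up for illustration; they are not results from this study.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Return Acc, Sn, Sp (as percentages) and MCC from confusion-matrix counts."""
    total = tp + tn + fp + fn
    acc = 100.0 * (tp + tn) / total
    # Sensitivity: fraction of true promoters that are detected.
    sn = 100.0 * tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of non-promoters that are correctly rejected.
    sp = 100.0 * tn / (tn + fp) if (tn + fp) else 0.0
    # Matthews correlation coefficient; set to 0 when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Acc": acc, "Sn": sn, "Sp": sp, "MCC": mcc}


# Made-up counts for illustration only: 10 positives and 90 negatives.
print(classification_metrics(tp=8, tn=85, fp=5, fn=2))
```

Because MCC uses all four cells of the confusion matrix, it stays low for a model that ignores the minority class even when accuracy is high.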
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Shujaat, M.; Kim, H.; Tayara, H.; Chong, K.T. iProm-Sigma54: A CNN Base Prediction Tool for σ54 Promoters. Cells 2023, 12, 829. https://doi.org/10.3390/cells12060829