Advanced Computational Frameworks for Characterizing Abnormal DNA Architectures and Their Implications in Genome Dynamics †
Abstract
1. Introduction
2. Genome-Wide Databases and Computational Resources for DNA Structures
| Database | Type of Structures Cataloged | Data Sources | Experimental Validation | Key Features | References |
|---|---|---|---|---|---|
| G4Hunter | G-quadruplex (G4) | Genomic sequences | In silico (scoring based on guanine content and sequence) | Predicts potential G4-forming sequences | [5,9] |
| Non-B DB | Various non-B DNA structures (Z-DNA, G4, triplexes, cruciforms, etc.) | Genomic sequences (e.g., human, mouse, bacterial) | Experimental data integrated alongside in silico predictions | Collection of non-B DNA motifs; links to disease associations | [10] |
| QuadBase2 | G-quadruplex (G4) | Human genome, model organisms (plants, yeast, etc.) | Experimental data from literature and high-throughput sequencing | Contains experimentally validated G-quadruplexes | [11] |
| Triplex-Inspector | Triplex-forming oligonucleotides (TFOs) and triplex DNA | Genomic sequences (custom uploads, model organisms) | In silico predictions based on sequence features | Detects triplex DNA; applications in gene regulation | [11] |
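
The table describes G4Hunter as scoring sequences on guanine content and sequence. A minimal Python sketch of that kind of run-length scoring, following the scheme published for G4Hunter [9] as we understand it, is given below. The function names, the 25-nt window, and the 1.5 threshold are illustrative assumptions, not the tool's actual interface or required settings.

```python
from itertools import groupby


def g4hunter_scores(seq):
    """Per-base scores: each G in a run of n Gs gets +min(n, 4),
    each C in a run of n Cs gets -min(n, 4), and A/T get 0."""
    scores = []
    for base, run in groupby(seq.upper()):
        n = len(list(run))
        if base == "G":
            value = min(n, 4)
        elif base == "C":
            value = -min(n, 4)
        else:
            value = 0
        scores.extend([value] * n)
    return scores


def g4hunter_mean(seq):
    """Mean per-base score of a whole sequence (0.0 for an empty one)."""
    scores = g4hunter_scores(seq)
    return sum(scores) / len(scores) if scores else 0.0


def windowed_hits(seq, window=25, threshold=1.5):
    """Start positions and mean scores of windows whose absolute mean
    score reaches the threshold (window and threshold are assumed values)."""
    scores = g4hunter_scores(seq)
    hits = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if abs(mean) >= threshold:
            hits.append((start, mean))
    return hits


# Human telomeric repeat: 12 guanines in runs of 3 over 21 bases,
# so the mean score is 36 / 21, about 1.71.
print(round(g4hunter_mean("GGGTTAGGGTTAGGGTTAGGG"), 2))
```

Positive means indicate G-rich candidates on the given strand and negative means indicate C-rich candidates, i.e., potential G4s on the complementary strand, which is why the window filter uses the absolute value.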
3. Machine Learning and Bioinformatics Approaches for Predicting Non-B DNA Structures
3.1. Common Machine Learning Algorithms in Genomics
3.1.1. K-Nearest Neighbors (KNN)
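
As a hedged illustration of how KNN can be applied to sequence data, the sketch below represents each sequence by its normalized dinucleotide frequencies and assigns the majority label among the k closest training sequences. The feature choice, function names, and example sequences are assumptions made for illustration only; they are not taken from any of the tools discussed here.

```python
import math
from collections import Counter
from itertools import product


def kmer_vector(seq, k=2):
    """Normalized k-mer frequency vector over the ACGT alphabet."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers) or 1  # avoid division by zero
    return [counts[m] / total for m in kmers]


def knn_predict(train, query, k=3):
    """Majority label among the k training sequences closest to the query.

    train is a list of (sequence, label) pairs; distance is Euclidean
    distance between k-mer frequency vectors.
    """
    q = kmer_vector(query)
    dists = sorted((math.dist(kmer_vector(s), q), label) for s, label in train)
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy training set (labels are illustrative, not curated data).
train = [
    ("GGGTTAGGGTTAGGGTTAGGG", "G4"),
    ("GGGGTGGGGTGGGGTGGGG", "G4"),
    ("GGGAGGGAGGGAGGG", "G4"),
    ("ATATATATATATATAT", "other"),
    ("ACTGACTGACTGACTG", "other"),
    ("TTTTAAAATTTTAAAA", "other"),
]
print(knn_predict(train, "GGGCGGGCGGGCGGG"))
```

In practice the value of k and the distance metric are tuned by cross-validation, and an odd k avoids ties in two-class problems.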
3.1.2. Artificial Neural Networks (ANNs)
3.1.3. Convolutional Neural Networks (CNNs)
3.1.4. Random Forest (RF)
3.1.5. Support Vector Machines (SVMs)
| Tool/Task | Model Used | Purpose | Reported Performance | Reference |
|---|---|---|---|---|
| Prediction of G-quadruplexes | Convolutional neural network (CNN) | Predict G-quadruplex-forming regions in DNA sequences | 95.2% (AUC-ROC) | [17] |
| G4Boost: quadruplex identification and stability prediction | XGBoost regression model | Use the sequences, nucleotide compositions, and estimated structural topologies of G4 motifs to predict their secondary structure | 93% | [18] |
| DeepZ: prediction of functional Z-DNA regions using omics data | Convolutional neural networks (CNNs); recurrent neural networks (RNNs); hybrid CNN–RNN models | Predict functional Z-DNA regions from chromosome accessibility, transcription factor/RNA polymerase binding, and epigenetic marker maps | 86.6% | [19] |
| DNAPred_Prot: identification of DNA-binding proteins using composition- and position-based features | Random forest; support vector machine; artificial neural network | Identify DNA-binding proteins from sequence features | 91.47% | [20] |
| IoMT-based prediction of mitochondrial and inherited illnesses | Support vector machine (SVM); K-nearest neighbors (KNN) | Analyze genetic data for early and accurate diagnosis | 94.99% | [21] |
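
The performance values in the table are not all the same metric: one entry is an AUC-ROC, while the others appear to be accuracies. The sketch below, written in plain Python with hypothetical labels and scores, shows how the two quantities are computed and why they need not agree for the same classifier. The function names and the 0.5 decision threshold are assumptions for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc_roc(y_true, scores):
    """AUC-ROC as the probability that a randomly chosen positive
    receives a higher score than a randomly chosen negative
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = structure-forming) and classifier scores.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.4, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(accuracy(y_true, y_pred))  # 4 of 6 correct
print(auc_roc(y_true, scores))   # 8 of 9 positive-negative pairs ranked correctly
```

Accuracy depends on the chosen decision threshold and on class balance, whereas AUC-ROC summarizes ranking quality across all thresholds, so the figures in the table should be compared with that difference in mind.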
4. Applications of DNA Structure Prediction in Medicine and Disease
5. Conclusions and Future Directions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Wang, G.; Vasquez, K.M. Impact of Alternative DNA Structures on DNA Damage, DNA Repair, and Genetic Instability. DNA Repair 2014, 19, 143–151.
2. Lopez, C.R.; Singh, S.; Hambarde, S.; Griffin, W.C.; Gao, J.; Chib, S.; Yu, Y.; Ira, G.; Raney, K.D.; Kim, N. Yeast Sub1 and Human PC4 Are G-Quadruplex Binding Proteins That Suppress Genome Instability at Co-Transcriptionally Formed G4 DNA. Nucleic Acids Res. 2017, 45, 5850–5862.
3. Ha, S.C.; Lowenhaupt, K.; Rich, A.; Kim, Y.-G.; Kim, K.K. Crystal Structure of a Junction between B-DNA and Z-DNA Reveals Two Extruded Bases. Nature 2005, 437, 1183–1186.
4. Schümperli, D.; Pillai, R.S. The Special Sm Core Structure of the U7 snRNP: Far-Reaching Significance of a Small Nuclear Ribonucleoprotein. CMLS Cell. Mol. Life Sci. 2004, 61, 2560–2570.
5. Brázda, V.; Laister, R.C.; Jagelská, E.B.; Arrowsmith, C. Cruciform Structures Are a Common DNA Feature Important for Regulating Biological Processes. BMC Mol. Biol. 2011, 12, 33.
6. Puig Lombardi, E.; Londoño-Vallejo, A. A Guide to Computational Methods for G-Quadruplex Prediction. Nucleic Acids Res. 2020, 48, 1–15.
7. Georgakopoulos-Soares, I.; Morganella, S.; Jain, N.; Hemberg, M.; Nik-Zainal, S. Noncanonical Secondary Structures Arising from Non-B DNA Motifs Are Determinants of Mutagenesis. Genome Res. 2018, 28, 1264–1271.
8. Havlík, J.; Brázda, V.; Staněk, K.; Ježek, M.; Št’astný, J. Feature-Overlapper: The Tool for DNA Analysis Overlap. Softw. Impacts 2023, 16, 100498.
9. Bedrat, A.; Lacroix, L.; Mergny, J.-L. Re-Evaluation of G-Quadruplex Propensity with G4Hunter. Nucleic Acids Res. 2016, 44, 1746–1759.
10. Cer, R.Z.; Donohue, D.E.; Mudunuri, U.S.; Temiz, N.A.; Loss, M.A.; Starner, N.J.; Halusa, G.N.; Volfovsky, N.; Yi, M.; Luke, B.T.; et al. Non-B DB v2.0: A Database of Predicted Non-B DNA-Forming Motifs and Its Associated Tools. Nucleic Acids Res. 2012, 41, D94–D100.
11. Dhapola, P.; Chowdhury, S. QuadBase2: Web Server for Multiplexed Guanine Quadruplex Mining and Visualization. Nucleic Acids Res. 2016, 44, W277–W283.
12. Le, D.-H. Machine Learning-Based Approaches for Disease Gene Prediction. Brief. Funct. Genom. 2020, 19, 350–363.
13. Sharma, B.R.; Kumar, V.; Gat, Y.; Kumar, N.; Parashar, A.; Pinakin, D.J. Microbial Maceration: A Sustainable Approach for Phytochemical Extraction. 3 Biotech 2018, 8, 401.
14. Zhi, J.; Sun, J.; Wang, Z.; Ding, W. Support Vector Machine Classifier for Prediction of the Metastasis of Colorectal Cancer. Int. J. Mol. Med. 2018, 41, 1419–1426.
15. Shon, H.S.; Yi, Y.; Kim, K.O.; Cha, E.-J.; Kim, K.-A. Classification of Stomach Cancer Gene Expression Data Using CNN Algorithm of Deep Learning. J. Biomed. Transl. Res. 2019, 20, 15–20.
16. Liew, D.; Lim, Z.W.; Yong, E.H. Machine Learning-Based Prediction of DNA G-Quadruplex Folding Topology with G4ShapePredictor. Sci. Rep. 2024, 14, 24238.
17. Yang, B.; Guneri, D.; Yu, H.; Wright, E.P.; Chen, W.; Waller, Z.A.E.; Ding, Y. Prediction of DNA I-Motifs via Machine Learning. Nucleic Acids Res. 2024, 52, 2188–2197.
18. Cagirici, H.B.; Budak, H.; Sen, T.Z. G4Boost: A Machine Learning-Based Tool for Quadruplex Identification and Stability Prediction. BMC Bioinform. 2022, 23, 240.
19. Beknazarov, N.; Jin, S.; Poptsova, M. Deep Learning Approach for Predicting Functional Z-DNA Regions Using Omics Data. Sci. Rep. 2020, 10, 19134.
20. Barukab, O.; Khan, Y.D.; Khan, S.A.; Chou, K.-C. DNAPred_Prot: Identification of DNA-Binding Proteins Using Composition- and Position-Based Features. Appl. Bionics Biomech. 2022, 2022, 5483115.
21. Rahman, A.; Nasir, M.U.; Gollapalli, M.; Alsaif, S.A.; Almadhor, A.S.; Mehmood, S.; Khan, M.A.; Mosavi, A. IoMT-Based Mitochondrial and Multifactorial Genetic Inheritance Disorder Prediction Using Machine Learning. Comput. Intell. Neurosci. 2022, 2022, 2650742.
22. Tabrizi, S.J.; Flower, M.D.; Ross, C.A.; Wild, E.J. Huntington Disease: New Insights into Molecular Pathogenesis and Therapeutic Opportunities. Nat. Rev. Neurol. 2020, 16, 529–546.
23. Kaur, G.; Rathod, S.S.S.; Ghoneim, M.M.; Alshehri, S.; Ahmad, J.; Mishra, A.; Alhakamy, N.A. DNA Methylation: A Promising Approach in Management of Alzheimer’s Disease and Other Neurodegenerative Disorders. Biology 2022, 11, 90.
24. Nussinov, R.; Jang, H.; Tsai, C.-J.; Cheng, F. Review: Precision Medicine and Driver Mutations: Computational Methods, Functional Assays and Conformational Principles for Interpreting Cancer Drivers. PLoS Comput. Biol. 2019, 15, e1006658.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Masroor, S.; Dudeja, C.; Sanka, R.; Sabikhi, Y.; Singh, A.; Mishra, A.; Gupta, R. Advanced Computational Frameworks for Characterizing Abnormal DNA Architectures and Their Implications in Genome Dynamics. Chem. Proc. 2025, 18, 65. https://doi.org/10.3390/ecsoc-29-26886

