MDPI - Publisher of Open Access Journals

30 pages, 1715 KB

Open Access Article

A Novel Method for Predicting Oncogenic Types of Human Papillomavirus

by Songül Çeçen Kaynak and Hilal Arslan

Diagnostics 2025, 15(23), 3014; https://doi.org/10.3390/diagnostics15233014 - 27 Nov 2025

Viewed by 1017

Background and Objectives: Human Papillomavirus (HPV) is a leading cause of cervical and other anogenital cancers, with over 200 known genotypes classified into high-risk, probable high-risk, and low-risk groups. While conventional diagnostic and classification approaches often rely on sequence alignment, phylogenetic relationships, or protein structure analyses, these methods are limited in scalability, cost efficiency, and generalizability to emerging HPV types. This study aims to develop a novel, machine learning-based framework for classifying HPV genotypes by oncogenic risk using genome-derived numerical features. A key objective is to introduce TATA-box, CAAT-box, and CpG-island-based features to HPV risk prediction for the first time. Methods: We constructed a comprehensive feature set that integrates regulatory sequence motifs (TATA-box, CAAT-box, CpG islands) with dinucleotide and trinucleotide (k-mer) composition derived from full HPV genomes. Multiple machine learning algorithms were implemented to evaluate classification performance across all risk categories. Model accuracy, precision, recall, and F1-score were calculated to assess the effectiveness and robustness of the proposed feature set. Results: The proposed method achieves an average precision of 0.95, a recall of 0.95, an F1-score of 0.95, and an accuracy of 97.47%. The experimental findings indicate that the proposed method not only attains high classification accuracy across all HPV risk groups but also surpasses existing models in generalizability by utilizing genomic data and novel biologically informed features. Conclusions: This study introduces regulatory motif-based numerical features to HPV classification for the first time and demonstrates that integrating these with k-mer descriptors yields a highly accurate and scalable machine learning model. 
Unlike previous studies, which often focus on specific HPV genes or a limited subset of types, our method is scalable, robust, and capable of classifying known and emerging HPV types with high reliability. This highlights its potential for real-world deployment in large-scale epidemiological screening and vaccine development programs.

(This article belongs to the Special Issue A New Era in Diagnosis: From Biomarkers to Artificial Intelligence)
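
The dinucleotide and trinucleotide (k-mer) composition features mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `kmer_frequencies` and `feature_vector` are hypothetical, the normalization to relative frequencies is an assumption, and the regulatory-motif features (TATA-box, CAAT-box, CpG islands) are not covered here.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence: str, k: int) -> dict[str, float]:
    """Relative frequency of every possible k-mer over the alphabet ACGT.

    Windows containing other symbols (for example N) are ignored.
    Hypothetical helper; the normalization is an assumption.
    """
    seq = sequence.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def feature_vector(sequence: str) -> dict[str, float]:
    """Concatenate dinucleotide (16) and trinucleotide (64) frequencies."""
    features: dict[str, float] = {}
    for k in (2, 3):
        features.update(kmer_frequencies(sequence, k))
    return features


if __name__ == "__main__":
    fv = feature_vector("ACGTACGT")
    print(len(fv), round(fv["AC"], 3), round(fv["ACG"], 3))
```

A vector like this has a fixed length of 80 regardless of genome size, which is what makes it usable as input to a standard classifier.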

16 pages, 4084 KB

Open Access Article

The Supersymmetry Genetic Code Table and Quadruplet Symmetries of DNA Molecules Are Unchangeable and Synchronized with Codon-Free Energy Mapping during Evolution

by Marija Rosandić and Vladimir Paar

Genes 2023, 14(12), 2200; https://doi.org/10.3390/genes14122200 - 12 Dec 2023

Cited by 4 | Viewed by 2827

Abstract

The Supersymmetry Genetic Code (SSyGC) table is based on five physicochemical symmetries: (1) double mirror symmetry on the principle of the horizontal and vertical mirror symmetry axis between all bases [purines (A, G) and pyrimidines (U, C)] and (2) of bases in the form of codons; (3) direct–complement-like codon/anticodon symmetry in the sixteen alternating boxes of the genetic code columns; (4) A + T-rich and C + G-rich alternate codons in the same row between both columns of the genetic code; (5) the same position between divided and undivided codon boxes in relation to the horizontal mirror symmetry axis. The SSyGC table has a unique physicochemical purine–pyrimidine symmetry net which, as the core symmetry, is common to all of the more than thirty different nuclear and mitochondrial genetic codes. This net is present in the SSyGC table of all RNA and DNA living species. None of these symmetries are present in the Standard Genetic Code (SGC) table, which is constructed on the alphabetic horizontal and vertical U-C-A-G order of bases. Here, we show that the free energy value of each codon, incorporated as fundamentally mapping the “energy code” in the SSyGC table, is compatible with mirror symmetry. On the other hand, in the SGC table, the same free energy values of codons are dispersed and a mirror symmetry between them is not recognizable. At the same time, the mirror symmetry of the SSyGC table and the DNA quadruplets, together with our classification of codons/trinucleotides, are perfectly embedded in the mirror symmetry energy mapping of codons/trinucleotides and argue in favor of maintaining the integrity of the genetic code and DNA genome. We also argue that physicochemical symmetries of the SSyGC table in the manner of the purine–pyrimidine symmetry net, the quadruplet symmetry of the DNA molecule, and the free energy of codons have remained unchanged during all of evolution.
The unchangeable and universal symmetry properties of the genetic code, DNA molecules, and the energy code decrease disorder between codons/trinucleotides and shed new light on evolution. Diversity in all living species on Earth is broad, but the symmetries of the Supersymmetry Genetic Code as the code of life and the DNA quadruplets related to the “energy code” are unique, unchangeable, and have the power of natural laws.

(This article belongs to the Section Molecular Genetics and Genomics)

19 pages, 5771 KB

Open Access Review

The Evolution of Life Is a Road Paved with the DNA Quadruplet Symmetry and the Supersymmetry Genetic Code

by Marija Rosandić and Vladimir Paar

Int. J. Mol. Sci. 2023, 24(15), 12029; https://doi.org/10.3390/ijms241512029 - 27 Jul 2023

Cited by 4 | Viewed by 3269

Abstract

Symmetries have not been completely determined and explained since the discovery of the DNA structure in 1953 and the genetic code in 1961. We present the results of 10 years of investigation and research: our discovery of the Supersymmetry Genetic Code table in the form of 2 × 8 codon boxes, quadruplet DNA symmetries, and the classification of trinucleotides/codons, all built with the same physicochemical double mirror symmetry and Watson–Crick pairing. We also show that single-stranded RNA had the complete code of life in the form of the Supersymmetry Genetic Code table simultaneously with instructions of codons’ relationship as to how to develop the DNA molecule on the principle of Watson–Crick pairing. We show that the same symmetries between the genetic code and DNA quadruplet are highly conserved during the whole of evolution, even between phylogenetically distant organisms. In this way, decreasing disorder and entropy enabled the evolution of living beings up to sophisticated species with cognitive features. Our hypothesis that all twenty amino acids are necessary for the origin of life on the Earth, which entirely changes our view on evolution, confirms the evidence of organic natural amino acids from the extra-terrestrial asteroid Ryugu, which is nearly as old as our solar system.

(This article belongs to the Special Issue The Structural and Dynamical Characterization of Biological Processes)

21 pages, 2208 KB

Open Access Article

An Explanation of Exceptions from Chargaff’s Second Parity Rule/Strand Symmetry of DNA Molecules

by Marija Rosandić, Ines Vlahović, Ivan Pilaš, Matko Glunčić and Vladimir Paar

Genes 2022, 13(11), 1929; https://doi.org/10.3390/genes13111929 - 23 Oct 2022

Cited by 7 | Viewed by 3261

Abstract

In this article, we show that mono/oligonucleotide quadruplets, as basic structures of DNA, along with our classification of trinucleotides, disclose an organization of genomes based on purine–pyrimidine symmetry. Moreover, the structure and stability of DNA are influenced by the Watson–Crick pairing and the natural law of DNA creation and conservation, according to which the same mono- or oligonucleotide insertion must be inserted simultaneously into both strands of DNA. Taken together, they lead to quadruplets with central mirror symmetry and bidirectional DNA strand orientation and are incorporated into Chargaff’s second parity rule (CSPR). Performing our quadruplet frequency analysis of all human chromosomes and of Neuroblastoma BreakPoint Family (NBPF) genes, which encode Olduvai protein domains in the human genome, we show that the coding part of DNA violates CSPR. This may shed new light and give rise to a novel hypothesis on DNA creation and its evolution. In this framework, the logarithmic relationship between oligonucleotide order and the minimal DNA sequence length needed to establish the validity of CSPR automatically follows from the quadruplet structure of the genomic sequence. The problem of the violation of CSPR in rare symbionts is discussed.

(This article belongs to the Special Issue Non-coding DNA in Human Health and Diseases)
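
Chargaff's second parity rule, as referenced in the abstract above, states that within a single DNA strand each oligonucleotide occurs about as often as its reverse complement. The sketch below shows one simple way to measure deviation from that rule. It does not reproduce the authors' quadruplet frequency analysis: the function names and the deviation measure |a − b| / (a + b) are assumptions chosen for illustration.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Reverse complement of an upper-case DNA string over ACGT."""
    return kmer.translate(COMPLEMENT)[::-1]


def strand_symmetry_deviation(sequence: str, k: int) -> dict[str, float]:
    """Relative count difference between each k-mer and its reverse complement.

    Returns one entry per k-mer/reverse-complement pair, keyed by the
    alphabetically smaller member. 0.0 means perfect strand symmetry,
    1.0 means only one member of the pair occurs. The measure itself is
    an illustrative assumption, not taken from the article.
    """
    seq = sequence.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")  # skip windows with N etc.
    )
    deviation: dict[str, float] = {}
    for kmer in list(counts):
        rc = reverse_complement(kmer)
        key = min(kmer, rc)
        if key in deviation:
            continue
        a, b = counts[kmer], counts[rc]
        deviation[key] = abs(a - b) / (a + b)
    return deviation


if __name__ == "__main__":
    print(strand_symmetry_deviation("AAATTT", 1))
    print(strand_symmetry_deviation("AAAA", 1))
```

On short sequences the deviations are dominated by sampling noise, which is consistent with the abstract's point that a minimal sequence length is needed before the rule can be assessed for a given oligonucleotide order.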

15 pages, 2020 KB

Open Access Article

An RNA Sequencing Transcriptome Analysis and Development of EST-SSR Markers in Chinese Hawthorn through Illumina Sequencing

by Suliya Ma, Wenxuan Dong, Tong Lyu and Yingmin Lyu

Forests 2019, 10(2), 82; https://doi.org/10.3390/f10020082 - 22 Jan 2019

Cited by 22 | Viewed by 4593

Abstract

Chinese hawthorn (Crataegus pinnatifida) is an important ornamental and economic horticultural plant. However, the lack of molecular markers has limited the development and utilization of hawthorn germplasm resources. Simple sequence repeats (SSRs) derived from expressed sequence tags (ESTs) allow precise and effective cultivar characterization and are routinely used for genetic diversity analysis. Here, we report for the first time the development of polymorphic EST-SSR markers with perfect repeats in C. pinnatifida using the Illumina RNA-Seq technique. In total, we investigated 14,364 unigenes, from which 5091 EST-SSR loci were mined. Di-nucleotides (2012, 39.52%) were the most abundant SSRs, followed by mono- (1989, 39.07%) and tri-nucleotides (1024, 20.11%). On the basis of these EST-SSRs, a total of 300 primer pairs were designed and used for polymorphism analysis in 70 accessions collected from different geographical regions of China. Of these, 239 (79.67%) primer pairs generated amplification products and 163 (54.33%) showed polymorphism. Finally, 33 primers with high polymorphism were selected for genetic diversity analysis and tested on 70 individuals with low-cost fluorescence-labeled M13 primers using a capillary electrophoresis genotyping platform. A total of 108 alleles were amplified by 33 SSR markers, with the number of alleles (Na) ranging from 2 to 14 per locus (mean: 4.939), and the effective number of alleles (Ne) ranging from 1.258 to 3.214 (mean: 2.221). The mean values of gene diversity (He), observed heterozygosity (Ho), and polymorphism information content (PIC) were 0.524 (range 0.205–0.689), 0.709 (range 0.132–1.000), and 0.450 (range 0.184–0.642), respectively. Furthermore, the dendrogram constructed based on the EST-SSRs separated the cultivars into two main clusters. In sum, our study was the first comprehensive study on the development and analysis of a large set of SSR markers in hawthorn.
The results suggested that the use of NGS techniques for SSR development represented a powerful tool for genetic studies. Additionally, fluorescence-labeled M13 markers proved to be a valuable method for genotyping. All of these EST-SSR markers have agronomic potential and constitute a scientific basis for future studies on the identification, classification, and innovation of hawthorn germplasms.

(This article belongs to the Special Issue Genetic Diversity of Tree Species in Forest and Conservation Management)
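
The per-locus statistics reported in the abstract above (He, Ne, PIC) are commonly computed from allele frequencies. The sketch below uses the standard textbook definitions, He = 1 − Σp², Ne = 1 / Σp², and PIC = 1 − Σp² − Σ over pairs of 2·pᵢ²·pⱼ²; the abstract does not state which estimators or software the authors used, so these formulas and the function name `diversity_statistics` are assumptions. Some packages apply a small-sample correction to He that is not included here.

```python
from itertools import combinations


def diversity_statistics(allele_freqs: list[float]) -> dict[str, float]:
    """Gene diversity (He), effective allele number (Ne) and PIC for one locus.

    `allele_freqs` must sum to 1. Uses the uncorrected textbook formulas;
    this is an illustrative assumption, not the article's stated method.
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in allele_freqs)
    he = 1.0 - homozygosity
    ne = 1.0 / homozygosity
    # PIC subtracts the share of matings that are uninformative
    pic = he - sum(2 * (p * p) * (q * q) for p, q in combinations(allele_freqs, 2))
    return {"He": he, "Ne": ne, "PIC": pic}


if __name__ == "__main__":
    print(diversity_statistics([0.5, 0.5]))
```

For a locus with two equally frequent alleles this gives He = 0.5, Ne = 2 and PIC = 0.375, which shows why PIC is always at most He.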

14 pages, 11225 KB

Open Access Article

Comprehensive Analysis of Differentially Expressed Unigenes under NaCl Stress in Flax (Linum usitatissimum L.) Using RNA-Seq

by Jianzhong Wu, Qian Zhao, Guangwen Wu, Hongmei Yuan, Yanhua Ma, Hong Lin, Liyan Pan, Suiyan Li and Dequan Sun

Int. J. Mol. Sci. 2019, 20(2), 369; https://doi.org/10.3390/ijms20020369 - 16 Jan 2019

Cited by 30 | Viewed by 5220

Abstract

Flax (Linum usitatissimum L.) is an important industrial crop that is often cultivated on marginal lands, where salt stress negatively affects yield and quality. High-throughput RNA sequencing (RNA-seq) using the Illumina platform was employed for transcript analysis and gene discovery to reveal flax response mechanisms to salt stress. After cDNA libraries were constructed from flax exposed to water (negative control) or salt (100 mM NaCl) for 12 h, 24 h or 48 h, transcription expression profiles and cDNA sequences representing expressed mRNA were obtained. A total of 431,808,502 clean reads were assembled to form 75,961 unigenes. After ruling out short-length and low-quality sequences, 33,774 differentially expressed unigenes (DEUs) were identified between salt-stressed and unstressed control (C) flax. Of these DEUs, 3669, 8882 and 21,223 unigenes were obtained from flax exposed to salt for 12 h (N1), 24 h (N2) and 48 h (N4), respectively. Gene function classification and pathway assignments of 2842 DEUs were obtained by comparing unigene sequences to information within public data repositories. qRT-PCR of selected DEUs was used to validate flax cDNA libraries generated for various durations of salt exposure. Based on transcriptome sequences, 1777 EST-SSRs were identified, of which trinucleotide and dinucleotide repeat microsatellite motifs were the most abundant. The flax DEUs and EST-SSRs identified here will serve as a resource to better understand flax response mechanisms to salt exposure and to develop more salt-tolerant varieties of flax.

(This article belongs to the Special Issue Salinity Tolerance in Plants)
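
Identifying EST-SSRs, as described in the abstract above, amounts to scanning transcript sequences for short motifs repeated in tandem. A minimal sketch for perfect mono-, di- and trinucleotide repeats follows. The abstract does not name the tool or thresholds used, so the minimum repeat counts in `MIN_REPEATS` and the function names are hypothetical values chosen only for illustration.

```python
import re

# Hypothetical thresholds: motif length -> minimum number of tandem copies.
MIN_REPEATS = {1: 10, 2: 6, 3: 5}


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit (AA, AGAG)."""
    n = len(motif)
    return not any(
        n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n)
    )


def find_perfect_ssrs(sequence: str, min_repeats: dict[int, int] = MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = sequence.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # One captured motif followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if not is_primitive(motif):
                continue  # e.g. a poly-A run matched as the dinucleotide AA
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


if __name__ == "__main__":
    demo = "TTGC" + "AG" * 6 + "CCT" + "A" * 10 + "GGC" + "CAG" * 5 + "T"
    for hit in find_perfect_ssrs(demo):
        print(hit)
```

Compound and imperfect repeats, which dedicated SSR-mining tools also report, are outside the scope of this sketch.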

12 pages, 1053 KB

Open Access Article

Genome-Wide Development of MicroRNA-Based SSR Markers in Medicago truncatula with Their Transferability Analysis and Utilization in Related Legume Species

by Xueyang Min, Zhengshe Zhang, Yisong Liu, Xingyi Wei, Zhipeng Liu, Yanrong Wang and Wenxian Liu

Int. J. Mol. Sci. 2017, 18(11), 2440; https://doi.org/10.3390/ijms18112440 - 18 Nov 2017

Cited by 37 | Viewed by 5542

Abstract

Microsatellite (simple sequence repeat, SSR) markers are among the most widely used markers in marker-assisted breeding. As one type of functional marker, microRNA-based SSR (miRNA-SSR) markers have been exploited mainly in animals, but the development and characterization of miRNA-SSR markers in plants are still limited. In the present study, miRNA-SSR markers for Medicago truncatula (M. truncatula) were developed and their cross-species transferability in six leguminous species was evaluated. A total of 169 primer pairs were successfully designed from 130 M. truncatula miRNA genes, the majority of which were mononucleotide repeats (70.41%), followed by dinucleotide repeats (14.20%), compound repeats (11.24%) and trinucleotide repeats (4.14%). Functional classification of SSR-containing miRNA genes showed that all targets could be grouped into three Gene Ontology (GO) categories: 17 in biological process, 11 in molecular function, and 14 in cellular component. The miRNA-SSR markers showed high transferability in six other leguminous species, ranging from 74.56% to 90.53%. Furthermore, 25 Mt-miRNA-SSR markers were used to evaluate polymorphisms in 20 alfalfa accessions; the polymorphism information content (PIC) values ranged from 0.39 to 0.89 with an average of 0.71, and the allele number per marker varied from 3 to 18 with an average of 7.88, indicating a high level of informativeness. The present study is the first to develop and characterize M. truncatula miRNA-SSRs and to demonstrate their transferability. These novel markers will be valuable for genetic diversity analysis, marker-assisted selection and genotyping in leguminous species.

(This article belongs to the Section Molecular Plant Sciences)

148 KB

Open Access Review

The genetic basis of movement disorders

by E. M. Valente, A. R. Bentivoglio and Alberto Albanese

Swiss Arch. Neurol. Psychiatry Psychother. 1998, 149(4), 157-162; https://doi.org/10.4414/sanp.1998.01046 - 1 Jan 1998

Cited by 1 | Viewed by 14

Abstract

Recent developments in molecular genetics have had a profound influence on the diagnosis and classification of inherited movement disorders. Parkinson’s disease occurs in familial aggregation and one gene has recently been mapped. Eight genes responsible for different inherited dystonia syndromes have been mapped, and two of them have been identified. Essential tremor also occurs in familial aggregation and a first gene has recently been mapped. Huntington’s disease is caused by the expansion of an unstable trinucleotide repeat sequence. Molecular diagnosis can easily be performed, and the study of the effects of the repeat expansion on the function of the encoded protein will help in understanding the pathogenesis of the disease. Wilson’s disease is caused by a large number of different mutations in a copper-binding ATPase gene. The genetic basis of Gilles de la Tourette’s syndrome is still obscure. The available data indicate that movement disorders are in most cases genetically heterogeneous. Molecular genetics will provide new classifications for these rather common disorders.

Search Results (8)
