Integrating Reverse Vaccinology with Immunoinformatics for Rational Vaccine Target Discovery in Mycoplasma genitalium

Taneja, Jyoti; Kant, Ravi; Saluja, Daman

doi:10.3390/venereology4030014

by Jyoti Taneja ¹,†, Ravi Kant ²,³,† and Daman Saluja ²,*

¹ Laboratory of Reproductive Epidemiology and Infection Immunology, Department of Zoology, Daulat Ram College, University of Delhi, Delhi 110007, India

² Medical Biotechnology Laboratory, Dr. B.R. Ambedkar Centre of Biomedical Research, University of Delhi, Delhi 110007, India

³ Molecular Microbiology, School of Clinical and Experimental Sciences, Faculty of Medicine, University of Southampton, Southampton SO16 6YD, UK

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Venereology 2025, 4(3), 14; https://doi.org/10.3390/venereology4030014

Submission received: 8 July 2025 / Revised: 3 September 2025 / Accepted: 19 September 2025 / Published: 22 September 2025

Abstract

Background: The increasing prevalence of antibiotic-resistant Mycoplasma genitalium poses a significant challenge to global public health, necessitating the exploration of alternative therapeutic strategies, including vaccine development. Methods: In this study, we employed an immuno-informatics-based reverse vaccinology approach, augmented with artificial intelligence-driven tools, to identify and characterize potential B-cell and T-cell epitopes from the hypothetical proteins (HPs) encoded in the genome of the MG_G37T strain, a previously uncharacterized yet promising source of vaccine targets. Using multiple software tools, a systematic pipeline was applied to assess the sub-cellular localization, antigenicity, and allergenicity of the selected proteins. Results: Sub-cellular localization analysis identified several outer membrane and extracellular proteins in the genome of MG_G37T, indicating their surface association and accessibility to immune surveillance. Antigenicity and allergenicity prediction tools led to the identification of two top-scoring hypothetical proteins (fig|2097.71.peg.1 (UniProt ID: P22747) and fig|2097.70.peg.33 (UniProt ID: Q57081)) that demonstrated strong predicted antigenic potential, non-allergenic properties, and suitability as vaccine candidates. Epitope mapping and structural modeling analyses further supported the immunogenic potential of these epitopes, highlighting their predicted ability to interact effectively with host immune components. Comparative analyses with mouse allelic regions indicated the potential translational relevance of these predicted epitopes for preclinical studies. Conclusions: This study highlights these two hypothetical proteins as promising vaccine candidates and provides a strong rationale for their experimental validation towards the design and development of effective vaccines to combat M. genitalium infections in the era of antimicrobial resistance.

Keywords:

Mycoplasma genitalium; vaccine candidates; reverse vaccinology; hypothetical proteins; bioinformatics analysis

1. Introduction

Mycoplasma genitalium (MG) is a pathogenic bacterium classified within the Mollicutes class, which is characterized by its small genome and lack of a cell wall, making it inherently resistant to beta-lactam antibiotics [1]. Since its discovery in 1981, M. genitalium has been associated with a range of sexually transmitted infections (STIs), most notably non-gonococcal urethritis (NGU) in men and cervicitis, pelvic inflammatory disease (PID), and tubal infertility in women [2]. The prevalence of M. genitalium as a sexually transmitted pathogen has been rising, with concerns regarding its increasing resistance to macrolide and fluoroquinolone antibiotics [3]. This growing resistance has spurred interest in alternative strategies for controlling and preventing M. genitalium infections, particularly through the development of vaccines [4].

Despite extensive genome sequencing efforts, a significant portion of the M. genitalium proteome remains poorly characterized, with numerous proteins classified as “hypothetical”. These hypothetical proteins lack experimental validation and functional annotation, representing a considerable gap in our understanding of the organism’s biology and its pathogenic mechanisms [5]. Functional characterization of these hypothetical proteins is crucial, as they may hold key roles in vital biological processes such as host–pathogen interactions, immune evasion, and virulence [6]. Identifying proteins that can serve as potential vaccine targets is particularly critical given the absence of an effective vaccine against M. genitalium.

Vaccine development for M. genitalium presents unique challenges, primarily due to its minimalistic genome, its ability to persist in host tissues, and its immune evasion tactics [7]. Traditional vaccine development approaches are not well-suited to pathogens like M. genitalium, where the identification of protective antigens is complex. Therefore, the advent of reverse vaccinology—a genomics-based approach to vaccine discovery—has proven revolutionary, particularly when combined with artificial intelligence (AI) to enhance predictive accuracy [8]. Reverse vaccinology enables the systematic identification of potential vaccine candidates by analyzing the entire genome of a pathogen, focusing on surface-exposed or secreted proteins that can trigger immune responses [9]. AI algorithms further augment this process by integrating vast datasets and predictive models to evaluate proteins based on their immunogenic potential, sub-cellular localization, and virulence factors [6,10]. This approach leverages machine learning models to analyze protein features such as structural domains, immunogenicity, and functional annotations [8,9,10,11]. For M. genitalium, applying such an approach to the previously uncharacterized hypothetical proteins provides an innovative avenue for identifying novel vaccine candidates that could potentially circumvent the challenges posed by antibiotic resistance [12].

In this study, we utilize an immuno-informatics-based reverse vaccinology approach to systematically prioritize vaccine candidates from M. genitalium, with a particular focus on its hypothetical proteins [13]. By integrating multiple bioinformatics tools—including ProtParam, PSORTb, TMHMM, VirulentPred, and others—we performed a detailed structural, functional, and immunological analysis of hypothetical proteins to predict their potential roles in pathogenesis and vaccine suitability. The main focus of this study is to uncover novel protein targets that are surface-exposed, immunogenic, and potentially capable of eliciting protective immune responses, laying the groundwork for the experimental validation of these candidates.

The functional annotation and immunological profiling of these hypothetical proteins could offer valuable insights into M. genitalium biology, revealing new avenues for therapeutic interventions, vaccine development, and improved understanding of STI-related infections [13]. As drug resistance continues to limit treatment options, prioritizing these proteins as potential vaccine candidates could be pivotal in addressing the unmet medical need for effective prevention and control of M. genitalium infections [14].

2. Materials and Methods

2.1. Sequence Retrieval

The FASTA sequences of the hypothetical proteins from MG_G37T, along with their protein and gene IDs, were retrieved from the NCBI database (https://www.ncbi.nlm.nih.gov, accessed on 17 April 2024; NCBI Assembly: GCF_000027345.1 (RefSeq)). These sequences were then analyzed using the ProtParam tool on the ExPASy server [15] to predict various physicochemical properties, including molecular weight, theoretical isoelectric point (pI), amino acid composition, instability index, aliphatic index, and grand average of hydropathicity (GRAVY) [16].

To ensure the selection of novel or poorly characterized hypothetical proteins for downstream analysis, we applied a sequence similarity threshold of less than 95% against known proteins in the NCBI database using BLASTp. Specifically, the BLASTp search was conducted with an E-value threshold of 1 × 10⁻⁵ and a minimum sequence identity cutoff of 35%, which are commonly used parameters for homology-based protein functional annotation. The 95% upper limit filtered out proteins with high homology to previously characterized sequences, thereby prioritizing truly hypothetical proteins with limited functional annotation. This approach aims to uncover novel vaccine targets with minimal redundancy and a higher likelihood of contributing unique immunogenic features.
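The homology filter can be sketched as follows, assuming BLASTp was run with tabular output (`-outfmt 6`, whose 12 default columns are query ID, subject ID, percent identity, alignment length, mismatches, gap openings, query start/end, subject start/end, E-value and bit score). The function names and the way the three thresholds are combined are our illustrative reading of the criteria above, not the authors' script; percent identity is used as a stand-in for the "similarity" wording, and keeping queries with no qualifying hit is an assumption.

```python
from collections import defaultdict

EVALUE_MAX = 1e-5     # E-value threshold stated in the text
IDENTITY_MIN = 35.0   # minimum percent identity for a hit to count
IDENTITY_MAX = 95.0   # best hit at or above this => treated as already characterized


def best_identity_per_query(blast_tab_lines):
    """Return {query_id: highest pident among hits passing the E-value and identity cutoffs}."""
    best = defaultdict(float)
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        qseqid, pident, evalue = fields[0], float(fields[2]), float(fields[10])
        if evalue <= EVALUE_MAX and pident >= IDENTITY_MIN:
            best[qseqid] = max(best[qseqid], pident)
    return dict(best)


def select_hypothetical(query_ids, blast_tab_lines):
    """Keep queries with no qualifying hit, or whose best qualifying hit is below IDENTITY_MAX."""
    best = best_identity_per_query(blast_tab_lines)
    return [q for q in query_ids if best.get(q, 0.0) < IDENTITY_MAX]
```

For example, a query whose best hit is 98.5% identical is dropped, while one with a 60% hit, or with only hits weaker than the E-value cutoff, is retained for the hypothetical-protein pool.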

2.2. Sequence Homology Search

A sequence homology search was conducted using BLASTP [17] to determine the similarity of the hypothetical proteins from the M. genitalium genome to known proteins in the database. Sequences showing significant similarity to known hypothetical proteins, but less than 95% similarity to proteins of known function, were selected for further analysis. This step helped confirm that the retained sequences were true hypothetical proteins suitable for subsequent functional annotation and vaccine target prioritization.

2.3. Sub-Cellular Localization Prediction

Sub-cellular localization prediction was carried out using several bioinformatics tools that categorize proteins based on their cellular compartments. Initially, CELLO V2.5 [18] was used to predict the localization of the hypothetical proteins from the M. genitalium genome. These predictions were cross-validated using PSORTb v3.02 [19], another widely used sub-cellular localization tool, to ensure the reliability of the results.

Additional tools such as TMHMM v.2.0 [20] and SignalP 6.0 [21] were employed to predict trans-membrane helices and signal peptides, respectively, which are crucial for determining the membrane association and secretory nature of the proteins. HMMTOP [22] and SOSUI [23] were also utilized for further validation.

These tools collectively help to determine whether the hypothetical proteins are likely to be cytoplasmic, membrane-associated, or extracellular, which is critical for assessing their potential as vaccine targets. Proteins predicted to be located on the plasma membrane can serve as promising vaccine targets, whereas cytoplasmic proteins may serve as potential drug targets. Predictions with consistent results from at least two of the tools were accepted, while inconsistent predictions were excluded from the study. This consensus approach improved the reliability of the sub-cellular localization predictions for the shortlisted hypothetical proteins, aiding in their prioritization as potential vaccine candidates. (For the detailed pipeline of the methodology, see Supplementary Figure S1.)
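The "at least two tools agree" rule can be sketched as a simple vote. This assumes each tool's output has already been harmonized to a shared vocabulary (for example, mapping PSORTb's `CytoplasmicMembrane` to `membrane`); the label set, the tie-handling rule and the function names are illustrative assumptions rather than the authors' exact procedure.

```python
from collections import Counter

SURFACE_LABELS = {"extracellular", "membrane"}  # compartments prioritized as vaccine targets


def consensus_localization(predictions, min_agree=2):
    """Return the compartment predicted by >= min_agree tools, or None if there is no consensus.

    predictions: mapping of tool name -> harmonized compartment label.
    """
    counts = Counter(label for label in predictions.values() if label)
    if not counts:
        return None
    label, n = counts.most_common(1)[0]
    # Reject ties, e.g. two tools say "membrane" and two say "extracellular"
    tied = [lab for lab, c in counts.items() if c == n]
    if n >= min_agree and len(tied) == 1:
        return label
    return None


def is_vaccine_relevant(predictions):
    """True when the consensus compartment is surface-accessible."""
    return consensus_localization(predictions) in SURFACE_LABELS
```

A protein called `extracellular` by CELLO and PSORTb but `membrane` by SOSUI would be accepted as extracellular, whereas a one-to-one split between two tools would be excluded, mirroring the rule that inconsistent predictions were dropped.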

2.4. Transmembrane Helices Prediction

Transmembrane helices were predicted using TMHMM v.2.0 [20] and DeepTMHMM [24]. TMHMM v.2.0 uses a hidden Markov model to identify transmembrane segments, while DeepTMHMM leverages deep learning for enhanced accuracy. Consistent predictions from both tools were accepted to ensure reliable characterization of membrane-associated hypothetical proteins, aiding in their identification as potential vaccine candidates.
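The agreement check between the two predictors can be sketched as below. It assumes TMHMM 2.0 was run in its short output format, where each line starts with the sequence ID and contains a `PredHel=<n>` field, and that DeepTMHMM results were parsed separately into the same `{sequence_id: helix_count}` shape (its native output is not handled here). Requiring agreement on the presence of any helix, rather than on the exact count, is our assumption.

```python
import re

PREDHEL = re.compile(r"PredHel=(\d+)")


def tmhmm_helix_counts(short_output_lines):
    """Parse TMHMM short-format lines into {sequence_id: predicted helix count}."""
    counts = {}
    for line in short_output_lines:
        match = PREDHEL.search(line)
        if match:
            counts[line.split()[0]] = int(match.group(1))
    return counts


def consistent_tm_calls(tmhmm_counts, deeptmhmm_counts):
    """Return {sequence_id: has_tm_helix} for proteins where both tools agree."""
    agreed = {}
    for seq_id, n_tmhmm in tmhmm_counts.items():
        if seq_id not in deeptmhmm_counts:
            continue  # no second opinion available
        n_deep = deeptmhmm_counts[seq_id]
        if (n_tmhmm > 0) == (n_deep > 0):
            agreed[seq_id] = n_tmhmm > 0
    return agreed
```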

2.5. Physico-Chemical Properties Computation

The physicochemical properties of hypothetical proteins were computed using the ProtParam tool on the ExPASy server [15]. The following properties were analyzed:

Molecular Weight: Determined from the amino acid composition.

Theoretical Isoelectric Point (pI): The pH at which the protein carries no net charge.

Amino Acid Composition: The relative abundance of each amino acid.

Instability Index: Predicts protein stability (values < 40 indicate stability).

Aliphatic Index: Indicates thermostability based on aliphatic side chain volume.

Grand Average of Hydropathicity (GRAVY): Measures hydrophobicity or hydrophilicity.

These properties provide a detailed profile of the hypothetical proteins, supporting their functional characterization and prioritization as vaccine targets.
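Three of these descriptors have short closed forms and can be sketched without ProtParam itself: GRAVY as the mean Kyte-Doolittle hydropathy, the aliphatic index as defined by Ikai (1980), and the average molecular weight from residue masses. This is an illustrative re-implementation using standard published values; results should closely track ProtParam but are not guaranteed to match it digit for digit. The instability index and pI are omitted because they need the 400-entry dipeptide weight table and a pKa set, respectively.

```python
# Kyte-Doolittle hydropathy values (the scale ProtParam uses for GRAVY)
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Average residue masses in Da (free amino acid minus one water)
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # added once for the free N- and C-termini


def molecular_weight(seq):
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def gravy(seq):
    """Mean hydropathy; positive = hydrophobic, negative = hydrophilic."""
    return sum(KD[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Ikai (1980): X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)), X in mole percent."""
    def mole_percent(aa):
        return 100.0 * seq.count(aa) / len(seq)
    return mole_percent("A") + 2.9 * mole_percent("V") + 3.9 * (mole_percent("I") + mole_percent("L"))
```

As a usage example, these functions can be applied directly to the 47-residue peptide region of fig|2097.70.peg.33 discussed in Section 2.10, e.g. `gravy("HKNKVHALYQDPESGNIFSLKKRKQLASNYPLFELTSDNPISFTNNI")`.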

2.6. Virulence Factors Prediction

To identify potential virulence factors among the hypothetical proteins, we employed VirulentPred [25] and VICMpred [26] servers.

VirulentPred uses a support vector machine (SVM)-based model to predict virulence factors by analyzing the sequence features of the proteins. VICMpred applies a similar SVM-based approach, supplemented with a comprehensive dataset of known virulence factors to enhance prediction accuracy.

Predicting virulence factors is crucial for understanding the pathogenic mechanisms of M. genitalium and identifying potential targets for vaccine development and therapeutic interventions. Both tools were used to cross-validate the predictions, ensuring careful identification of proteins with potential virulence properties. Proteins predicted as virulence factors by both tools were further considered for their potential role in pathogenicity and vaccine development.

2.7. Functional Annotation and Domain Prediction

To predict the function and identify conserved domains of the hypothetical proteins, we utilized several bioinformatics tools: Pfam v35.0, SMART, CDD, and ScanProsite. Pfam v35.0 [27] was used to identify protein families and domains based on hidden Markov models (HMMs). SMART [28] provided additional insights into the domain architecture of proteins, focusing on signaling domains and extracellular domains.

The Conserved Domain Database (CDD) [29] was employed to detect conserved domains within the protein sequences by comparing them with known conserved domain alignments. ScanProsite [30] was used to identify Prosite patterns and profiles, aiding in the functional annotation of protein sequences.

Predicting functions and identifying domains is vital for understanding the roles of hypothetical proteins in the biological processes of M. genitalium. These predictions contribute to the functional annotation of the hypothetical proteins, providing insights into their potential roles in pathogenicity and their suitability as vaccine candidates.

2.8. Signal Peptide Prediction

To predict signal peptides in the hypothetical proteins, we employed SignalP 6.0 [21], a deep learning-based tool to identify signal peptides and their cleavage sites. These peptides play a key role in protein secretion and membrane localization, making them critical for assessing the potential of proteins as vaccine targets. Detecting signal peptides thus helps pinpoint proteins likely involved in host–pathogen interactions.

2.9. Prediction of Antigenicity, Allergenicity, and Toxicity

For the final 26 proteins identified with a high degree of confidence, we predicted their antigenicity, allergenicity, and toxicity to evaluate their potential as safe and effective vaccine candidates.

2.9.1. Antigenicity Prediction

Antigenicity, which reflects a protein's ability to stimulate an immune response, was assessed using the VaxiJen v2.0 server [31]. This tool predicts protective antigens in an alignment-independent manner, using an auto- and cross-covariance (ACC) transformation of protein sequences into uniform vectors of amino acid physicochemical descriptors. We applied the bacteria-specific threshold of 0.4, selecting proteins with the highest antigenic scores for further vaccine target evaluation, as these are more likely to elicit a strong immune response.
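The ACC step can be sketched as follows: each residue is replaced by a vector of descriptors, and for every lag l and descriptor pair (j, k) the term sum_i z[j][i] * z[k][i + l] / (n - l) is computed (j = k gives auto-covariance, j != k cross-covariance), yielding a fixed-length vector for sequences of any length. The descriptor table below is a HYPOTHETICAL toy table covering only three residues, not VaxiJen's real z-scales, and VaxiJen's trained scoring model that turns this vector into the reported antigenicity score is not reproduced.

```python
from itertools import product

# HYPOTHETICAL toy descriptors (3 per residue); real z-scale values are not reproduced here.
TOY_Z = {"A": (0.1, -0.2, 0.3), "K": (-0.5, 0.4, 0.0), "L": (0.7, 0.1, -0.3)}


def acc_vector(seq, descriptors, max_lag):
    """Auto- and cross-covariance features for lags 1..max_lag.

    Output length is max_lag * d * d, where d is the number of descriptors per residue.
    """
    z = [descriptors[aa] for aa in seq]
    n, d = len(z), len(z[0])
    if max_lag >= n:
        raise ValueError("max_lag must be shorter than the sequence")
    features = []
    for lag in range(1, max_lag + 1):
        for j, k in product(range(d), repeat=2):
            total = sum(z[i][j] * z[i + lag][k] for i in range(n - lag))
            features.append(total / (n - lag))
    return features
```

Because the output length depends only on `max_lag` and the number of descriptors, proteins of very different sizes become directly comparable inputs for a single classifier, which is the motivation for the transformation.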

2.9.2. Allergenicity Prediction

To ensure that the selected proteins would not induce an allergic response in the host, we used AllerTOP v2.0 [32], which classifies proteins as allergens or non-allergens based on their amino acid composition and similarity to known allergens. Proteins identified as allergens were excluded from further consideration to avoid potential adverse reactions.

2.9.3. Toxicity Prediction

Toxicity prediction was conducted using the ToxinPred3.0 server [33] and the ToxiDL tool [34], which evaluate the potential toxicity of proteins from their sequences. This analysis helps ensure that the selected proteins do not pose a toxic risk to the host, supporting the safety of potential vaccine candidates. Proteins predicted to be toxic were removed from the candidate list (for the stepwise pipeline used in this study, see Supplementary Figure S1).

These analyses allowed us to refine our selection of vaccine candidates to include only those proteins that are antigenic, non-allergenic, and non-toxic, thereby optimizing their potential for safe and effective vaccine development against M. genitalium.
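The combined screen can be sketched as a single filter-and-rank step. The record fields, the toy values in the usage test, and the decision to use virulence agreement (VirulentPred and VICMpred) as a ranking key rather than a hard filter are illustrative assumptions; only the 0.4 VaxiJen threshold and the exclusion of allergens and toxins come from the text.

```python
from dataclasses import dataclass

VAXIJEN_THRESHOLD = 0.4  # bacterial threshold stated in the text


@dataclass
class Candidate:
    protein_id: str
    localization: str       # consensus compartment, e.g. "extracellular"
    vaxijen_score: float
    allergen: bool          # AllerTOP v2.0 call
    toxic: bool             # assumed True if any toxicity tool flags the protein
    virulent_by_both: bool  # assumed: VirulentPred and VICMpred both predict virulence


def passes_filters(c):
    """Surface-accessible, antigenic, non-allergenic and non-toxic."""
    return (
        c.localization in {"extracellular", "membrane"}
        and c.vaxijen_score > VAXIJEN_THRESHOLD
        and not c.allergen
        and not c.toxic
    )


def rank_candidates(candidates):
    """Filter, then rank: virulent-by-both first, then by descending VaxiJen score."""
    kept = [c for c in candidates if passes_filters(c)]
    return sorted(kept, key=lambda c: (not c.virulent_by_both, -c.vaxijen_score))
```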

2.10. Homology Modeling of Two Candidate Proteins

Homology modeling was performed to predict the three-dimensional (3D) structure of the selected peptide region "HKNKVHALYQDPESGNIFSLKKRKQLASNYPLFELTSDNPISFTNNI" from fig|2097.70.peg.33 (UniProt ID: Q57081), which consists of 47 amino acid residues, and of another peptide region, "FANTNLDWGENKQKQFVENQLGYKETTSTNSHNFHSKSFTQ", from fig|2097.71.peg.1 (UniProt ID: P22747), comprising 41 amino acid residues. The Swiss-Model server [35] was used to generate structural models by template-based modeling. Each peptide sequence was submitted to the Swiss-Model workspace, where the best structural template was automatically selected based on sequence similarity and structural quality. The final models were visualized and analyzed using PyMOL [36,37] to examine secondary structure elements such as alpha-helices, beta-strands, and loop regions. This homology modeling approach provided structural insights into fig|2097.70.peg.33 (UniProt ID: Q57081) and fig|2097.71.peg.1 (UniProt ID: P22747) and supports their immunogenic potential and suitability as vaccine candidates.

2.11. B-Cell and T-Cell Epitope Prediction

2.11.1. B-Cell Epitope Prediction

To predict linear B-cell epitopes within the selected antigenic, non-toxic, and non-allergenic proteins from M. genitalium, we utilized the BepiPred-2.0 server [38]. BepiPred-2.0 is designed to identify potential B-cell epitopes by analyzing the protein sequence for regions likely to be recognized by B-cells, which is crucial for vaccine design. The server provides a prediction score for each amino acid residue, where higher scores indicate a higher likelihood of being part of an epitope. We considered regions with scores above the threshold as potential B-cell epitopes, prioritizing those that are surface-exposed and likely to interact with the host immune system.
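Turning per-residue scores into candidate epitopes amounts to grouping consecutive residues that score above the threshold. The sketch below assumes the scores have already been exported from BepiPred-2.0, one per residue; the text does not state the threshold used, so the 0.5 default here (which we understand to be BepiPred-2.0's default) and the minimum region length are assumptions.

```python
def epitope_regions(sequence, scores, threshold=0.5, min_length=5):
    """Group consecutive residues scoring above threshold into candidate linear epitopes.

    Returns (start, end, peptide) tuples with 1-based, inclusive coordinates.
    """
    if len(sequence) != len(scores):
        raise ValueError("one score per residue is required")
    regions, start = [], None
    for i, score in enumerate(scores):
        if score > threshold and start is None:
            start = i  # a run of above-threshold residues begins
        elif score <= threshold and start is not None:
            if i - start >= min_length:
                regions.append((start + 1, i, sequence[start:i]))
            start = None
    # Close a run that extends to the C-terminus
    if start is not None and len(scores) - start >= min_length:
        regions.append((start + 1, len(scores), sequence[start:]))
    return regions
```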

2.11.2. T-Cell Epitope Prediction

For T-cell epitope prediction, we used the Immune Epitope Database (IEDB) tools [39,40] to identify peptides that can bind to MHC Class I and Class II molecules, which are essential for eliciting a cellular immune response. Specifically, we employed the NetMHCpan tool [41] integrated within IEDB to predict MHC Class I binding peptides and the recommended MHC Class II binding prediction tool for MHC Class II epitopes. The analysis included identifying peptides with high binding affinity, which are likely to be presented on the surface of antigen-presenting cells and recognized by T-cells. The peptides with the strongest predicted binding affinities were selected as candidate T-cell epitopes.
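The two mechanical steps around the IEDB predictors can be sketched as follows: enumerating overlapping peptides of a fixed length (9-mers are typical for MHC Class I and 15-mers for Class II), and selecting the strongest predicted binders from a results table. The text ranks by binding affinity; this sketch assumes the results were parsed into `(allele, peptide, percentile_rank)` tuples, where a lower rank means stronger predicted binding, and the cut-off values are illustrative only.

```python
def sliding_peptides(sequence, length):
    """All overlapping peptides of the given length, with 1-based start positions."""
    return [(i + 1, sequence[i:i + length]) for i in range(len(sequence) - length + 1)]


def select_binders(predictions, max_rank=1.0, top_n=None):
    """Keep predictions with percentile rank <= max_rank, strongest first.

    predictions: iterable of (allele, peptide, percentile_rank) tuples.
    """
    hits = sorted((p for p in predictions if p[2] <= max_rank), key=lambda p: p[2])
    return hits if top_n is None else hits[:top_n]
```

For example, `sliding_peptides(sequence, 9)` produces the candidate MHC Class I peptides for a protein, which are then scored by the predictor before `select_binders` is applied.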

This epitope prediction approach is fundamental to identifying peptides that could serve as key components of a vaccine, capable of inducing both humoral and cellular immune responses against M. genitalium.

2.12. Molecular Docking

The innate immune receptor Toll-like receptor 4 (TLR4) was selected as the target protein owing to its well-established role in recognizing bacterial components and initiating immune responses. The activation of TLR4 has been strongly associated with the development of peptide-based vaccine candidates, making it a rational choice for screening immunogenic peptides.

The three-dimensional structure of human TLR4 was retrieved from the Protein Data Bank (PDB ID: 3UL7) [42]. The peptides derived from fig|2097.70.peg.33 (UniProt ID: Q57081) and fig|2097.71.peg.1 (UniProt ID: P22747) were modeled with the PEP-FOLD3 server [43] from their respective amino acid sequences and subjected to energy minimization. Protein-peptide molecular docking of the peptides with TLR4 was performed using the HDock server [44], and the best docked poses were selected based on binding energy scores and interaction patterns. Docked complexes were further visualized in PyMOL [37] to assess hydrogen bonding, hydrophobic contacts, and key interacting residues.
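Key interacting residues in a docked complex can also be listed with a simple distance criterion, as a scriptable complement to visual inspection in PyMOL. This sketch reads fixed-column PDB `ATOM`/`HETATM` records and reports receptor residues with any atom within a cut-off of any peptide atom; the 4 Å cut-off and the chain IDs are assumptions, and a pure distance test does not distinguish hydrogen bonds from hydrophobic contacts.

```python
import math


def parse_atoms(pdb_text):
    """Parse ATOM/HETATM records into (chain, residue label, (x, y, z)) tuples."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            chain = line[21]
            residue = f"{line[17:20].strip()}{int(line[22:26])}"
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((chain, residue, xyz))
    return atoms


def interface_residues(pdb_text, receptor_chain, peptide_chain, cutoff=4.0):
    """Receptor residues with any atom within `cutoff` angstroms of any peptide atom."""
    atoms = parse_atoms(pdb_text)
    receptor = [(res, xyz) for ch, res, xyz in atoms if ch == receptor_chain]
    peptide = [xyz for ch, _, xyz in atoms if ch == peptide_chain]
    contacts = set()
    for res, xyz in receptor:
        if any(math.dist(xyz, p) <= cutoff for p in peptide):
            contacts.add(res)
    return sorted(contacts)
```

The all-pairs loop is quadratic in atom count, which is acceptable for a single TLR4-peptide complex; a spatial grid or k-d tree would be the usual choice for larger batches.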

2.13. Molecular Dynamics (MD) Simulations

Molecular dynamics (MD) simulations were carried out to gain detailed insights into the stability and dynamic behavior of peptide–TLR4 interactions. The peptide–TLR4 complexes (fig|2097.70.peg.33 (UniProt ID: Q57081)–TLR4 and fig|2097.71.peg.1 (UniProt ID: P22747)–TLR4), derived from molecular docking studies, were subjected to all-atom MD simulations using GROMACS 2020.1 [45,46]. The CHARMM36 force field was employed to generate the protein and peptide topology files. Briefly, each complex was solvated in a triclinic box filled with TIP3P water molecules, and the systems were neutralized by adding counter-ions (Na⁺/Cl⁻). Energy minimization was performed using the steepest descent algorithm for 50,000 steps to relieve steric clashes. Following minimization, the complexes were equilibrated in two steps: (i) NVT ensemble equilibration at 300 K for 100 ps using a V-rescale thermostat, and (ii) NPT ensemble equilibration for 100 ps at 1 bar using the Parrinello–Rahman barostat. Subsequently, 300 ns production MD simulations were performed for each peptide–TLR4 complex at 300 K and 1 bar with a 2 fs time step. The Particle Mesh Ewald (PME) method [47] was applied for long-range electrostatics with a real-space cut-off of 10 Å, a PME order of six, and a relative tolerance of 10⁻⁶. Short-range van der Waals interactions were treated with a 9 Å cut-off, and neighbor lists were updated every 10 steps. All covalent bonds involving hydrogen atoms were constrained using the LINCS algorithm [48].
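The production settings above can be collected into a GROMACS `.mdp` parameter file. The sketch below generates that text from the stated values, converting Å to nm (10 Å = 1.0 nm, 9 Å = 0.9 nm) and the 300 ns length into a step count. Options the text does not specify (`tau-t`, `tau-p`, `compressibility`, `tc-grps`, `cutoff-scheme`) are filled with common defaults as assumptions, and CHARMM36-specific van der Waals modifiers are deliberately left out; this is not the authors' input file.

```python
def production_mdp(total_ns=300, dt_fs=2, temperature_k=300, pressure_bar=1.0):
    """Build .mdp text mirroring the production settings described in the text."""
    nsteps = total_ns * 1_000_000 // dt_fs  # 1 ns = 1,000,000 fs
    params = {
        "integrator": "md",
        "dt": dt_fs / 1000,                  # ps
        "nsteps": nsteps,
        "cutoff-scheme": "Verlet",           # assumption
        "nstlist": 10,
        "coulombtype": "PME",
        "rcoulomb": 1.0,                     # nm (10 A)
        "pme-order": 6,
        "ewald-rtol": 1e-6,                  # dimensionless
        "rvdw": 0.9,                         # nm (9 A)
        "tcoupl": "V-rescale",
        "tc-grps": "System",                 # assumption
        "tau-t": 0.1,                        # ps, assumption
        "ref-t": temperature_k,
        "pcoupl": "Parrinello-Rahman",
        "tau-p": 2.0,                        # ps, assumption
        "ref-p": pressure_bar,
        "compressibility": 4.5e-5,           # bar^-1, water value, assumption
        "constraints": "h-bonds",
        "constraint-algorithm": "lincs",
    }
    return "\n".join(f"{key} = {value}" for key, value in params.items()) + "\n"
```

The returned string would be written to a `.mdp` file and passed to `gmx grompp -f`; computing `nsteps` in integer femtoseconds avoids floating-point rounding in the 300 ns / 2 fs = 150,000,000 step count.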

Trajectory analyses were conducted after the systems had stabilized. The following parameters were computed:

Root Mean Square Deviation (RMSD): to assess backbone stability.

Root Mean Square Fluctuation (RMSF): to evaluate residue-level flexibility.

Radius of Gyration (Rg): to measure the compactness of the complexes.

Hydrogen Bond Analysis: to quantify the stability and persistence of peptide–TLR4 interactions.

The final equilibrated structures were obtained by averaging representative snapshots from the stabilized trajectories.
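The first three quantities have simple definitions, sketched below for coordinates given as `(x, y, z)` tuples. In practice they would come from the GROMACS tools (`gmx rms`, `gmx rmsf`, `gmx gyrate`), which also perform least-squares fitting to a reference; this sketch assumes frames are already fitted and does no superposition itself.

```python
import math


def rmsd(coords, reference):
    """RMSD between two equally ordered coordinate lists (assumed pre-fitted)."""
    sq = sum(math.dist(a, b) ** 2 for a, b in zip(coords, reference))
    return math.sqrt(sq / len(reference))


def rmsf(frames):
    """Per-atom RMSF about the time-averaged position. frames: [frame][atom] -> (x, y, z)."""
    n_frames, n_atoms = len(frames), len(frames[0])
    result = []
    for atom in range(n_atoms):
        mean = [sum(f[atom][d] for f in frames) / n_frames for d in range(3)]
        msd = sum(math.dist(f[atom], mean) ** 2 for f in frames) / n_frames
        result.append(math.sqrt(msd))
    return result


def radius_of_gyration(coords, masses):
    """Mass-weighted root-mean-square distance of atoms from the centre of mass."""
    total = sum(masses)
    com = [sum(m * c[d] for m, c in zip(masses, coords)) / total for d in range(3)]
    return math.sqrt(sum(m * math.dist(c, com) ** 2 for m, c in zip(masses, coords)) / total)
```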

2.14. Principal Component Analysis (PCA)

To gain deeper insights into the large-scale conformational dynamics of the TLR4–peptide complexes, we performed Principal Component Analysis (PCA) on the molecular dynamics (MD) trajectories. PCA was employed to extract the essential motions of the protein–peptide complexes, thereby reducing the dimensional complexity of the simulation data into dominant modes of motion [49].

The covariance matrix of the positional fluctuations of the Cα atoms was generated from the MD trajectories using the gmx covar module in GROMACS. Eigenvectors and eigenvalues were computed to identify the dominant collective motions within the system. Subsequently, the first few principal components (PCs) that captured the maximum variance were extracted using the gmx anaeig module [50].

To visualize the conformational sampling, 2D projection plots of the first two principal components (PC1 vs. PC2) were generated. These plots show the conformational space explored by each peptide–TLR4 complex and provide an overview of its free energy landscape. Additionally, the distribution of eigenvalues was analyzed to assess the relative contribution of each PC to the overall dynamics of the complexes.
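The covariance and eigen-decomposition steps performed by `gmx covar` and `gmx anaeig` can be sketched with NumPy. The sketch assumes a trajectory array of shape (frames, Cα atoms, 3) that has already been fitted to a reference, and adds a common way of turning the PC1/PC2 projection into a free energy surface, G = -k_B T ln(P / P_max), from a 2D histogram; the bin count and temperature are illustrative parameters.

```python
import numpy as np

KB_KJ_PER_MOL_K = 0.0083144626  # Boltzmann constant in kJ/(mol*K)


def essential_dynamics(trajectory, n_components=2):
    """PCA of C-alpha fluctuations; returns (eigenvalues desc., variance fraction, projections)."""
    frames = np.array(trajectory, dtype=float)       # copy so the caller's array is untouched
    x = frames.reshape(len(frames), -1)              # (n_frames, 3N)
    x -= x.mean(axis=0)                              # fluctuations about the mean structure
    cov = np.cov(x, rowvar=False)                    # (3N, 3N) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)           # ascending for symmetric matrices
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    fraction = eigvals / eigvals.sum()
    projections = x @ eigvecs[:, :n_components]      # e.g. PC1 vs PC2 scatter
    return eigvals, fraction, projections


def free_energy_surface(pc1, pc2, bins=30, temperature=300.0):
    """G = -kT ln(P / P_max) on a 2D histogram; empty bins become +inf."""
    kt = KB_KJ_PER_MOL_K * temperature
    hist, xedges, yedges = np.histogram2d(pc1, pc2, bins=bins)
    prob = hist / hist.sum()
    with np.errstate(divide="ignore"):
        g = -kt * np.log(prob / prob.max())
    return g, xedges, yedges
```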

2.15. Binding Free Energy Calculations (MM-GBSA)

The binding free energy of the TLR4–peptide complexes was evaluated using the Molecular Mechanics Generalized Born Surface Area (MM-GBSA) method, which decomposes the interaction energy into van der Waals (ΔE_vdW), electrostatic (ΔE_ele), polar solvation (ΔE_GB), and non-polar solvation (ΔE_SURF) contributions [51]. This approach provides an estimation of the overall binding free energy (ΔG_MM-GBSA), where negative values indicate favorable interactions.
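The decomposition can be written out explicitly. The block below is a sketch of the standard MM-GBSA expression, with angle brackets denoting averages over MD snapshots. The text does not state whether an entropy term (−TΔS) was included, so it is omitted here, and the linear solvent-accessible surface area (SASA) form of ΔE_SURF, with surface tension coefficient γ and offset b, is the commonly used parametrization rather than a setting confirmed by the source.

```latex
\begin{aligned}
\Delta G_{\text{MM-GBSA}} &= \langle G_{\text{complex}} \rangle - \langle G_{\text{TLR4}} \rangle - \langle G_{\text{peptide}} \rangle \\
&\approx \Delta E_{\text{vdW}} + \Delta E_{\text{ele}} + \Delta E_{\text{GB}} + \Delta E_{\text{SURF}}, \\
\Delta E_{\text{SURF}} &= \gamma \, \Delta \mathrm{SASA} + b
\end{aligned}
```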

3. Results

3.1. Identification and Characterization of Hypothetical Proteins

The genome of M. genitalium (MG_G37T strain) was thoroughly analyzed, identifying 308 hypothetical proteins (HPs) with unknown or poorly understood functions. These proteins were systematically evaluated to determine their potential as vaccine candidates. This evaluation involved a series of computational analyses to predict their physicochemical properties, localization, antigenicity, allergenicity, toxicity, and immune-relevant epitopes. Such a multi-step approach ensured a holistic understanding of each protein’s attributes.

3.2. Physicochemical Properties

The physicochemical properties of the hypothetical proteins were assessed to understand their stability and suitability for vaccine development. The molecular weights of these proteins ranged from 4.2 kDa to 215.9 kDa, reflecting significant diversity in their sizes. The theoretical isoelectric points (pI) varied between 4.15 and 12.01, indicating the presence of both acidic and basic proteins.

Stability, an essential parameter for vaccine formulation, was determined using the instability index. Based on the instability index values calculated using ProtParam, approximately 60% of the proteins (specifically those with an instability index < 40) were classified as stable, as detailed in Table 1. Proteins with instability indices above 40 were deprioritized to streamline the focus on more promising targets. Additionally, the aliphatic index and GRAVY values provided insights into the thermal stability and hydropathicity of the proteins, respectively, further informing their prioritization as vaccine candidates (Table 1).

3.3. Sub-Cellular Localization

Sub-cellular localization predictions revealed a diverse distribution of hypothetical proteins within the bacterial cell. Among the 74 proteins analyzed, 26 (35.1%) were predicted to be extracellular, 19 (25.7%) were associated with the membrane, and 29 (39.2%) were cytoplasmic (Figure 1a).

Extracellular and membrane-associated proteins were prioritized due to their accessibility to the host immune system, which is critical for eliciting an effective immune response. Cytoplasmic proteins were deprioritized as they are less likely to interact directly with immune components during infection. This localization-based stratification narrowed the focus to 45 proteins (extracellular and membranous) for subsequent analyses (Table 2). These proteins may be involved in key pathogenic processes, making them attractive targets for vaccine development. The 45 membranous and extracellular hypothetical proteins were further screened for antigenicity, allergenicity, toxicity, and virulence; only 23 of them passed these filters and were taken forward for further analysis.

3.4. Antigenicity and Virulence Assessment

To identify immunogenic proteins, antigenicity was evaluated using VaxiJen v2.0 [31] with a threshold of 0.4. All 23 prioritized proteins exceeded this threshold, demonstrating their potential to elicit an immune response. Notably, a few proteins, such as fig|2097.70.peg.1 and fig|2097.70.peg.33 (UniProt ID: Q57081), achieved antigenic scores above 0.6, indicating significant immunogenic potential (Table 3).

Allergenicity assessments using AllerTOP v2.0 indicated that none of the proteins were probable allergens, supporting their safety profiles as vaccine candidates. Additionally, toxicity analyses via ToxinPred predicted all 23 proteins to be non-toxic, supporting their suitability for vaccine development.

Virulence analysis using VirulentPred predicted several of the proteins to be virulence factors, suggesting a possible role in the pathogenicity of M. genitalium. Proteins associated with virulence are particularly valuable as vaccine targets due to their functional relevance in infection and disease progression.

3.5. Functional Annotation

To gain insight into the functional roles of the identified hypothetical proteins, multiple bioinformatics tools were used for domain annotation and functional prediction. Conserved domains were identified using the Conserved Domain Database (CDD), ScanProsite, and Pfam v35.0, which facilitated the classification of proteins into known functional families (Figure 1b). These analyses provided essential information regarding potential structural features and biological roles, suggesting possible roles for these proteins in M. genitalium pathogenicity.

Pfam v35.0 analysis revealed that several proteins contained hidden Markov model (HMM)-based conserved domains, suggesting potential involvement in key cellular processes. CDD analysis further supported these findings by identifying domain alignments with known virulence-associated proteins. ScanProsite successfully detected functional motifs and Prosite patterns, contributing additional structural and functional insights. Collectively, these predictions enhance our understanding of hypothetical proteins, supporting their potential role as vaccine targets or virulence factors in M. genitalium.

3.6. Prioritization of Vaccine Candidates

Integrating all of the above data, the 23 shortlisted hypothetical proteins emerged as promising vaccine candidates. These proteins exhibited favorable physicochemical properties, extracellular or membranous localization, high antigenicity, non-allergenicity, non-toxicity, and the presence of immunogenic epitopes. The hypothetical proteins fig|2097.71.peg.1 (UniProt ID: P22747) and fig|2097.70.peg.33 (UniProt ID: Q57081) were prioritized as the top-ranking candidates based on their high antigenicity scores, extracellular localization, virulence association, and strong predicted B-cell and T-cell epitope binding affinities (Table 3). Their potential to stimulate both humoral and cellular immune responses underscores their relevance for preclinical validation. Future studies will focus on experimental validation of these candidates using in vitro and in vivo models to confirm their immunogenic potential.

3.7. Homology Modeling of the Potential Vaccine Candidates

To further characterize the structures of the top two prioritized vaccine candidates, fig|2097.71.peg.1 (UniProt ID: P22747) and fig|2097.70.peg.33 (UniProt ID: Q57081), homology modeling was performed using Swiss-Model. These candidates, which had high antigenicity scores (0.67 and 0.61, respectively), were selected for 3D structure prediction. Specifically, a 47-amino acid peptide region (HKNKVHALYQDPESGNIFSLKKRKQLASNYPLFELTSDNPISFTNNI) from fig|2097.70.peg.33 (UniProt ID: Q57081) and a 41-amino acid peptide region (FANTNLDWGENKQKQFVENQLGYKETTSTNSHNFHSKSFTQ) from fig|2097.71.peg.1 (UniProt ID: P22747) were modeled to assess their conformational stability and structural suitability as vaccine targets (Figure 2).

The generated models exhibited well-folded structures with stable secondary structure elements, including alpha-helices and beta-strands, which probably contribute to epitope stability and immunogenicity and support their suitability for future in vitro and in vivo validation. To gain more confidence in the predicted structures of both peptides, we performed further structural validation using Ramachandran plots [52], the ProSA web server (to obtain the Z-score) [53], and ERRAT analysis [54]; these analyses supported the quality and reliability of the modeled structures. These additional structural validation analyses are shown in Supplementary Figure S2.

3.8. Epitope Prediction

3.8.1. B-Cell Epitope Mapping

Linear B-cell epitopes were predicted with BepiPred-2.0 for all 23 HPs, including the two prioritized vaccine candidates fig|2097.71.peg.1 (UniProt ID: P22747) and fig|2097.70.peg.33 (UniProt ID: Q57081), to identify regions with high antigenic potential. The extracellular protein fig|2097.70.peg.33 (UniProt ID: Q57081) contained a 47-amino acid epitope (“HKNKVHALYQDPESGNIFSLKKRKQLASNYPLFELTSDNPISFTNNI”) with a high antigenicity score (Figure 3a–c). Such epitopes are critical for stimulating humoral immunity, which plays a central role in neutralizing pathogens. The identified B-cell epitope regions and predicted peptide sequences are summarized in Table 4 (top two proteins only) and Supplementary Table S1 (all 23 HPs). The predicted B-cell epitope regions for fig|2097.71.peg.1 (UniProt ID: P22747) are shown in Supplementary Figure S3.
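BepiPred-2.0 returns one score per residue, and epitope regions are the runs of consecutive residues at or above a threshold. The sketch below shows that grouping step, assuming a threshold of 0.5 (to our understanding, the default used on the IEDB server) and a hypothetical minimum region length of 5 residues; it post-processes scores and does not reproduce BepiPred's prediction model.

```python
def epitope_regions(sequence, scores, threshold=0.5, min_length=5):
    """Group consecutive residues scoring >= threshold into candidate linear epitopes.
    Returns (start, end, peptide) tuples with 1-based inclusive positions."""
    if len(sequence) != len(scores):
        raise ValueError("one score per residue is required")
    regions, start = [], None
    # A -inf sentinel closes a run that reaches the end of the sequence.
    for i, s in enumerate(list(scores) + [float("-inf")]):
        if s >= threshold and start is None:
            start = i
        elif s < threshold and start is not None:
            if i - start >= min_length:
                regions.append((start + 1, i, sequence[start:i]))
            start = None
    return regions
```

For example, scores of `[0.1, 0.6, 0.7, 0.8, 0.9, 0.6, 0.2, 0.9, 0.9, 0.1]` over `"ABCDEFGHIJ"` yield a single region, `(2, 6, "BCDEF")`; the two-residue run at positions 8–9 is discarded by the length filter.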

In addition to linear B-cell epitope prediction using BepiPred, conformational epitope mapping was performed using the ElliPro tool, which integrates structural modeling and surface accessibility data. The predicted discontinuous epitopes corroborated the linear epitope findings, providing structural support for their potential immunogenicity.
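ElliPro scores residues by a protrusion index (PI): roughly, the share of the protein enclosed by the ellipsoid on whose surface a residue lies, with high-PI residues then clustered by distance into discontinuous epitopes. The sketch below is a simplified approximation under our own assumptions: the ellipsoid shape comes from the principal axes of the coordinate cloud rather than ElliPro's fitting procedure, and the 0.5 PI cut-off and 6 Å clustering distance are assumed to mirror the IEDB defaults.

```python
import numpy as np

def protrusion_index(coords):
    """Approximate PI per point (e.g. residue centroid). Points are expressed in the
    cloud's principal axes, each scaled by its standard deviation, so concentric
    ellipsoids share the overall shape. PI_i = fraction of points on or inside the
    ellipsoid passing through point i."""
    X = np.asarray(coords, dtype=float)
    X = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(X.T))
    proj = X @ evecs
    r = np.sqrt(((proj ** 2) / np.maximum(evals, 1e-12)).sum(axis=1))
    return np.array([(r <= ri).mean() for ri in r])

def cluster(coords, selected, cutoff=6.0):
    """Single-linkage clustering of selected points closer than cutoff (Angstrom)."""
    X = np.asarray(coords, dtype=float)
    idx = [int(i) for i in np.flatnonzero(selected)]
    parent = {i: i for i in idx}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for pos, a in enumerate(idx):
        for b in idx[pos + 1:]:
            if np.linalg.norm(X[a] - X[b]) <= cutoff:
                parent[find(a)] = find(b)
    groups = {}
    for i in idx:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

In use, `cluster(centroids, protrusion_index(centroids) >= 0.5)` would return groups of spatially adjacent, protruding residues as candidate conformational epitopes.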

3.8.2. T-Cell Epitope Mapping

T-cell epitopes were also predicted for MHC-I and MHC-II molecules from all 23 HPs using IEDB tools. High-affinity binding peptides were identified, such as “YTDEKKVPLINY” (MHC-I) and “PKPVVDLKPQRIEPR” (MHC-II) from fig|2097.70.peg.33 (UniProt ID: Q57081), indicating their potential to activate CD8+ and CD4+ T-cells, respectively (Figure 4a–c). Such epitopes are essential for eliciting an effective cellular immune response, which complements humoral immunity in providing long-term protection. The identified T-cell epitope regions and predicted peptide sequences are summarized in Table 5 (top two proteins only) and Supplementary Table S2 (all 23 HPs). The predicted T-cell epitope regions for fig|2097.71.peg.1 (UniProt ID: P22747) are shown in Supplementary Figure S4.
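MHC binding predictors score every overlapping peptide of a chosen length, for instance the 12-mer and 15-mer reported above. The sketch below generates those windows and keeps the best-scoring predictions. The `(peptide, allele, score)` tuple layout, the `min_score` and `n` values, and the placeholder peptide and allele names are hypothetical; the IEDB tools also report percentile ranks, for which lower values indicate stronger binders and the comparison would be reversed.

```python
def peptide_windows(sequence, length):
    """All overlapping peptides of one length, with 1-based start positions."""
    return [(i + 1, sequence[i:i + length]) for i in range(len(sequence) - length + 1)]

def top_binders(predictions, min_score=0.9, n=5):
    """predictions: iterable of (peptide, allele, score); higher score = stronger
    predicted binder. Keeps entries at or above min_score, best first, at most n."""
    hits = [p for p in predictions if p[2] >= min_score]
    return sorted(hits, key=lambda p: p[2], reverse=True)[:n]

# Usage with placeholder names: a 20-residue protein yields 12 overlapping 9-mers.
windows = peptide_windows("A" * 20, 9)
preds = [("PEP1", "allele_x", 0.95), ("PEP2", "allele_x", 0.50), ("PEP3", "allele_y", 0.97)]
best = top_binders(preds)
```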

The predicted epitopes were further evaluated for conservancy and population coverage to assess whether the identified regions are broadly applicable across diverse human populations (Supplementary Figure S5 and Tables S3 and S4).
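Population coverage estimates the share of people carrying at least one HLA allele that binds a given epitope set. The sketch below is a simplified version of that calculation under stated assumptions: within one locus, the gene frequencies of the covered alleles are summed and converted to a phenotype frequency under Hardy–Weinberg equilibrium, and loci are combined as if independent (linkage disequilibrium ignored). The frequencies in the checks are made up.

```python
def locus_coverage(allele_freqs):
    """Fraction of individuals carrying >= 1 covered allele at one locus:
    1 - (1 - sum of covered gene frequencies)^2 under Hardy-Weinberg."""
    total = sum(allele_freqs)
    if not 0 <= total <= 1:
        raise ValueError("allele frequencies at one locus must sum to at most 1")
    return 1 - (1 - total) ** 2

def combined_coverage(per_locus_freqs):
    """Combine loci assuming independence: 1 - prod(1 - coverage_locus)."""
    uncovered = 1.0
    for freqs in per_locus_freqs:
        uncovered *= 1 - locus_coverage(freqs)
    return 1 - uncovered
```

With a single covered allele at frequency 0.5, one locus gives 75% coverage; two such loci together give 93.75%.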

3.8.3. Epitope Conservancy Across M. genitalium Strains

To evaluate the cross-strain relevance of the predicted epitopes, we performed epitope conservancy analysis using the IEDB Epitope Conservancy Tool. High-scoring B-cell and T-cell epitopes derived from fig|2097.70.peg.33 (UniProt ID: Q57081) and fig|2097.71.peg.1 (UniProt ID: P22747) were analyzed against protein sequences from multiple M. genitalium strains available in the NCBI database. Several epitopes showed high sequence identity (>85%) across the strains, suggesting that these epitopes are conserved and may support cross-protective immune responses.
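Conservancy can be expressed as the percentage of strain sequences that contain the epitope at or above an identity cut-off. The sketch below uses an ungapped sliding comparison, which is our simplification rather than a description of the IEDB tool's internals; the 85% default mirrors the identity level quoted above, and the sequences in the checks are toy strings.

```python
def best_identity(epitope, protein):
    """Highest ungapped percent identity of the epitope against any window of the protein."""
    k = len(epitope)
    best = 0.0
    for i in range(len(protein) - k + 1):
        matches = sum(a == b for a, b in zip(epitope, protein[i:i + k]))
        best = max(best, 100.0 * matches / k)
    return best

def conservancy(epitope, proteins, identity_cutoff=85.0):
    """Percentage of sequences containing the epitope at >= identity_cutoff % identity."""
    hits = sum(best_identity(epitope, p) >= identity_cutoff for p in proteins)
    return 100.0 * hits / len(proteins)
```

A 10-residue epitope with one mismatch scores 90% identity and still counts as conserved at the 85% level; two mismatches (80%) would not.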

3.9. Molecular Docking

Molecular docking analysis indicated that both peptides bind favorably to TLR4. fig|2097.70.peg.33 (UniProt ID: Q57081) showed the stronger predicted binding, with a docking score of −9.6 kcal/mol, forming hydrogen bonds with residues in the leucine-rich repeat (LRR) domain of TLR4, which is known to participate in ligand recognition. fig|2097.71.peg.1 (UniProt ID: P22747) also bound favorably, with a docking score of −7.4 kcal/mol, but formed fewer hydrogen bonds than fig|2097.70.peg.33 (UniProt ID: Q57081). Figure 5 illustrates the docked complexes of both peptides with TLR4, highlighting their key contact regions.

3.10. Molecular Dynamics (MD) Simulations

To further validate the docking results and assess the dynamic stability of the docked complexes, MD simulations were performed for 300 ns. RMSD (Figure 6a): The backbone RMSD of both peptide–TLR4 complexes indicated that the systems stabilized after ~50 ns. The fig|2097.71.peg.1 (UniProt ID: P22747)–TLR4 complex showed comparatively lower deviations, suggesting greater conformational stability over the simulation timeframe.
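Backbone RMSD is computed after optimally superposing each frame onto a reference structure, usually with the Kabsch algorithm; GROMACS provides this through `gmx rms`. The NumPy sketch below shows the calculation for plain coordinate arrays and assumes the atom selection (e.g., backbone atoms) has already been made; it is an illustration, not the workflow used in the study.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (n, 3) coordinate sets after optimal superposition of P onto Q."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                               # 3x3 cross-covariance matrix
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # correct for a possible reflection
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T                        # rotation mapping P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))

def rmsd_series(frames, reference):
    """RMSD of every trajectory frame against one reference structure."""
    return [kabsch_rmsd(frame, reference) for frame in frames]
```

Because the fit removes rigid-body rotation and translation, a plateau in this series reflects internal conformational change only, which is what the ~50 ns stabilization refers to.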

RMSF (Figure 6b): Residue-wise flexibility analysis showed fluctuations mainly in the loop regions of TLR4. Binding of fig|2097.71.peg.1 (UniProt ID: P22747) reduced the flexibility of critical residues more than binding of fig|2097.70.peg.33 (UniProt ID: Q57081), reflecting stronger stabilization of the receptor. Radius of Gyration (Figure 6c): Both complexes remained compact throughout the 300 ns simulation. The fig|2097.71.peg.1 (UniProt ID: P22747)–TLR4 complex exhibited slightly lower Rg values, indicating a more compact and stable conformation. Hydrogen Bond Analysis (Figure 6d): The fig|2097.71.peg.1 (UniProt ID: P22747)–TLR4 complex consistently maintained more hydrogen bonds during the simulation, supporting stronger and more stable binding relative to fig|2097.70.peg.33 (UniProt ID: Q57081).
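The three per-trajectory metrics above can be sketched in a few lines each. This is an illustration only: the trajectory is assumed to be an `(n_frames, n_atoms, 3)` array already fitted to a reference, and the hydrogen-bond cut-offs (donor–acceptor distance ≤ 3.5 Å, hydrogen–donor–acceptor angle ≤ 30°) are assumed to match the defaults of GROMACS `gmx hbond`.

```python
import math
import numpy as np

def rmsf(traj):
    """Per-atom RMS fluctuation about the mean position.
    traj: (n_frames, n_atoms, 3), assumed already superposed on a reference."""
    X = np.asarray(traj, dtype=float)
    mean = X.mean(axis=0)
    return np.sqrt(((X - mean) ** 2).sum(axis=2).mean(axis=0))

def radius_of_gyration(coords, masses=None):
    """Rg of one frame, mass-weighted if masses are given, otherwise unweighted."""
    X = np.asarray(coords, dtype=float)
    m = np.ones(len(X)) if masses is None else np.asarray(masses, dtype=float)
    com = (m[:, None] * X).sum(axis=0) / m.sum()
    return float(np.sqrt((m * ((X - com) ** 2).sum(axis=1)).sum() / m.sum()))

def is_hbond(donor, hydrogen, acceptor, r_max=3.5, angle_max=30.0):
    """Geometric criterion: donor-acceptor distance <= r_max (Angstrom) and
    hydrogen-donor-acceptor angle <= angle_max (degrees)."""
    d, h, a = (np.asarray(v, dtype=float) for v in (donor, hydrogen, acceptor))
    da, dh = a - d, h - d
    dist = np.linalg.norm(da)
    cosang = float(np.dot(dh, da) / (np.linalg.norm(dh) * dist))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cosang))))
    return bool(dist <= r_max and angle <= angle_max)
```

Counting `is_hbond` hits over all donor–hydrogen–acceptor triplets between peptide and receptor in each frame gives the kind of time series plotted in Figure 6d.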

Although docking scored fig|2097.70.peg.33 (UniProt ID: Q57081) more favorably (−9.6 vs. −7.4 kcal/mol), the MD simulations indicate that peptide fig|2097.71.peg.1 (UniProt ID: P22747) forms the more stable complex with TLR4 over 300 ns, making it a promising candidate for further experimental validation.

3.11. Principal Component Analysis (PCA)

PCA revealed distinct differences in the conformational space sampled by TLR4 bound to each of the two peptides. The eigenvalue distribution showed that the first few principal components accounted for most of the structural variance, indicating that large-scale collective motions dominate the dynamics of the complexes.
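PCA of an MD trajectory diagonalizes the covariance matrix of atomic coordinates and projects each frame onto the leading eigenvectors; in GROMACS this is typically done with `gmx covar` and `gmx anaeig`. The NumPy sketch below shows the same steps on an in-memory array, assuming frames are already fitted to a common reference and, for a receptor the size of TLR4, restricted to a subset such as C-alpha atoms to keep the 3N × 3N matrix tractable.

```python
import numpy as np

def trajectory_pca(traj, n_components=2):
    """PCA of Cartesian coordinates.
    traj: (n_frames, n_atoms, 3), assumed already fitted to a common reference.
    Returns (eigenvalues sorted descending, explained-variance ratios,
    projections of each frame onto the first n_components PCs)."""
    X = np.asarray(traj, dtype=float).reshape(len(traj), -1)
    X = X - X.mean(axis=0)
    cov = np.cov(X, rowvar=False)             # (3N, 3N) covariance matrix
    evals, evecs = np.linalg.eigh(cov)        # eigh returns ascending order
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    ratios = evals / evals.sum()
    proj = X @ evecs[:, :n_components]        # PC1/PC2 coordinates per frame
    return evals, ratios, proj
```

Plotting `proj[:, 0]` against `proj[:, 1]` for each complex gives the PC1 vs. PC2 scatter described below; a tighter cloud corresponds to a more restricted conformational space.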

The 2D projection plots (PC1 vs. PC2) demonstrated distinct clustering patterns for the two peptide–TLR4 complexes, reflecting their differential influence on receptor flexibility and stability. One peptide induced broader scattering in the conformational landscape, suggesting enhanced structural fluctuations and higher conformational plasticity. In contrast, the other peptide–TLR4 complex occupied a more confined region in the PC1–PC2 space, implying restricted dynamics and greater stabilization of the receptor upon peptide binding (Figure 7).

These findings indicate that while both peptides interact with TLR4, they modulate its conformational motions differently. Such differential behavior may have important implications for receptor activation, downstream signaling, and the overall stability of the receptor–peptide complexes.

3.12. Binding Free Energy Calculations (MM-GBSA)

The binding free energy calculations show that Complex 1 (fig|2097.71.peg.1 (UniProt ID: P22747)–TLR4) is stabilized by strong van der Waals interactions (ΔE_vdW = −98.65 kcal mol⁻¹) together with favorable electrostatic energy (ΔE_ele = −52.88 kcal mol⁻¹). Although the polar solvation term contributed unfavorably (ΔE_GB = +58.01 kcal mol⁻¹), the non-polar solvation term (ΔE_SURF = −6.68 kcal mol⁻¹) partially compensated, giving a highly favorable total binding free energy (ΔG_MM-GBSA = −100.2 kcal mol⁻¹; Table 6). In comparison, Complex 2 (fig|2097.70.peg.33 (UniProt ID: Q57081)–TLR4) showed slightly weaker van der Waals (ΔE_vdW = −95.89 kcal mol⁻¹) and electrostatic (ΔE_ele = −46.23 kcal mol⁻¹) interactions and a larger polar solvation penalty (ΔE_GB = +67.97 kcal mol⁻¹). A stronger non-polar solvation contribution (ΔE_SURF = −15.08 kcal mol⁻¹) partly offset this penalty, but the net binding free energy (ΔG_MM-GBSA = −89.23 kcal mol⁻¹) remained less favorable than that of Complex 1. These findings suggest that both complexes involve strong peptide–receptor interactions, but Complex 1 is predicted to be thermodynamically more stable owing to its more favorable van der Waals and electrostatic contributions (Figure 8).
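The reported totals are the plain sum of the four energy terms, with no entropy term, and the arithmetic can be checked directly. The sketch below reproduces both totals from the values given above and splits them into gas-phase (van der Waals + electrostatic) and solvation (polar + non-polar) parts; the dictionary labels are illustrative.

```python
# Energy terms in kcal/mol, as reported for the two peptide-TLR4 complexes.
COMPLEXES = {
    "Complex 1 (P22747-TLR4)": {"vdW": -98.65, "ele": -52.88, "GB": 58.01, "SURF": -6.68},
    "Complex 2 (Q57081-TLR4)": {"vdW": -95.89, "ele": -46.23, "GB": 67.97, "SURF": -15.08},
}

def mmgbsa_total(terms):
    """dG = dE_vdW + dE_ele + dE_GB + dE_SURF (no entropy term)."""
    return terms["vdW"] + terms["ele"] + terms["GB"] + terms["SURF"]

for name, terms in COMPLEXES.items():
    gas = terms["vdW"] + terms["ele"]      # gas-phase interaction energy
    solv = terms["GB"] + terms["SURF"]     # net solvation contribution
    print(f"{name}: gas={gas:.2f}, solvation={solv:.2f}, total={mmgbsa_total(terms):.2f}")
```

The split makes the comparison explicit: the net solvation penalties are similar (+51.33 vs. +52.89 kcal mol⁻¹), so Complex 1's advantage comes almost entirely from its more favorable gas-phase interaction energy (−151.53 vs. −142.12 kcal mol⁻¹).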

4. Discussion

M. genitalium is a clinically significant sexually transmitted pathogen linked to nongonococcal urethritis, cervicitis, and pelvic inflammatory disease, with rising macrolide and fluoroquinolone resistance that constrains therapy [55,56]. These pressures motivate vaccine discovery, yet conventional antigen selection is challenging in MG because of its reduced genome, antigenic variability, and immune evasion. Here, we systematically interrogated the largely uncharacterized fraction of the MG proteome—hypothetical proteins (HPs)—using a reverse vaccinology pipeline to prioritize antigens with favorable localization, physicochemical properties, antigenicity, and predicted B- and T-cell epitopes. Our emphasis on extracellular and membrane-associated HPs reflects their higher likelihood of immune exposure and vaccine tractability, consistent with observations across diverse pathogens that surface/secreted proteins tend to be more protective [6,57,58,59].

HPs remain an underexplored reservoir of potential virulence factors and immunogens in MG [60]. By combining sub-cellular localization (PSORTb/CELLO V2.5), transmembrane and signal-peptide prediction (TMHMM/DeepTMHMM/SignalP 6.0), and stability/solubility proxies (ProtParam) with immunoinformatics (VaxiJen v2.0, AllerTOP v2.0, ToxinPred) and epitope mapping (IEDB), we implemented a multi-parametric selection strategy rather than relying on a single criterion [12]. This integrative framework enabled us to triage candidates that simultaneously scored as antigenic, non-allergenic, non-toxic, and surface-accessible and, importantly, harbored strong MHC class I and II binders, thereby increasing the probability of translational success [12,57,58,59]. While Khalid et al. [4] made a valuable contribution by applying a reverse vaccinology pipeline to M. genitalium, our study differs in scope and methodology because it focuses exclusively on hypothetical proteins. Using a sequential, multi-parametric approach encompassing physicochemical profiling, sub-cellular localization, antigenicity, allergenicity, virulence prediction, and epitope mapping (B-cell, MHC-I, MHC-II), we narrowed 74 HPs down to two high-value candidates. The high-scoring epitope regions were further assessed through 3D homology modeling to evaluate surface accessibility and structural feasibility. Finally, MD simulations and MM-GBSA calculations suggest that peptide fig|2097.71.peg.1 (UniProt ID: P22747) forms stronger and more stable interactions with TLR4 than peptide fig|2097.70.peg.33 (UniProt ID: Q57081), making it a promising candidate for further experimental validation.

Antigenicity screening identified top-ranking HPs whose VaxiJen v2.0 scores exceeded the bacterial threshold, while AllerTOP and ToxinPred excluded sequences with allergenic or toxic signatures, strengthening safety prospects. Newer platforms (ToxinPred2, AlgPred2) leverage expanded datasets and improved classifiers; although our screens used widely adopted earlier versions, we plan to incorporate these upgrades as sensitivity analyses in subsequent iterations to further de-risk candidate selection. Virulence screens (VirulentPred/VICMpred) highlighted a subset of HPs with putative roles in pathogenicity; although Pfam v35.0 and CDD did not always reveal canonical virulence domains (typical for HPs), membrane association, surface exposure, and epitope density supported their relevance to host–pathogen interaction [60,61]. Where domain calls were equivocal, we prioritized convergent evidence from localization, virulence prediction, and epitope quality, and we explicitly flag these cases as high-value targets for experimental functional annotation.

Epitope mapping provided convergent immunological evidence. For example, fig|2097.70.peg.33 (UniProt Q57081) harbored linear B-cell regions with favorable surface propensity and flexibility, and yielded top-scoring T-cell epitopes (MHC-I peptide “YTDEKKVPLINY”, score 0.953554; MHC-II peptide “PKPVVDLKPQRIEPR”, score 0.9528) in IEDB (class I/II) analyses. To complement linear predictions, we applied ElliPro to identify conformational B-cell epitopes on modeled structures, thereby integrating sequence- and structure-based antigenicity evidence. We also evaluated epitope conservation using the IEDB conservancy tool and observed high sequence conservation (≥85% identity across available MG sequences) for several T- and B-cell epitopes from Q57081, which supports their suitability as cross-strain targets despite genomic variability [57,58,59]. However, deeper allele-stratified population coverage and broader clinical-isolate surveys remain priorities and will be added in a future update to refine geographic/ethnic applicability.

To strengthen structural confidence, we modeled immunogenic peptide segments from the two top-ranking proteins (fig|2097.71.peg.1, UniProt P22747; fig|2097.70.peg.33, UniProt Q57081) using Swiss-Model and performed standard quality checks. Ramachandran plots showed >90% residues in favored/allowed regions, ProSA Z-scores fell within the range of experimentally solved peptides of comparable size, and ERRAT indicated acceptable non-bonded interaction profiles, collectively supporting structural plausibility and epitope presentation. These data complement our ElliPro results and increase confidence that the predicted epitopes can adopt immunogenic conformations. While our workflow employs machine-learned predictors (e.g., VaxiJen v2.0, SignalP-6.0, DeepTMHMM), it does not yet incorporate de novo deep-learning 3D modeling (e.g., AlphaFold2) or end-to-end AI ensemble ranking. Integrating such models for full-length HPs is part of our planned pipeline expansion.

Comparison with prior reverse vaccinology work underscores two contributions. First, our focus on HPs extends beyond the well-studied MgPa adhesins that dominate MG antigen literature, thereby broadening the antigenic space and potentially mitigating issues of immune escape tied to hypervariable adhesins [7,62]. Second, unlike single-metric screens, our multi-criterion ranking (localization, stability, antigenicity, allergenicity, virulence, B/T-epitopes, and conservation) offers a more stringent filter before experimental investment [12,57,59]. Parallel findings in other bacteria suggest this strategy is productive: in Klebsiella pneumoniae, systematic HP mining nominated essential and surface-exposed proteins as putative drug/vaccine targets [63], and gain-of-function screens in Burkholderia pseudomallei uncovered HPs with anti-macrophage activity linked to virulence [64]. Our recent immuno-informatics study in Neisseria gonorrhoeae likewise identified promising HP-derived epitopes, lending cross-pathogen support for HP-centric antigen discovery [6].

By prioritizing surface/extracellular proteins we intentionally de-emphasized cytoplasmic antigens; however, cytosolic proteins released during infection or lysis can be immunogenic and contribute to protection in other bacteria. Future iterations will include a “conditional cytosolic” tier (e.g., moonlighting proteins, non-classically secreted antigens) with additional filters to manage specificity. Reverse vaccinology and in silico epitope discovery are hypothesis-generating; predictions require orthogonal validation (ELISA with synthetic peptides, T-cell proliferation/ELISpot, cytokine profiling by flow cytometry, and bactericidal/neutralization assays), followed by in vivo testing in mouse models using appropriate adjuvants and dosing regimens. To minimize autoimmunity risk, we screened candidates against human homologs and will extend this with whole-proteome similarity scans and tolerance-risk heuristics in the next phase. Population coverage analyses will be reported with quantitative HLA allele metrics (global/regional) and uncertainty bounds in future versions to meet field standards established in recent vaccine modeling studies.

Positioning relative to known antigens and hybrid strategies. Focusing solely on HPs may overlook valuable, experimentally validated antigens (e.g., MgPa). A hybrid strategy that combines conserved, high-scoring HP epitopes with benchmark MG antigens could improve breadth and robustness. We therefore plan comparative antigenicity/immune-simulation analyses (including epitope overlap) between HPs and MgPa-like proteins to guide multi-epitope construct design. Where KEGG/STRING annotations are sparse (a common limitation for MG HPs), we will prioritize experimental functional assays and proteomics to place leading HPs in pathway context and to detect infection-stage expression/processing, thereby connecting immunogenicity to biological role.

Although in silico approaches facilitate the rapid prioritization of putative vaccine candidates, rigorous experimental validation is indispensable to ascertain the immunogenicity and protective efficacy of the predicted epitopes. With this in mind, we outline an experimental validation path: (i) synthesize top epitopes; (ii) confirm IgG/IgA reactivity in ELISA using patient/animal sera; (iii) validate T-cell activation (ELISpot/flow cytometry for IFN-γ, IL-2, TNF-α); (iv) assess MHC restriction with tetramers; (v) test multi-epitope constructs incorporating strong adjuvants (e.g., TLR agonists) and PADRE/linkers; and (vi) evaluate protection and bacterial load reduction in murine infection models. Structural refinement of longer HP fragments (I-TASSER/RaptorX, GalaxyRefine) and expanded validation (MolProbity/PROCHECK) will complement our current Swiss-Model + Ramachandran/ProSA/ERRAT checks, and AlphaFold2 predictions will be explored for full-length HPs to better contextualize conformational epitopes. Finally, we will broaden isolate sampling for epitope conservation, include Mollicutes-level comparisons (e.g., M. pneumoniae, M. hominis), and report quantitative population coverage by geography and ethnicity to support equitable vaccine design.

In summary, this study advances MG vaccine discovery by systematically elevating hypothetical proteins—validated in silico for accessibility, safety, virulence association, and epitope quality—as credible antigen candidates. By coupling conservative exclusion criteria with orthogonal epitope evidence and initial conservation analysis, we deliver a tractable short-list for laboratory validation while transparently acknowledging the limits of prediction-only approaches. As antimicrobial resistance in MG worsens, diversifying beyond canonical antigens and integrating HPs within hybrid vaccine designs may yield broader, more durable protection [55,56,57,58,59,62,63,64].

5. Conclusions

In conclusion, this study identifies a novel set of hypothetical proteins from M. genitalium with strong immunological potential, highlighting fig|2097.71.peg.1 (UniProt ID: P22747) and fig|2097.70.peg.33 (UniProt ID: Q57081) as leading vaccine candidates on the basis of combined immunoinformatics, structural, and antigenic evidence, and underscores the value of hypothetical proteins as a source of novel vaccine antigens. Furthermore, the AI-augmented reverse vaccinology pipeline established in this work can be adapted and applied to other emerging and neglected pathogens facing similar challenges of limited treatment options and rising resistance. By addressing key gaps in the current understanding of M. genitalium and providing a focused list of candidates for further investigation, this work lays the foundation for developing effective vaccines to mitigate the growing burden of M. genitalium infections and contribute broadly to the fight against sexually transmitted diseases.

Limitations of Study

Despite the strengths of the reverse vaccinology approach, it is important to recognize its inherent limitations. This strategy relies heavily on computational predictions, which are influenced by the quality and completeness of available genome and protein databases, as well as the accuracy of bioinformatic algorithms. Consequently, some potentially important vaccine targets—especially those that are intracellular but may indirectly contribute to virulence—could be excluded during early filtering steps such as sub-cellular localization screening. Additionally, predicted antigenicity and epitope binding scores do not always correlate with immunogenicity in vivo, as epitope processing, presentation, and recognition can vary significantly between hosts due to differences in MHC haplotypes and immune landscapes. Furthermore, the in silico models often fail to capture conformational epitopes or dynamic aspects of protein folding that may influence immune recognition. Despite these challenges, reverse vaccinology remains a powerful and efficient preliminary tool for narrowing large proteomes to a subset of promising candidates for further experimental validation.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/venereology4030014/s1, Figure S1: Stepwise pipeline used in this study, from proteome retrieval and filtering to obtain great-degree-of-confidence (GDC) hypothetical proteins (HPs) through functional annotation, subcellular localization, antigenicity assessment, and epitope prediction; Figure S2: Structural validation of the predicted peptide models; Figure S3: Predicted linear B-cell epitopes for fig|2097.71.peg.1; Figure S4: Predicted T-cell epitopes for fig|2097.71.peg.1; Figure S5: Population coverage analysis of predicted epitopes based on HLA distribution; Table S1: BepiPred-2.0 linear B-cell epitope prediction results for all 23 GDC hypothetical proteins; Table S2: Potential T-cell epitopes predicted for GDC hypothetical proteins; Table S3: Epitope conservancy analysis of predicted epitopes; Table S4: Predicted population coverage of predicted epitopes across diverse HLA alleles.

Author Contributions

Conceptualization, J.T. and D.S.; Writing—original draft, J.T. and R.K.; Methodology, J.T., R.K. and D.S.; Formal analysis and Data curation, R.K. and J.T.; Reviewing & Editing, R.K., J.T. and D.S.; Visualization, R.K., J.T. and D.S.; Supervision, D.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable as this study did not involve humans or animals.

Informed Consent Statement

Not applicable as this study did not involve any human participants.

Data Availability Statement

Data is provided within the manuscript and Supplementary Materials.

Acknowledgments

Jyoti Taneja gratefully acknowledges Savita Roy (Principal, Daulat Ram College, University of Delhi) for her logistical support, computational facility and cooperation. Jyoti Taneja thanks CSIR for providing logistic support. Ravi Kant thankfully acknowledges the School of Clinical & Experimental Sciences at the Faculty of Medicine, University of Southampton, and Dr. B.R. Ambedkar Centre for Biomedical Research, University of Delhi, for computational resources and IT support. Daman Saluja acknowledges Dr. B.R. Ambedkar Centre for Biomedical Research, University of Delhi, and the Department of Biotechnology for logistic support (Project No. BT/PR40195/BTIS/137/58/2023) for the Bioinformatics Facility (DBT-BIF).

Conflicts of Interest

The authors declare no conflicts of interest.

References

Sethi, S.; Singh, G.; Samanta, P.; Sharma, M. Mycoplasma genitalium: An emerging sexually transmitted pathogen. Indian J. Med. Res. 2012, 136, 942–955.
Ona, S.; Molina, R.L.; Diouf, K. Mycoplasma genitalium: An Overlooked Sexually Transmitted Pathogen in Women? Infect. Dis. Obstet. Gynecol. 2016, 2016, 4513089.
Doelman, T.A.; Adriaens, N.; Westerhuis, B.M.; Bruisten, S.M.; Vergunst, C.E.; Bouwman, F.M.; van Dam, A.P. Phenotypic antibiotic resistance of Mycoplasma genitalium and its variation between different macrolide resistance-associated mutations. J. Antimicrob. Chemother. 2025, 80, 465–471.
Khalid, K.; Hussain, T.; Jamil, Z.; Alrokayan, K.S.; Ahmad, B.; Waheed, Y. Vaccinomics-Aided Development of a Next-Generation Chimeric Vaccine against an Emerging Threat: Mycoplasma genitalium. Vaccines 2022, 10, 1720.
Razin, S.; Yogev, D.; Naot, Y. Molecular biology and pathogenicity of mycoplasmas. Microbiol. Mol. Biol. Rev. 1998, 62, 1094–1156.
Kant, R.; Khan, M.S.; Chopra, M.; Saluja, D. Artificial intelligence-driven reverse vaccinology for Neisseria gonorrhoeae vaccine: Prioritizing epitope-based candidates. Front. Mol. Biosci. 2024, 11, 1442158.
Yueyue, W.; Feichen, X.; Yixuan, X.; Lu, L.; Yiwen, C.; Xiaoxing, Y. Pathogenicity and virulence of Mycoplasma genitalium: Unraveling Ariadne’s Thread. Virulence 2022, 13, 1161–1183.
Rappuoli, R. Reverse vaccinology, a genome-based approach to vaccine development. Vaccine 2001, 19, 2688–2691.
Gloanec, N.; Guyard-Nicodème, M.; Chemaly, M.; Dory, D. Reverse vaccinology: A strategy also used for identifying potential vaccine antigens in poultry. Vaccine 2025, 48, 126756.
Chen, L.; Li, Q.; Nasif, K.F.A.; Xie, Y.; Deng, B.; Niu, S.; Pouriyeh, S.; Dai, Z.; Chen, J.; Xie, C.Y. AI-Driven Deep Learning Techniques in Protein Structure Prediction. Int. J. Mol. Sci. 2024, 25, 8426.
Olawade, D.B.; Teke, J.; Fapohunda, O.; Weerasinghe, K.; Usman, S.O.; Ige, A.O.; David-Olawade, A.C. Leveraging artificial intelligence in vaccine development: A narrative review. J. Microbiol. Methods 2024, 224, 106998.
Fatoba, A.J.; Okpeku, M.; Adeleke, M.A. Subtractive Genomics Approach for Identification of Novel Therapeutic Drug Targets in Mycoplasma genitalium. Pathogens 2021, 10, 921.
Nogueira, W.G.; Jaiswal, A.K.; Tiwari, S.; Ramos, R.T.; Ghosh, P.; Barh, D.; Azevedo, V.; Soares, S.C. Computational identification of putative common genomic drug and vaccine targets in Mycoplasma genitalium. Genomics 2021, 113, 2730–2743.
Jansen, K.U.; Gruber, W.C.; Simon, R.; Wassil, J.; Anderson, A.S. The impact of human vaccines on bacterial antimicrobial resistance. A review. Environ. Chem. Lett. 2021, 19, 4031–4062.
Expasy—SIB Swiss Institute of Bioinformatics. Available online: https://www.expasy.org/ (accessed on 5 July 2025).
GRAVY Calculator. Available online: https://www.gravy-calculator.de/ (accessed on 5 July 2025).
BLAST: Basic Local Alignment Search Tool. Available online: https://blast.ncbi.nlm.nih.gov/Blast.cgi (accessed on 5 July 2025).
Yu, C.S.; Cheng, C.W.; Su, W.C.; Chang, K.C.; Huang, S.W.; Hwang, J.K.; Lu, C.H. CELLO2GO: A web server for protein subCELlular LOcalization prediction with functional gene ontology annotation. PLoS ONE 2014, 9, e99368.
Yu, N.Y.; Wagner, J.R.; Laird, M.R.; Melli, G.; Rey, S.; Lo, R.; Dao, P.; Sahinalp, S.C.; Ester, M.; Foster, L.J.; et al. PSORTb 3.0: Improved protein subcellular localization prediction with refined localization subcategories and predictive capabilities for all prokaryotes. Bioinformatics 2010, 26, 1608–1615.
TMHMM 2.0—DTU Health Tech—Bioinformatic Services. Available online: https://services.healthtech.dtu.dk/services/TMHMM-2.0/ (accessed on 5 July 2025).
Teufel, F.; Almagro Armenteros, J.J.; Johansen, A.R.; Gíslason, M.H.; Pihl, S.I.; Tsirigos, K.D.; Winther, O.; Brunak, S.; von Heijne, G.; Nielsen, H. SignalP 6.0 predicts all five types of signal peptides using protein language models. Nat. Biotechnol. 2022, 40, 1023–1025.
Tusnády, G.E.; Simon, I. The HMMTOP transmembrane topology prediction server. Bioinformatics 2001, 17, 849–850.
Imai, K.; Asakawa, N.; Tsuji, T.; Akazawa, F.; Ino, A.; Sonoyama, M.; Mitaku, S. SOSUI-GramN: High performance prediction for subcellular localization of proteins in gram-negative bacteria. Bioinformation 2008, 2, 417–421.
Hallgren, J.; Tsirigos, K.D.; Pedersen, M.D.; Almagro Armenteros, J.J.; Marcatili, P.; Nielsen, H.; Krogh, A.; Winther, O. DeepTMHMM predicts alpha and beta transmembrane proteins using deep neural networks. bioRxiv 2022.
Sharma, A.; Garg, A.; Ramana, J.; Gupta, D. VirulentPred 2.0: An improved method for prediction of virulent proteins in bacterial pathogens. Protein Sci. 2023, 32, e4808.
Saha, S.; Raghava, G.P.S. VICMpred: An SVM-based method for the prediction of functional proteins of Gram-negative bacteria using amino acid patterns and composition. Genom. Proteom. Bioinform. 2006, 4, 42–47.
Mistry, J.; Chuguransky, S.; Williams, L.; Qureshi, M.; Salazar, G.A.; Sonnhammer, E.L.; Tosatto, S.C.; Paladin, L.; Raj, S.; Richardson, L.J.; et al. Pfam: The protein families database in 2021. Nucleic Acids Res. 2021, 49, D412–D419.
Letunic, I.; Khedkar, S.; Bork, P. SMART: Recent updates, new developments and status in 2020. Nucleic Acids Res. 2021, 49, D458–D460.
Conserved Domains Database (CDD) and Resources. Available online: https://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml (accessed on 5 July 2025).
De Castro, E.; Sigrist, C.J.; Gattiker, A.; Bulliard, V.; Langendijk-Genevaux, P.S.; Gasteiger, E.; Bairoch, A.; Hulo, N. ScanProsite: Detection of PROSITE signature matches and ProRule-associated functional and structural residues in proteins. Nucleic Acids Res. 2006, 34, W362–W365.
Doytchinova, I.A.; Flower, D.R. VaxiJen: A server for prediction of protective antigens, tumour antigens and subunit vaccines. BMC Bioinform. 2007, 8, 4.
Dimitrov, I.; Bangov, I.; Flower, D.R.; Doytchinova, I. AllerTOP v.2—A server for in silico prediction of allergens. J. Mol. Model. 2014, 20, 2278.
Rathore, A.S.; Choudhury, S.; Arora, A.; Tijare, P.; Raghava, G.P.S. ToxinPred 3.0: An improved method for predicting the toxicity of peptides. Comput. Biol. Med. 2024, 179, 108926.
Pan, X.; Zuallaert, J.; Wang, X.; Shen, H.B.; Campos, E.P.; Marushchak, D.O.; De Neve, W. ToxDL: Deep learning using primary structure and domain embeddings for assessing protein toxicity. Bioinformatics 2021, 36, 5159–5168.
Waterhouse, A.; Bertoni, M.; Bienert, S.; Studer, G.; Tauriello, G.; Gumienny, R.; Heer, F.T.; de Beer, T.A.P.; Rempfer, C.; Bordoli, L.; et al. SWISS-MODEL: Homology modelling of protein structures and complexes. Nucleic Acids Res. 2018, 46, W296–W303.
Yuan, S.; Chan, H.S.; Hu, Z. Using PyMOL as a Platform for Computational Drug Design. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2017, 7, e1298.
PyMOL|Pymol.org. Available online: https://www.pymol.org/ (accessed on 5 July 2025).
Jespersen, M.C.; Peters, B.; Nielsen, M.; Marcatili, P. BepiPred-2.0: Improving sequence-based B-cell epitope prediction using conformational epitopes. Nucleic Acids Res. 2017, 45, W24–W29.
Immune Epitope Database and Analysis Resource (IEDB)|NIAID: National Institute of Allergy and Infectious Diseases. Available online: https://www.niaid.nih.gov/research/immune-epitope-database (accessed on 5 July 2025).
Vita, R.; Blazeska, N.; Marrama, D.; IEDB Curation Team Members; Duesing, S.; Bennett, J.; Greenbaum, J.; De Almeida Mendes, M.; Mahita, J.; Wheeler, D.K.; et al. The Immune Epitope Database (IEDB): 2024 update. Nucleic Acids Res. 2025, 53, D436–D443.
Hoof, I.; Peters, B.; Sidney, J.; Pedersen, L.E.; Sette, A.; Lund, O.; Buus, S.; Nielsen, M. NetMHCpan, a method for MHC class I binding prediction beyond humans. Immunogenetics 2009, 61, 1–13.
Han, J.; Kim, H.J.; Lee, S.C.; Hong, S.; Park, K.; Jeon, Y.H.; Kim, D.; Cheong, H.K.; Kim, H.S. Structure-Based Rational Design of a Toll-like Receptor 4 (TLR4) Decoy Receptor with High Binding Affinity for a Target Protein. PLoS ONE 2012, 7, e30929.
Lamiable, A.; Thévenet, P.; Rey, J.; Vavrusa, M.; Derreumaux, P.; Tufféry, P. PEP-FOLD3: Faster de novo structure prediction for linear peptides in solution and in complex. Nucleic Acids Res. 2016, 44, W449–W454.
Yan, Y.; Tao, H.; He, J.; Huang, S.-Y. The HDOCK server for integrated protein–protein docking. Nat. Protoc. 2020, 15, 1829–1852.
Rashidieh, B.; Valizadeh, M.; Assadollahi, V.; Ranjbar, M.M. Molecular dynamics simulation on the low sensitivity of mutants of NEDD-8 activating enzyme for MLN4924 inhibitor as a cancer drug. Am. J. Cancer Res. 2015, 5, 3400–3406.
Abraham, M.J.; Murtola, T.; Schulz, R.; Páll, S.; Smith, J.C.; Hess, B.; Lindahl, E. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 2015, 1–2, 19–25.
Essmann, U.; Perera, L.; Berkowitz, M.L.; Darden, T.; Lee, H.; Pedersen, L.G. A smooth particle mesh Ewald method. J. Chem. Phys. 1995, 103, 8577–8593.
Hess, B.; Bekker, H.; Berendsen, H.J.C.; Fraaije, J.G.E.M. LINCS: A linear constraint solver for molecular simulations. J. Comput. Chem. 1997, 18, 1463–1472.
David, C.C.; Jacobs, D.J. Principal Component Analysis: A Method for Determining the Essential Dynamics of Proteins. Methods Mol Biol. 2014, 1084, 193–226. [Google Scholar]
Jaadi, Z. Principal Component Analysis (PCA): Explained Step-by-Step. Built In, 1 April 2021. Available online: https://builtin.com/data-science/step-step-explanation-principal-component-analysis (accessed on 29 August 2025).
Adelusi, T.I.; Bolaji, O.Q.; Ojo, T.O.; Adegun, I.P.; Adebodun, S. Molecular Mechanics with Generalized Born Surface Area (MMGBSA) Calculations and Docking Studies Unravel some Antimalarial Compounds Using Heme O Synthase as Therapeutic Target. ChemistrySelect 2023, 8, e202303686. [Google Scholar] [CrossRef]
Sobolev, O.V.; Afonine, P.V.; Moriarty, N.W.; Hekkelman, M.L.; Joosten, R.P.; Perrakis, A.; Adams, P.D. A global Ramachandran score identifies protein structures with unlikely stereochemistry. Structure 2020, 28, 1249–1258.e2. [Google Scholar] [CrossRef] [PubMed]
Wiederstein, M.; Sippl, M.J. ProSA-Web: Interactive Web Service for the Recognition of Errors in Three-Dimensional Structures of Proteins. Nucleic Acids Res. 2007, 35, W407–W410. [Google Scholar] [CrossRef] [PubMed]
Colovos, C.; Yeates, T.O. Verification of protein structures: Patterns of nonbonded atomic interactions. Protein Sci. 1993, 2, 1511–1519. [Google Scholar] [CrossRef] [PubMed]
Haggerty, C.L.; Taylor, B.D. Mycoplasma genitalium: An emerging cause of pelvic inflammatory disease. Infect. Dis. Obstet. Gynecol. 2011, 2011, 959816. [Google Scholar] [CrossRef]
Barik, K.; Arya, P.K.; Singh, A.K.; Kumar, A. Potential therapeutic targets for combating Mycoplasma genitalium. 3 Biotech 2023, 13, 9. [Google Scholar] [CrossRef]
Duan, Y.; Hao, Y.; Feng, H.; Shu, J.; He, Y. Research progress on Haemophilus parasuis vaccines. Front. Vet. Sci. 2025, 12, 1492144. [Google Scholar] [CrossRef]
Mukhopadhyay, H.; Bairagi, A.; Mukherjee, A.; Prasad, A.K.; Roy, A.D.; Nayak, A. Multidrug Resistant Acinetobacter Baumannii: A Study on Its Pathogenesis and Therapeutics. Curr. Res. Microb. Sci. 2025, 8, 100331. [Google Scholar] [CrossRef]
Calderwood, S.K.; Gong, J.; Murshid, A. Extracellular HSPs: The Complicated Roles of Extracellular HSPs in Immunity. Front. Immunol. 2016, 7, 159. [Google Scholar] [CrossRef]
Sen, T.; Verma, N.K. Functional Annotation and Curation of Hypothetical Proteins Present in A Newly Emerged Serotype 1c of Shigella flexneri: Emphasis on Selecting Targets for Virulence and Vaccine Design Studies. Genes 2020, 11, 340. [Google Scholar] [CrossRef] [PubMed]
Lazar, V.; Oprea, E.; Ditu, L.-M. Resistance, Tolerance, Virulence and Bacterial Pathogen Fitness-Current State and Envisioned Solutions for the Near Future. Pathogens 2023, 12, 746. [Google Scholar] [CrossRef] [PubMed]
Guarra, F.; Colombo, G. Computational Methods in Immunology and Vaccinology: Design and Development of Antibodies and Immunogens. J. Chem. Theory Comput. 2023, 19, 5315–5333. [Google Scholar] [CrossRef] [PubMed]
Pranavathiyani, G.; Prava, J.; Rajeev, A.C.; Pan, A. Novel Target Exploration from Hypothetical Proteins of Klebsiella pneumoniae MGH 78578 Reveals a Protein Involved in Host-Pathogen Interaction. Front. Cell. Infect. Microbiol. 2020, 10, 109. [Google Scholar] [CrossRef]
Dowling, A.J. Novel gain of function approaches for vaccine candidate identification in Burkholderia pseudomallei. Front. Cell. Infect. Microbiol. 2012, 2, 139. [Google Scholar] [CrossRef]

Figure 1. Functional annotation of hypothetical proteins in M. genitalium. (a) Sub-cellular localization distribution of hypothetical proteins predicted using PSORTb. Each slice represents the number of proteins assigned to one predicted localization category (extracellular, cytoplasmic, or membrane-associated). (b) Functional annotation and domain prediction of hypothetical proteins. Conserved domains were identified using CDD, Pfam v35.0, and ScanProsite. Colors denote distinct functional categories (enzymes, transporters, binding proteins, and uncharacterized domains), allowing the proteins to be classified into known functional families.

Figure 2. Homology models of prioritized peptide regions from M. genitalium hypothetical proteins. (a) 3D model of the 41-amino-acid peptide from fig|2097.71.peg.1 (UniProt ID: P22747). (b) 3D model of the 47-amino-acid peptide from fig|2097.70.peg.33 (UniProt ID: Q57081). Both models were generated with SWISS-MODEL; α-helices are shown in red and loops in blue. The models suggest structurally stable conformations suitable for further evaluation as vaccine targets.

Figure 3. Linear B-cell epitopes of fig|2097.70.peg.33 (UniProt ID: Q57081) predicted using BepiPred-2.0 (a–c). Yellow peaks mark high-scoring epitope regions with greater immunogenic potential, while the baseline represents non-epitope regions.

Figure 4. T-cell epitopes of fig|2097.70.peg.33 (UniProt ID: Q57081) predicted using IEDB tools (a–c). Yellow regions indicate high-affinity MHC-binding epitopes; higher binding scores indicate greater potential to elicit T-cell responses.

Figure 5. Molecular docking of fig|2097.70.peg.33 (UniProt ID: Q57081) with human TLR4 (a) and of fig|2097.71.peg.1 (UniProt ID: P22747) with human TLR4 (b). Each peptide is shown as a yellow surface docked onto the rainbow-colored receptor.

Figure 6. Conformational stability, dynamics, and interactions of TLR4 bound to the two peptides: (a) root-mean-square deviation (RMSD), (b) root-mean-square fluctuation (RMSF), (c) radius of gyration, and (d) hydrogen bonds. Red: TLR4 in complex with fig|2097.70.peg.33 (UniProt ID: Q57081); green: TLR4 in complex with fig|2097.71.peg.1 (UniProt ID: P22747).
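The quantities plotted in Figure 6 have simple definitions. The following is a minimal NumPy sketch of three of them, assuming every frame has already been superposed on a reference structure (no rotational fitting is performed) and using toy coordinates rather than data from this study; the function names are hypothetical. In a GROMACS workflow these curves are normally produced by dedicated tools such as gmx rms, gmx rmsf, gmx gyrate and gmx hbond; hydrogen-bond counting is not sketched because it depends on the chosen distance and angle criteria.

```python
from typing import Optional

import numpy as np


def rmsd(coords: np.ndarray, ref: np.ndarray) -> float:
    """RMSD between two (n_atoms, 3) coordinate sets.

    Assumes the frame has already been superposed on the reference;
    no rotational fitting is performed here.
    """
    sq_dist = np.sum((coords - ref) ** 2, axis=1)  # squared displacement per atom
    return float(np.sqrt(sq_dist.mean()))


def rmsf(traj: np.ndarray) -> np.ndarray:
    """Per-atom RMS fluctuation about the time-averaged position.

    traj has shape (n_frames, n_atoms, 3); frames are assumed superposed.
    """
    mean_pos = traj.mean(axis=0)                     # (n_atoms, 3)
    sq_dev = np.sum((traj - mean_pos) ** 2, axis=2)  # (n_frames, n_atoms)
    return np.sqrt(sq_dev.mean(axis=0))


def radius_of_gyration(coords: np.ndarray,
                       masses: Optional[np.ndarray] = None) -> float:
    """Mass-weighted radius of gyration of one frame (equal masses by default)."""
    if masses is None:
        masses = np.ones(len(coords))
    center = np.average(coords, axis=0, weights=masses)  # center of mass
    sq_dist = np.sum((coords - center) ** 2, axis=1)
    return float(np.sqrt(np.sum(masses * sq_dist) / np.sum(masses)))


# Toy coordinates (arbitrary units), not data from this study
ref = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
frame = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 1.0]])
print(rmsd(frame, ref))           # sqrt(0.5), about 0.7071
print(radius_of_gyration(ref))    # 0.5: both atoms sit 0.5 from their center
```

A plateau in RMSD and in the radius of gyration over the trajectory is what is usually read as a stable complex; RMSF instead localizes flexibility to individual residues.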

Figure 7. Principal component analysis (PCA) 2D scatter plot representing the projection of principal component 1 (PC1) and principal component 2 (PC2), depicting the dominant motions of the Cα atoms of the TLR4 receptor in complex with the two docked peptides. The distribution of conformations highlights the dynamic fluctuations and conformational space explored during the molecular dynamics simulations.
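Figure 7 projects each simulation frame onto the first two principal components of Cα motion. The essential steps (flatten each frame into one coordinate vector, subtract the average structure, diagonalize the covariance matrix, project onto the leading eigenvectors) are sketched below with NumPy on a random toy trajectory. This is an illustration under stated assumptions, not the pipeline used for the figure: frames are assumed to be fitted to a common reference beforehand, and in GROMACS the equivalent steps are typically run with gmx covar and gmx anaeig.

```python
import numpy as np


def pca_projection(traj: np.ndarray, n_components: int = 2):
    """Project a Cα trajectory onto its leading principal components.

    traj: (n_frames, n_atoms, 3) coordinates, assumed already superposed
    on a common reference so overall rotation/translation is removed.
    Returns (projections, eigenvalues), eigenvalues sorted descending.
    """
    n_frames = traj.shape[0]
    x = traj.reshape(n_frames, -1)          # one row of 3N coordinates per frame
    x_centered = x - x.mean(axis=0)         # subtract the average structure
    cov = np.cov(x_centered, rowvar=False)  # (3N, 3N) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]       # re-sort largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    projections = x_centered @ eigvecs[:, :n_components]
    return projections, eigvals


# Toy trajectory: 50 frames of 10 "atoms" with random noise (not study data)
rng = np.random.default_rng(0)
toy_traj = rng.normal(size=(50, 10, 3))
proj, evals = pca_projection(toy_traj)
pc1, pc2 = proj[:, 0], proj[:, 1]  # the two axes of a PC1-vs-PC2 scatter plot
```

The variance of the frames along PC1 equals the largest eigenvalue, so a more compact PC1/PC2 cloud corresponds to a complex that samples a smaller conformational space.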

Figure 8. Binding free energy calculations using the MM-GBSA (Molecular Mechanics/Generalized Born Surface Area) method for two peptide–receptor complexes. The energy components are as follows: ΔEvdW (van der Waals energy), ΔEele (electrostatic energy), ΔEGB (polar solvation free energy from the Generalized Born model), ΔESURF (non-polar solvation free energy from solvent-accessible surface area), and ΔGMM-GBSA (total binding free energy). Brown represents Complex 1, whereas green represents Complex 2.

Table 1. Physicochemical properties of the selected hypothetical proteins. The instability index is used to classify protein stability; proteins with values < 40 are considered stable and are marked accordingly.

S. No.	Seq ID	No. of Amino Acid Residues	Mol. Wt. (Da)	Theoretical pI	Total Number of Negatively Charged Residues	Total Number of Positively Charged Residues	Extinction Coefficient (M⁻¹ cm⁻¹)	Instability Index	Aliphatic Index	Grand Average of Hydropathicity (GRAVY)
1	fig|2097.70.peg.1	193	20,901.76	7.02	16	16	26,930	20.04	58.08	−0.628
2	fig|2097.70.peg.2	87	10,066.52	5.36	11	8	5960	21.22	106.32	−0.272
3	fig|2097.70.peg.3	286	33,582.61	5.98	50	49	32,430	37.88	69.97	−1.049
4	fig|2097.70.peg.12	129	15,317.56	5.09	18	14	7450	26.78	87.67	−0.195
5	fig|2097.70.peg.28	835	96,004.28	4.15	168	69	67,285	51.44	67.05	−0.918
6	fig|2097.70.peg.32	320	37,185.56	9.89	8	27	59,485	26.17	129.75	0.736
7	fig|2097.70.peg.33	599	68,777.93	7.31	74	74	55,240	55.84	71.8	−0.907
8	fig|2097.70.peg.34	218	24,708.47	9.81	15	20	6990	53.28	88.53	−0.417
9	fig|2097.70.peg.35	178	20,555.91	9.83	14	24	11,920	49.76	89.33	−0.043
10	fig|2097.70.peg.36	286	32,450.21	9.8	13	21	47,900	31.56	126.47	0.735
11	fig|2097.70.peg.40	59	7252.32	10.71	1	9	23,950	23.22	59.66	−0.605
12	fig|2097.70.peg.41	254	29,759.81	9.69	12	24	50,880	16.96	101.65	0.426
13	fig|2097.70.peg.46	756	88,407.1	4.61	164	101	39,880	39.47	86.56	−1.028
14	fig|2097.70.peg.49	212	25,044.97	9.49	22	31	30,940	36.27	91.04	−0.338
15	fig|2097.70.peg.60	336	37,119.09	9.16	27	33	63,370	27.92	64.14	−0.678
16	fig|2097.71.peg.1	409	45,698.09	9.06	33	39	73,340	26.13	73.67	−0.535
17	fig|2097.71.peg.2	1444	159,334.7	8.56	131	137	215,090	30.96	74.25	−0.489
18	fig|2097.71.peg.9	92	10,079.51	9.44	4	8	8940	18.16	75.22	−0.311
19	fig|2097.71.peg.1	259	28,803.14	8.87	23	27	45,045	34.68	71.08	−0.727
20	fig|2097.71.peg.1	319	35,001.18	9.52	28	38	33,460	33.04	63.51	−0.732
21	fig|2097.71.peg.48	154	18,148.6	5.38	13	12	24,410	23.09	129.22	0.763
22	fig|2097.71.peg.50	409	48,520.11	8.69	61	67	37,275	32.36	97.02	−0.59
23	fig|2097.71.peg.51	375	43,187.64	9.36	21	33	38,390	30.73	125.49	0.74
24	fig|2097.71.peg.54	279	31,687.37	10.15	18	40	40,450	36.16	104.44	0.016
25	fig|2097.71.peg.57	90	10,472.54	9.85	9	18	3105	17.2	120.22	−0.258
26	fig|2097.71.peg.59	1113	130,580.1	6.81	146	144	157,945	34.53	93.44	−0.393
27	fig|2097.71.peg.60	108	11,335.68	9.8	4	10	5960	22.38	67.78	−0.501
28	fig|2097.71.peg.61	123	12,983.36	9.57	8	12	15,470	43.25	64.31	−0.68
29	fig|2097.71.peg.62	40	4645.41	6.04	5	5	5500	38.74	109.5	−0.185
30	fig|2097.71.peg.63	176	19,106.87	6.31	14	13	26,930	16.55	59.83	−0.569
31	fig|2097.69.peg.1	295	32,373.67	8.51	24	26	60,390	30.86	62.85	−0.633
32	fig|2097.69.peg.2	67	7973.49	10.01	5	11	17,990	10.21	113.28	−0.021
33	fig|2097.69.peg.3	196	21,380.8	9.43	14	19	27,960	42.28	68.16	−0.591
34	fig|2097.69.peg.4	137	14,915.57	6.15	20	19	6990	48.74	58.32	−0.932
35	fig|2097.69.peg.6	196	23,299.79	9.52	21	31	35,870	26.84	89.49	−0.626
36	fig|2097.69.peg.7	347	40,052.08	5.92	40	36	48,820	34.76	86.8	−0.467
37	fig|2097.69.peg.8	113	13,267.1	8.95	13	16	19,940	41.21	84.6	−0.566
38	fig|2097.69.peg.10	44	5011.15	12.01	1	7	NA *	17.28	152.95	0.648
39	fig|2097.69.peg.12	556	62,239.63	6.59	65	64	62,690	26.98	78.2	−0.474
40	fig|2097.69.peg.13	262	29,206.1	6.76	28	28	10,430	32.75	94.2	−0.154
41	fig|2097.69.peg.14	218	24,887.78	9.43	19	26	14,900	26.18	105.96	−0.041
42	fig|2097.69.peg.16	970	108,126.5	8.59	70	76	131,015	28.48	102.71	0.166
43	fig|2097.69.peg.25	340	39,661.11	8.57	52	55	15,930	40.38	82.91	−0.843
44	fig|2097.69.peg.25	340	39,661.11	8.57	52	55	15,930	40.38	82.91	−0.843
45	fig|2097.69.peg.27	115	13,060.69	9.7	3	7	22,460	25.83	133.13	1.027
46	fig|2097.69.peg.34	76	9189.53	7.97	18	19	NA *	54.49	73.03	−1.301
47	fig|2097.69.peg.35	83	8385.38	10	2	8	2980	24.3	70.6	−0.447
48	fig|2097.69.peg.36	173	19,009.5	9.82	11	23	39,085	39.31	69.88	−0.657
49	fig|2097.69.peg.37	162	18,141.3	9.4	12	16	27,960	32.91	73.4	−0.591
50	fig|2097.69.peg.38	135	14,805.63	7.87	21	22	6990	53.49	61.41	−0.897
51	fig|2097.69.peg.43	256	30,481.32	9.77	17	29	42,400	22.24	99.3	0.094
52	fig|2097.69.peg.44	550	64,213.61	9.1	50	61	67,630	30.66	100.27	−0.039
53	fig|2097.69.peg.56	226	26,256.39	9.43	14	22	26,720	25.91	127.26	0.617
54	fig|2097.69.peg.57	630	74,234.51	6.24	83	79	77,030	42.19	100.98	−0.23
55	fig|2097.69.peg.58	620	72,815.2	8.96	69	80	67,185	29.89	96.66	−0.216
56	fig|2097.69.peg.62	294	34,572.15	7.69	35	36	13,535	28.33	98.78	−0.235
57	fig|2097.69.peg.73	85	9072.92	7.88	7	8	13,980	41.4	50.47	−0.831
58	fig|2097.69.peg.78	411	47,823.61	9.46	37	55	25,445	21.63	96.74	−0.237
59	fig|2097.69.peg.81	102	11,303.66	9.3	7	10	18,450	35	67.94	−0.31
60	fig|2097.69.peg.83	345	39,454.58	5.03	53	37	14,440	36.9	94.06	−0.59
61	fig|2097.69.peg.84	1802	215,902.4	8.66	308	321	82,060	41.81	80.1	−1.164
62	fig|2097.69.peg.92	147	17,470.55	4.91	32	23	14,440	28.41	79.66	−0.941
63	fig|2097.69.peg.94	216	25,626.25	8.77	30	34	51,005	33.82	74.95	−0.73
64	fig|2097.69.peg.98	167	19,345.06	7.12	14	14	27,055	39.95	92.22	−0.421
65	fig|2097.69.peg.103	122	13,513.01	5.18	18	15	2980	50.58	83.11	−0.676
66	fig|2097.69.peg.112	42	4607.35	10	1	6	2980	24.91	97.38	−0.388
67	fig|2097.69.peg.113	167	18,121.11	7.96	15	16	22,460	42.51	70.72	−0.762
68	fig|2097.69.peg.114	47	5275.49	9.52	2	5	11,000	10.95	142.77	0.84
69	fig|2097.69.peg.115	165	18,406.57	9.16	13	16	27,960	34.63	69.09	−0.602
70	fig|2097.69.peg.116	77	8395.48	9.35	11	14	5500	32.06	60.91	−0.906
71	fig|2097.69.peg.117	59	7015.26	10.01	5	12	23,490	33.69	85.59	−0.58
72	fig|2097.69.peg.118	33	4217.09	10.67	0	11	11,460	69.82	73.94	−1.364
73	fig|2097.69.peg.119	55	6996.4	10.75	4	18	13,980	61.56	70.91	−1.387
74	fig|2097.69.peg.120	713	76,591.36	5.56	68	62	68,300	27.37	77.64	−0.287

* Not Available.
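Table 1 reports standard sequence-derived descriptors. The sketch below shows how three of them are conventionally defined, assuming the values come from ExPASy ProtParam or an equivalent tool (an assumption; the tool is not restated in this section): GRAVY as the mean Kyte–Doolittle hydropathy per residue, the aliphatic index of Ikai, and the 280 nm extinction coefficient from Trp, Tyr and cystine counts. It also applies the instability-index cutoff of 40 given in the caption; the instability index itself requires a dipeptide weight table and is taken as given. All function names are hypothetical.

```python
# Kyte-Doolittle hydropathy scale
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def gravy(seq: str) -> float:
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue."""
    return sum(KD[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Ikai aliphatic index from the mole percentages of Ala, Val, Ile and Leu."""
    pct = {aa: 100 * seq.count(aa) / len(seq) for aa in "AVIL"}
    return pct["A"] + 2.9 * pct["V"] + 3.9 * (pct["I"] + pct["L"])


def extinction_coefficient(seq: str, n_cystines: int = 0):
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1).

    Uses 5500 per Trp, 1490 per Tyr and 125 per cystine (disulfide pair).
    Returns None if the sequence has no Trp, Tyr or Cys, because no
    280 nm estimate can then be made this way.
    """
    if not any(aa in seq for aa in "WYC"):
        return None
    return 5500 * seq.count("W") + 1490 * seq.count("Y") + 125 * n_cystines


def is_stable(instability_index: float, cutoff: float = 40.0) -> bool:
    """Stability call stated in the Table 1 caption: index below 40 means stable."""
    return instability_index < cutoff


# Example peptide: the MHC-I epitope SSGPIKTAY listed in Table 5
peptide = "SSGPIKTAY"
print(round(gravy(peptide), 3), round(aliphatic_index(peptide), 2),
      extinction_coefficient(peptide))
# Instability indices copied from Table 1, rows 16 and 7
print(is_stable(26.13), is_stable(55.84))  # True False
```

Negative GRAVY values, as for most proteins in Table 1, indicate an overall hydrophilic sequence.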

Table 2. Shortlisted GDC hypothetical proteins (23) with non-cytoplasmic localization.

S. No.	Seq ID	Final Location	TM Helix	Signal Peptide	Toxicity (ToxinPred)	Toxicity (ToxDL)
1	fig|2097.70.peg.1	Extracellular	No	No	Non-Toxic	Non-Toxic
2	fig|2097.70.peg.3	Extracellular	No	No	Non-Toxic	Non-Toxic
3	fig|2097.70.peg.33	Extracellular	No	No	Non-Toxic	Non-Toxic
4	fig|2097.70.peg.60	Extracellular	No	No	Non-Toxic	Non-Toxic
5	fig|2097.71.peg.1	Extracellular	Yes	Yes	Non-Toxic	Non-Toxic
6	fig|2097.71.peg.9	Extracellular	No	No	Non-Toxic	Non-Toxic
7	fig|2097.71.peg.1	Extracellular	No	No	Non-Toxic	Non-Toxic
8	fig|2097.71.peg.1	Extracellular	No	No	Non-Toxic	Non-Toxic
9	fig|2097.71.peg.61	Extracellular	No	No	Non-Toxic	Non-Toxic
10	fig|2097.71.peg.63	Extracellular	No	No	Non-Toxic	Non-Toxic
11	fig|2097.69.peg.1	Extracellular	No	No	Non-Toxic	Non-Toxic
12	fig|2097.69.peg.3	Extracellular	No	No	Non-Toxic	Non-Toxic
13	fig|2097.69.peg.4	Extracellular	No	No	Non-Toxic	Non-Toxic
14	fig|2097.69.peg.35	Extracellular	No	No	Non-Toxic	Non-Toxic
15	fig|2097.69.peg.36	Extracellular	No	No	Non-Toxic	Non-Toxic
16	fig|2097.69.peg.37	Extracellular	No	No	Non-Toxic	Non-Toxic
17	fig|2097.69.peg.38	Extracellular	No	No	Non-Toxic	Non-Toxic
18	fig|2097.69.peg.73	Extracellular	No	No	Non-Toxic	Non-Toxic
19	fig|2097.69.peg.81	Extracellular	Yes	Yes	Non-Toxic	Non-Toxic
20	fig|2097.69.peg.113	Extracellular	No	No	Non-Toxic	Non-Toxic
21	fig|2097.69.peg.115	Extracellular	No	No	Non-Toxic	Non-Toxic
22	fig|2097.69.peg.116	Extracellular	No	No	Non-Toxic	Non-Toxic
23	fig|2097.69.peg.117	Extracellular	No	No	Non-Toxic	Non-Toxic

Table 3. Antigenicity, virulence, and allergenicity prediction output of GDC hypothetical proteins (23 HPs).

Seq ID	Virulence	Antigenicity	Antigenicity Scores	Allergenicity
fig|2097.70.peg.1	Virulent	Antigenic	0.6703	PROBABLE NON-ALLERGEN
fig|2097.70.peg.33	Virulent	Antigenic	0.6128	PROBABLE NON-ALLERGEN
fig|2097.69.peg.81	Virulent	Antigenic	0.4724	PROBABLE NON-ALLERGEN
fig|2097.70.peg.3	Virulent	Antigenic	0.4723	PROBABLE NON-ALLERGEN
fig|2097.71.peg.63	Virulent	Antigenic	0.472	PROBABLE NON-ALLERGEN
fig|2097.69.peg.36	Virulent	Antigenic	0.4696	PROBABLE NON-ALLERGEN
fig|2097.69.peg.113	Virulent	Antigenic	0.463	PROBABLE NON-ALLERGEN
fig|2097.70.peg.60	Virulent	Antigenic	0.451	PROBABLE NON-ALLERGEN
fig|2097.69.peg.35	Virulent	Antigenic	0.4507	PROBABLE NON-ALLERGEN
fig|2097.69.peg.116	Virulent	Antigenic	0.4492	PROBABLE NON-ALLERGEN
fig|2097.69.peg.115	Virulent	Antigenic	0.4478	PROBABLE NON-ALLERGEN
fig|2097.71.peg.1	Virulent	Antigenic	0.4312	PROBABLE NON-ALLERGEN
fig|2097.71.peg.9	Virulent	Antigenic	0.4298	PROBABLE NON-ALLERGEN
fig|2097.71.peg.1	Virulent	Antigenic	0.4122	PROBABLE NON-ALLERGEN
fig|2097.69.peg.4	Virulent	Antigenic	0.3864	PROBABLE NON-ALLERGEN
fig|2097.69.peg.73	Virulent	Antigenic	0.386	PROBABLE NON-ALLERGEN
fig|2097.71.peg.1	Virulent	Antigenic	0.3701	PROBABLE NON-ALLERGEN
fig|2097.71.peg.61	Virulent	Antigenic	0.3603	PROBABLE NON-ALLERGEN
fig|2097.69.peg.38	Virulent	Antigenic	0.3585	PROBABLE NON-ALLERGEN
fig|2097.69.peg.117	Virulent	Antigenic	0.3464	PROBABLE NON-ALLERGEN
fig|2097.69.peg.3	Virulent	Antigenic	0.338	PROBABLE NON-ALLERGEN
fig|2097.69.peg.37	Virulent	Antigenic	0.3189	PROBABLE NON-ALLERGEN
fig|2097.69.peg.1	Virulent	Antigenic	0.2927	PROBABLE NON-ALLERGEN
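Tables 2 and 3 amount to a chain of filters (localization, toxicity by two tools, virulence, antigenicity, allergenicity) followed by a ranking on antigenicity score. A minimal sketch of that screening logic is shown below. The field names are hypothetical, three rows are transcribed from Tables 2 and 3, and one invented cytoplasmic entry (clearly hypothetical) is added only to exercise the localization filter. The sketch ranks by antigenicity score alone, so it illustrates the screening step rather than the full prioritization described in the article.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One protein with its screening annotations (field names are illustrative)."""
    seq_id: str
    location: str        # predicted sub-cellular localization
    toxinpred: str       # ToxinPred call
    toxdl: str           # ToxDL call
    virulent: bool
    antigenic: bool
    antigenicity_score: float
    allergen: bool


def shortlist(candidates):
    """Keep non-cytoplasmic, non-toxic (by both tools), virulent, antigenic and
    non-allergenic proteins, ranked by antigenicity score (highest first)."""
    kept = [
        c for c in candidates
        if c.location != "Cytoplasmic"
        and c.toxinpred == "Non-Toxic" and c.toxdl == "Non-Toxic"
        and c.virulent and c.antigenic and not c.allergen
    ]
    return sorted(kept, key=lambda c: c.antigenicity_score, reverse=True)


# Three rows transcribed from Tables 2 and 3, plus one hypothetical
# cytoplasmic protein (invented) that the localization filter should drop.
rows = [
    Candidate("fig|2097.69.peg.81", "Extracellular", "Non-Toxic", "Non-Toxic",
              True, True, 0.4724, False),
    Candidate("fig|2097.70.peg.33", "Extracellular", "Non-Toxic", "Non-Toxic",
              True, True, 0.6128, False),
    Candidate("fig|2097.70.peg.1", "Extracellular", "Non-Toxic", "Non-Toxic",
              True, True, 0.6703, False),
    Candidate("hypothetical_cytoplasmic_protein", "Cytoplasmic", "Non-Toxic",
              "Non-Toxic", True, True, 0.9, False),
]
for c in shortlist(rows):
    print(c.seq_id, c.antigenicity_score)
```

Keeping each criterion as a separate boolean condition makes it easy to see which filter removed a given protein when auditing the shortlist.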

Table 4. BepiPred-2.0 linear B-cell epitope prediction results for the two top-ranked proteins.

fig|2097.71.peg.1
S. No.	Start	End	Predicted Peptide Regions	Length
1	5	45	FANTNLDWGENKQKQFVENQLGYKETTSTNSHNFHSKSFTQ	41
2	68	121	GSVGYDSSSSSSSTKDQALAWSTTTSLDSKTGYRDLVTNDTGLNGPINGSFSIQ	54
3	131	159	SGNHTNSSGSSGPIKTAYPVKKDQKSTVK	29
4	161	177	NSLINATPLNSYGDEGI	17
5	188	189	QG	2
fig|2097.70.peg.33
S. No.	Start	End	Predicted Peptide Regions	Length
1	5	12	QKAKINKA	8
2	21	22	NK	2
3	35	81	HKNKVHALYQDPESGNIFSLKKRKQLASNYPLFELTSDNPISFTNNI	47
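Figure 3 and Table 4 report BepiPred-2.0 output as contiguous stretches of residues whose per-residue score passes a cutoff. The sketch below shows that grouping step on made-up scores; the 0.5 threshold and the use of "greater than or equal" are assumptions (0.5 is the commonly used BepiPred-2.0 default), and the function name is hypothetical. The last lines check that every Length in Table 4 equals End − Start + 1, i.e., that positions are 1-based and inclusive.

```python
def epitope_regions(scores, threshold=0.5):
    """Group consecutive residues scoring >= threshold into epitope regions.

    Returns (start, end, length) tuples using 1-based, inclusive residue
    positions, the convention of Table 4. The threshold is an assumption.
    """
    regions = []
    start = None
    for pos, score in enumerate(scores, start=1):
        if score >= threshold and start is None:
            start = pos                                    # a region opens
        elif score < threshold and start is not None:
            regions.append((start, pos - 1, pos - start))  # closed at pos - 1
            start = None
    if start is not None:                                  # runs to the C-terminus
        regions.append((start, len(scores), len(scores) - start + 1))
    return regions


# Made-up per-residue scores for a 10-residue toy sequence (not study data)
toy_scores = [0.2, 0.6, 0.7, 0.4, 0.55, 0.3, 0.3, 0.8, 0.9, 0.51]
print(epitope_regions(toy_scores))  # [(2, 3, 2), (5, 5, 1), (8, 10, 3)]

# (Start, End, Length) rows copied from Table 4
table4_rows = [(5, 45, 41), (68, 121, 54), (131, 159, 29), (161, 177, 17),
               (188, 189, 2), (5, 12, 8), (21, 22, 2), (35, 81, 47)]
print(all(end - start + 1 == length for start, end, length in table4_rows))  # True
```

No minimum region length is applied here, which is consistent with Table 4 listing two-residue regions such as QG and NK.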

Table 5. Prediction of potential T-cell epitopes for the two top-ranked proteins.

T-Cell Epitope for MHC-I
Seq ID	Start	End	Length	Predicted Peptide Regions	Score
fig|2097.70.peg.33	118	129	12	YTDEKKVPLINY	0.953554
fig|2097.71.peg.1	140	148	9	SSGPIKTAY	0.654153
T-Cell Epitope for MHC-II
Seq ID	Start	End	Length	Predicted Peptide Regions	Score
fig|2097.70.peg.33	317	331	15	PKPVVDLKPQRIEPR	0.9528
fig|2097.71.peg.1	97	111	15	KTGYRDLVTNDTGLN	0.6913
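Because Tables 4 and 5 both number residues of fig|2097.71.peg.1 from 1 with inclusive ends, the B-cell and T-cell predictions can be cross-checked directly. The sketch below (helper name hypothetical) uses only sequences and coordinates copied from the two tables and confirms that the MHC-I epitope SSGPIKTAY (140–148) lies within B-cell region 3 (131–159), and the MHC-II epitope KTGYRDLVTNDTGLN (97–111) within B-cell region 2 (68–121).

```python
def located_within(region_seq: str, region_start: int,
                   epitope_seq: str, epitope_start: int) -> bool:
    """True if an epitope, given its 1-based start position, occupies exactly
    the matching residues of a region whose 1-based start is known."""
    offset = epitope_start - region_start  # 0-based index inside the region
    return (offset >= 0
            and region_seq[offset:offset + len(epitope_seq)] == epitope_seq)


# fig|2097.71.peg.1: B-cell regions from Table 4, T-cell epitopes from Table 5
b_region_2 = ("GSVGYDSSSSSSSTKDQALAWSTTTSLDSKTGYRDLVTNDTGLNGPINGSFSIQ", 68)
b_region_3 = ("SGNHTNSSGSSGPIKTAYPVKKDQKSTVK", 131)
mhc1_epitope = ("SSGPIKTAY", 140)
mhc2_epitope = ("KTGYRDLVTNDTGLN", 97)

print(located_within(*b_region_3, *mhc1_epitope))  # True
print(located_within(*b_region_2, *mhc2_epitope))  # True
```

Checking the sequence at the stated offset, rather than only testing substring membership, also verifies that the reported start positions agree between the two tables.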

Table 6. Binding free energies (MM-GBSA) of the two peptide–receptor complexes.

Peptide-Receptor Complex	ΔEvdW	ΔE_ele	ΔE_GB	ΔE_SURF	ΔG_MM-GBSA
Complex 1	−98.65	−52.88	58.01	−6.68	−100.2
Complex 2	−95.89	−46.23	67.97	−15.08	−89.23
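As listed in the Figure 8 caption, the total ΔG_MM-GBSA in Table 6 is the sum of the gas-phase terms (ΔE_vdW + ΔE_ele) and the solvation terms (ΔE_GB + ΔE_SURF); no separate entropy term appears. The short check below recomputes both totals from the reported components (values copied from Table 6, in the units used there); the function name is illustrative. More negative totals indicate more favourable binding.

```python
# Energy components copied from Table 6
components = {
    "Complex 1": {"vdW": -98.65, "ele": -52.88, "GB": 58.01, "SURF": -6.68},
    "Complex 2": {"vdW": -95.89, "ele": -46.23, "GB": 67.97, "SURF": -15.08},
}
reported_totals = {"Complex 1": -100.2, "Complex 2": -89.23}


def mmgbsa_total(terms: dict) -> float:
    """ΔG_MM-GBSA = ΔE_vdW + ΔE_ele + ΔE_GB + ΔE_SURF (no entropy term)."""
    gas_phase = terms["vdW"] + terms["ele"]  # molecular-mechanics interaction energy
    solvation = terms["GB"] + terms["SURF"]  # polar + non-polar solvation
    return gas_phase + solvation


for name, terms in components.items():
    print(f"{name}: {mmgbsa_total(terms):.2f} (reported {reported_totals[name]})")
```

The breakdown also shows why Complex 1 scores better overall: its polar solvation penalty (ΔE_GB) is smaller, which outweighs its less favourable non-polar term.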

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
