Exploring the Papillomaviral Proteome to Identify Potential Candidates for a Chimeric Vaccine against Cervix Papilloma Using Immunomics and Computational Structural Vaccinology

The human papillomavirus (HPV) 58 is considered to be the second most predominant genotype in cervical cancer incidents in China. HPV type-restriction, non-targeted delivery, and the highcost of existing vaccines necessitate continuing research on the HPV vaccine. We aimed to explore the papillomaviral proteome in order to identify potential candidates for a chimeric vaccine against cervix papilloma using computational immunology and structural vaccinology approaches. Two overlapped epitope segments (23–36) and (29–42) from the N-terminal region of the HPV58 minor capsid protein L2 are selected as capable of inducing both cellular and humoral immunity. In total, 318 amino acid lengths of the vaccine construct SGD58 contain adjuvants (Flagellin and RS09), two Th epitopes, and linkers. SGD58 is a stable protein that is soluble, antigenic, and non-allergenic. Homology modeling and the structural refinement of the best models of SGD58 and TLR5 found 96.8% and 93.9% favored regions in Rampage, respectively. The docking results demonstrated a HADDOCK score of −62.5 ± 7.6, the binding energy (−30 kcal/mol) and 44 interacting amino acid residues between SGD58-TLR5 complex. The docked complex are stable in 100 ns of simulation. The coding sequences of SGD58 also show elevated gene expression in Escherichia coli with 1.0 codon adaptation index and 59.92% glycine-cysteine content. We conclude that SGD58 may prompt the creation a vaccine against cervix papilloma.


Introduction
Currently, viral infection contributes to about 20% of the global burden of human cancer. Among other virus types, the human papillomavirus (HPV) is reported in about 5% of all human cancers,

Protein Sequences
The L2 protein of HPV58 (Accession No.: P26538), the Flagellin protein of Salmonella enterica serovar Dublin (Accession No.: Q06971), and human TLR5 (Accession No.: O60602) sequences were obtained from the Swiss-Prot reviewed universal protein knowledgebase (UniProt) database [35]. The designed chimeric vaccine was named SGD58, using the name of the first and principal authors along with the strain number.

MHC-I Binding Epitope Segments Prediction
Two servers, IEDB and NetMHCv4.0, have been exploited for the identification of major histocompatibility complex class I (MHC-I) binding epitopes from the N-terminal region of the L2 sequence. Specific human MHC-I alleles such as the human leukocyte antigen (HLA)-A* (01:01, 02:01, 02:07, 11:01, 24:01), HLA-B* (46:01, 58:01) and HLA-C* (07:02, 12:03) were abundantly diagnosed in different regions of China, including Guizhou, Henan, Taihu River Basin, Tibetan, Yunnan, Wenzhou, and Wuhan. These alleles were selected for epitope prediction [36][37][38][39][40][41][42]. IEDB [43] is a freely available analysis resource with specified algorithms for the identification and determination of immunogenic epitopes. A consensus method was implemented to predict the MHC-I binding epitopes and its production pathway [44]. In this consensus method, three algorithms namely, the neural network (artificial), matrix method (stabilized), and peptide libraries (combinatorial) were combined to predict the promising CTL epitope segments. The epitopes involve proteasomal cleavage (pCle), a transporter associated with antigen processing (TAP), and the MHC-I binding pathway. The lowest percentile rank (<10%) indicated the good binding efficiency of epitopes with the restricted alleles. NetMHCV4.0 [45] is another potential tool implemented to find MHC-I binding peptides with the best Pearson's correlation coefficient (PCC) of 0.895, based on the combined neural network. The strong and weak binding peptides were predicted based on the thresholds of <0.5 and <2, respectively.

CTL Epitope and TCR -Peptide/Peptide -MHC Interfaces Prediction
The CTLPred tool is a direct method for the prediction of CTL epitope segments instead of MHC binders. The prominent combined approaches were implemented to find the epitopes, based on both the artificial neural networks (ANN) trained by Stuttgart neural network simulator (SNNS) and support vector machine (SVM) methods. The combined methods demonstrate a higher level of accuracy (75.8 %) compared with other individual methods of prediction such as ANN (72.2%) and SVM (75.4%). The default cutoff scores of 0.51 of ANN and 0.36 of SVM were used to find the epitopes or non-epitopes at which the sensitivity and specificity of the predictions are almost similar [46]. A web server PAComplex provides access to examine and visualize the TCR-peptide and peptide-MHC interface (pMHC), respectively. For a given viral protein query sequence, the joint Z-value is obtained with a threshold 2.5. Moreover, it allows the selection of only limited allotypes of MHC class I such as HLA-A*(02:01), HLA-B*(08:01, 35:01, 35:08, 44:05), and HLA-E, respectively. The Z-value was calculated using the following formula: where Z MHC and Z TCR are the score of a TCR-pMHC complex, calculated by (E-µ)/σ. E denotes the interaction score, µ denotes the mean, and σ denotes the standard deviation from 10,000 random interfaces [47].

Linear B-Cell Epitope Prediction
ABCPred is used to predict linear B-cell epitopes. It provides 65.93% of accuracy with the involvement of the recurrent neural network (RNN) algorithm. It consists of 700 B-cell and non-B-cell epitope segment datasets each with a length of 20 amino acids [51].

Allergenicity Prediction
AllerTOP is the first proper alignment-free allergenicity server. In this, five machine learning methods such as partial least squares, logistic regression, decision tree, naive Bayes, or k nearest neighbors (kNN = 1) were implemented to find the allergen. It shows 88.7%, 90.7%, and 86.7% accuracy, specificity, and sensitivity, respectively [52]. AllergenFPv 1.0 is another essential tool for allergenicity prediction based on novel descriptor fingerprint approaches. Twenty naturally existing amino acids in the protein sequences were classified into five descriptors (E) such as E1 (hydrophobicity), E2 (size), E3 (helix-forming propensity), E4 (relative abundance of amino acids), and E5 (β-strand forming propensity). Based on this, the strings were transformed into normal vectors by auto cross covariance (ACC) transformation to find the allergen protein. It exhibits accuracy (87%), specificity (89%), and sensitivity (86%) [53].

Antigenicity
ANTIGENpro is the potential alignment-free and sequence-based antigenicity prediction server with 79% accuracy and an area under curve (AUC) of 0.89. It shows results based on amino acid composition and the random-forest algorithm. The datasets were trained using 5-fold cross-validation. It consists of both protective antigen (193) and non-antigen (193) sequences. It predicts whether the given query epitope segments are antigenic or non-antigenic with their respective probability [54].

Cross-Reactivity Analysis with Human Proteomes
The presence or absence of similarity in predicted epitope segments with the human proteome was analyzed using the peptide-matching program of PIR [55].

Assessment of the Physicochemical Properties of SGD58
The complete chimeric vaccine was designed by joining the optimized epitope segments (02), TLR adjuvants (02), and Th epitopes (02) with suitable amino acid linkers. Moreover, it is required to find the solubility of the designed chimeric vaccine on overexpression in E. coli. SOLpro is a useful tool to find the solubility of protein based on the two-stage SVM algorithm. It achieves an overall accuracy of 74%, which develops on standard evaluation metrics with 10-fold cross-validation. It predicts the query protein to be soluble or insoluble at P ≥ 0.5 [58]. A range of physiochemical characteristics of the designed chimeric vaccine was also determined through ProtParam [59].

Determination of Antigenicity
VaxiJen is the primary server used for the prediction of antigenicity of the input sequence against different targets such as virus, bacteria, fungi, parasites, and tumors. Antigenicity was calculated based on the physicochemical properties of the protein sequences. Every target organism dataset contained 100 antigens and non-antigens. Moreover, the model organisms were validated using leave-one-out cross-validation (LOO-CV); it provides 89% accuracy and an AUC of 0.964 respectively, at the threshold of 0.4 [60]. For homology modeling, I-TASSER and Robetta were the servers used to design the 3D structure of SGD58 and TLR5. I-TASSER is a potential server that depends on the secondary-structure-mediated program of "Profile-Profile threading alignment (PPA) and iterative implementation of the TASSER." It has predicted a number of protein structures on request basis from 35 countries worldwide. For the query inputs, the user obtains the confidence score, TM score (topology similarity assessments of the two various protein structures), root-mean-square deviation (RMSD), and cluster density values. Nevertheless, the higher C-score (ranging from −5 to +2) determines the best model with a higher confidence level. Moreover, the 3D structure of the modeled protein was visualized using UCSF Chimera [61]. The Robetta beta server was used to predict the full chain protein structure. This server (http://robetta.bakerlab.org) gives automated tools for the analysis and prediction of the protein structure. Robetta provides both the ab initio and comparative models of protein domains. The comparative models are built from structures detected and aligned by HHSEARCH, SPARKS, and Raptor. The loop regions are assembled from fragments and optimized to fit the aligned template structures. The de novo models are built using the Robetta de novo protocol. For structure prediction, the submitted query sequences were analyzed minutely into putative domains. For domain prediction, a hierarchical screening method called "Ginzu" was used [62]. Besides, due to the unavailability of the crystal structure of TLR5, we have chosen TLR5 (PDB ID: 3J0A) as a template model to perform the homology modeling.

3D Modeled Structure Refinement
The high C-score model of the designed vaccine from the I-TASSER and model 3 from Robetta beta was further refined using the GalaxyRefine and 3DRefine tools. The GalaxyRefine is a tool that is accessible in the GalaxyWeb server: it is useful to refine the structure of a protein from the given query sequences based on template-based modeling, and undergoes loop and terminus portion refinement through the ab initio modeling method. The ninth critical assessment of techniques for protein structure prediction (CASP9) optimizes refinement and produces consistent core structures [63]. Another tool is 3Drefine, which prompts iteration analysis for~300 amino acid residues efficiently in less than 5 min. It performs post-refinement model analysis with both or single MolProbity and random walk (RW) plus methods. The results are visualized using JSmol [64]. The top five models of each tool were used for further validation.

3D Refined Structures Validation
The refined 3D models from the above steps were validated using three interactive services namely, ProSA, Ramachandran plot analysis, and ERRAT. ProSA-web is a potential tool for the refinement, validation, prediction, and modeling of protein structures. It indicates the difference in the protein structures through the respective score and energy spot. It also facilitates the validation process of the protein structure that is acquired from X-ray scanning, nuclear magnetic resonance (NMR) spectroscopy analysis, and theoretical calculations. As an output, the Z-score corresponds to the overall feature of the validated model [65]. RAMPAGE tool is used to validate the percentage (%) of favored, allowed, and outlier region in the given query chimeric vaccine [66]. The statistics of noncovalent interactions between carbon, nitrogen, and oxygen atoms in the input sequence with best-resolution crystallographic structures were compared using the ERRAT tool. It implements an empirical atom-based approach for verification of the protein structure and is more sensitive to errors (1.5A) [67]. Similar steps were followed to validate the TLR5 model.

Conformational B-Cell Epitopes Prediction
DiscoTope 2.0 is a potential tool used to analyze the conformational (discontinuous) B-cell epitopes from the input sequence. It showed a highly significant prediction performance with an AUC of 0.824. The default −3.7 threshold limit provides significant specificity (0.75) and sensitivity (0.47). It was selected for the present analysis, and the final score was evaluated by the combined calculation of the propensity score (PS) and contact numbers [68]. The PP interactions are the midpoint for all the biochemical pathways that are involved in the biological functions. HADDOCK v2.2 is a server used for the docking of PP complexes [69]. The scoring function was executed based on the weighted sum of the various energy terms (Van der Waals energy, electrostatic energy, desolvation energy, restraints energy and buried surface area). In addition, intermolecular contacts such as hydrogen bonds (HB) and those that are non-bonded were determined by using the program for automatically plotting protein-protein interactions (LIGPLOT-DIMPLOT) v 4.5.3 [70]. The validated best model of the vaccine construct (Robetta model 3) and TLR5 (Robetta model 5) was chosen as a ligand and receptor, respectively.

MD Simulation
MD simulation determines the strength of the docked complex and the designed vaccine (SGD58). The GROMACS 5.1.2 package, with the CHARMM force field was used to perform MD simulation. The transferable intermolecular potential with three points (TIP3P) and simple point charge (SPC) in the cubic cell of the water model was resolved with protein, and with the addition of appropriate counter ions to satisfy electroneutrality. For the MD simulation, the system was implemented with volume-based canonical (NVT) and pressure-based isothermal-isobaric (NPT) ensembles [71]. The energy minimization of the system was performed with the steepest descent method, which facilitates 50,000 minimization steps and 1000 kJ mol −1 of tolerance. The Ewald method with a cutoff for short-range neighbor distance (1.0 nm), and Coulomb (1.0 nm) was used to calculate van der Waals (vdW) and electrostatic interactions [72]. SPC resolved the system and the final minimizations were calculated for a realistic structure concerning the geometry: and solvent orientations were used in the production of the MD simulation. SETTLE and LINCS algorithms were used to assist the geometry of the water molecules and bond angles [72,73]. The pressure of the system (300 K, 1bar) was embraced using the Parrinello-Rahman method and the temperature was regulated using the V-rescale method. Temperature and pressure equilibrated systems were employed for production run (100 ns) and time step (2 fs). The resulting structural coordinates were saved at every 2ps of an interval.

Analysis of Virtual Gene Expression and Cloning
EMBOSS Backtranseq v1.0 [74] is a suitable tool to uptake the query protein sequences, reverse translate, and return the optimizing coding sequences. Furthermore, the properties of the obtained coding sequences were analyzed to obtain the increased level of gene expression in the respective host. It is well known that the codon plays a crucial role in the expression of the recombinant proteins in various organisms (e.g., E. coli, Homo sapiens, Saccharomyces cerevisiae, Mus musculus, and Rattus norvegicus). The GenScript rare codon analysis [75] is a prominent tool for codon usage and its distribution analysis (codon adaptation index-CAI, glycine-cystine content-GC, and codon frequency distribution-CFD) in the individual expression host organism is based on the optimum Gene TM algorithm. The endonuclease NdeI (N-terminal) and BamHI (C-terminal) restriction enzyme sites were added to the respective cloning sequences in the host (E. coli).

Analysis of Selected Sequences
The retrieved L2 of HPV58, the Flagellin protein of Salmonella enterica serovar Dublin, and humanTLR5 contain 472, 505, and 858 amino acids residues, respectively. However, for the potential epitope prediction, an N-terminal region of 12-114 AA residues (highly conserved and virus surface exposed region) were selected from L2 protein sequences. In the case of the Flagellin protein, the N terminal (5-143 amino acids) and C-terminal (419-504 amino acids) regions were selected for the vaccine design.

CTL Epitopes and TCR-Peptide/Peptide-MHC Interfaces
The overlapped epitope segments obtained from the above MHC-I prediction were compared with the results of both CTLPred [46] and PAComplex [47] servers. Furthermore, the shared epitope segments obtained from the CTLPred and PAComplex were used for epitope selection, and vaccine design as shown in Table S1.

Continuous B-Cell Epitopes
ABCPred predicted the potential antigenic linear B-cell epitope segments. The overlapped B-cell epitopes and their respective position are given in Table S2.

Selection of the Overlapped Epitope Segments
According to the above results (MHC-1, CTL, MHC-11, INF-γ), only four potential antigenic epitope segments were selected from the HPV58 minor capsid protein. The epitope segments, namely,23-36, 30-43, 10-23, and 29-42 from the N-terminal region of the HPV58 L2 protein have been chosen for further studies (Table 1). The results of the analysis showed four overlapped epitope segments, among these the two epitope segments (23-36 and 29-42) indicated in bold were selected for the vaccine construction. The IFN-γ production ability of the overlapped epitope segments is presented in positive values.
Viruses 2019, 11, 63 10 of 25 3.2.6. Antigenicity, Allergenicity, and Cross-Reactivity of Selected Epitope Segments The start and end positions, epitope segments, pro-inflammatory cytokines (INF-γ) productivity, allergenicity, antigenicity, and cross-reactivity with human proteomes of the selected overlapped HPV58 are given in Table 1. Among the four, only AllergenFP and AllerTOP declared segments (23-36 and 29-42) as non-allergen, respectively. AntigenPro shows the selected epitope segments having antigenic potential. In addition, the cross-reactivity result of the epitope segments with the human proteomes has a zero similarity level. It confirmed that there was no distinctive match of overlapped epitope segments as found in Homo sapiens. It indicated that these epitope segments would not cause or induce any autoimmune disease or disorders. Based on the above overall analysis, the 23-36 and 29-42 epitope segments were selected to design the vaccine construct.

Designing of Chimeric Vaccine SGD58
The complete vaccine construct consists of (1) two selected epitope sequences (23-36 and 29-42); and (2) two different TLR adjuvants, Flagellin and RS09. Flagellin is recognized as the TLR5 agonist that involves the activation of innate immunity. The head and tail of the vaccine construct contain the N-terminal (5-143 amino acids) and C-terminal regions (419-504 amino acids) of the Flagellin. In addition, RS09, a synthetic short peptide of the TLR4 ligand, is also used; (3) two different T helper (Th) epitopes, namelyPADRE and TpD, are used in the construct. The pan HLA-DR-binding epitope (PADRE) is frequently employed for synthetic or recombinant vaccine development and another universal epitope, TpD, which has 31 amino acids, is also used; and (4) the seven different parts in the vaccine construct were associated with the linkers GPGPG, AAY, and EAAAK. Finally, the designed chimeric vaccine (SGD58) with 318 amino acid sequences was drawn using illustrator for biological sequences (IBS) v1.0 as illustrated in Figure 1. 3.2.6. Antigenicity, Allergenicity, and Cross-Reactivity of Selected Epitope Segments The start and end positions, epitope segments, pro-inflammatory cytokines (INF-γ) productivity, allergenicity, antigenicity, and cross-reactivity with human proteomes of the selected overlapped HPV58 are given in Table 1. Among the four, only AllergenFP and AllerTOP declared segments (23-36 and 29-42) as non-allergen, respectively. AntigenPro shows the selected epitope segments having antigenic potential. In addition, the cross-reactivity result of the epitope segments with the human proteomes has a zero similarity level. It confirmed that there was no distinctive match of overlapped epitope segments as found in Homo sapiens. It indicated that these epitope segments would not cause or induce any autoimmune disease or disorders. Based on the above overall analysis, the 23-36 and 29-42 epitope segments were selected to design the vaccine construct.

Epitopes Conservancy
In epitopes-based vaccine design, the conserved epitope segments would be essential in order to give wider cross-protection against various hrHPV strains. Supplementary Table S3 gives the comprehensive analysis of overlapped EP (≥30%), positions, subsequences identity, and hrHPV. The conservation of selected epitopes has cross-protection to the 15 hrHPV as shown in Supplementary Figure S1a. The overlapped epitope segments KVEGTTIADQILRY23-36 and IADQILRYGSLGVF29-42 with 15-hrHPV strains are illustrated in supplementary Figure S1b and S1c.

Designing of Chimeric Vaccine SGD58
The complete vaccine construct consists of (1) two selected epitope sequences (23-36 and 29-42); and (2) two different TLR adjuvants, Flagellin and RS09. Flagellin is recognized as the TLR5 agonist that involves the activation of innate immunity. The head and tail of the vaccine construct contain the N-terminal (5-143 amino acids) and C-terminal regions (419-504 amino acids) of the Flagellin. In addition, RS09, a synthetic short peptide of the TLR4 ligand, is also used; (3) two different T helper (Th) epitopes, namelyPADRE and TpD, are used in the construct. The pan HLA-DR-binding epitope (PADRE) is frequently employed for synthetic or recombinant vaccine development and another universal epitope, TpD, which has 31 amino acids, is also used; and (4) the seven different parts in the vaccine construct were associated with the linkers GPGPG, AAY, and EAAAK. Finally, the designed chimeric vaccine (SGD58) with 318 amino acid sequences was drawn using illustrator for biological sequences (IBS) v1.0 as illustrated in Figure 1.  All the segments were joined together by using the following linkers (GPGPG, AAY or EAAAK).

Homology Modeling
I-TASSER homology modeling indicated that model 3 of the chimeric vaccine (SGD58) has the highest C-score of −0.39. In addition, the TM and RMSD scores of 0.66 and 7.2Å of model 3 represent the standard similarity and accuracy of the modeled structures. Notably, there is no 3D structure available for the TLR5 on the protein data bank. I-TASSER was used to model the 3D conformational structure of TLR5. The C-score, TM, and RMSD scores of the best model 1 of TLR5 are −0.35, 0.82 and 6.8Å, respectively. Therefore, model 1 is suggested as the best model with a higher confidence level. In addition, modeling with Robetta, we have obtained five different models for TLR5 and the chimeric vaccine candidate. Models 1 and 4 are de novo models while models 2, 3, and 5 are ab initio models. The Ginzu domain prediction confidence score for TLR5 is 0.9375, and for SGD58 is 0.6502. All the models were selected for structural refinement analysis.

Structural Refinement
The acquired best-modeled 3D structures of SGD58 and TLR5 underwent structural refinement using servers GalaxyWEB and 3D refine. The GalaxyRefine program gives five best-refined models for the whole SGD58 and TLR5. In addition, the lowest 3Drefine score represents the good quality of the refined model, based on the 3D refine force field. All the refined models (1-5) of both GalaxyWEB and 3D refine were used for further structural validation.

Structural Validation
The refined 3D structure obtained in the above section underwent quality improvement using three potential tools: ProSA-web, RAMPAGE, and the ERRAT. The Z-score (ProSA), overall quality factor (ERRAT), and the favored, allowed, and outlier region (RAMPAGE) of the validated 3D structure of SGD58 are given in Table S4 and TLR5 in Table S5. Figure 2 illustrates the Z-score, overall quality factor and the favored, allowed, and outlier region of the selected best SGD58 model. From overall comparison of the obtained results, the Robetta model 3 of SGD58 ( Figure S2a) and the Robetta model 5 of TLR5 ( Figure S2b) using UCSF Chimera were selected for further analysis.

Structural Refinement
The acquired best-modeled 3D structures of SGD58 and TLR5 underwent structural refinement using servers GalaxyWEB and 3D refine. The GalaxyRefine program gives five best-refined models for the whole SGD58 and TLR5. In addition, the lowest 3Drefine score represents the good quality of the refined model, based on the 3D refine force field. All the refined models (1-5) of both GalaxyWEB and 3D refine were used for further structural validation.

Structural Validation
The refined 3D structure obtained in the above section underwent quality improvement using three potential tools: ProSA-web, RAMPAGE, and the ERRAT. The Z-score (ProSA), overall quality factor (ERRAT), and the favored, allowed, and outlier region (RAMPAGE) of the validated 3D structure of SGD58 are given in Table S4 and TLR5 in Table S5. Figure 2 illustrates the Z-score, overall quality factor and the favored, allowed, and outlier region of the selected best SGD58 model. From overall comparison of the obtained results, the Robetta model 3 of SGD58 ( Figure S2a) and the Robetta model 5 of TLR5 ( Figure S2b) using UCSF Chimera were selected for further analysis. The ProSA Z-score of the refined model is -6.65, which is shownby the dark black color spot. The values are presented in the range of native protein conformation. The dark blue and light blue color region represents the Nuclear magnetic resonance and X-ray spectroscopy determination of the experimental protein chains in the protein database (PDB). The X-and Y-axis represent the number of amino acid residues and Z-scores respectively; (B) In the Ramachandran plot of the refined model, we illustrated the favored in green circle (96.8%), the allowed in triangle (2.8%) and the outlier in yellow shaded circle (0.3%) regions. (C) The ERRAT Plot shows that the overall high quality factor of the refined SGD58 is 99.0033. * On the error axis, two lines are drawn to indicate the confidence with which it is possible to reject regions that exceed that error value. ** Expressed as the percentage of the protein for which the The ProSA Z-score of the refined model is -6.65, which is shownby the dark black color spot. The values are presented in the range of native protein conformation. The dark blue and light blue color region represents the Nuclear magnetic resonance and X-ray spectroscopy determination of the experimental protein chains in the protein database (PDB). The X-and Y-axis represent the number of amino acid residues and Z-scores respectively; (B) In the Ramachandran plot of the refined model, we illustrated the favored in green circle (96.8%), the allowed in triangle (2.8%) and the outlier in yellow shaded circle (0.3%) regions. (C) The ERRAT Plot shows that the overall high quality factor of the refined SGD58 is 99.0033. * On the error axis, two lines are drawn to indicate the confidence with which it is possible to reject regions that exceed that error value. ** Expressed as the percentage of the protein for which the calculated error value falls below the 95% rejection limit. Good high resolution structures generally produce values around 95% or higher. For lower resolutions (2.5 to 3A), the average overall quality factor is around 91%.

Conformational B-Cell Epitopes
The results of the Discotope 2.0 analysis demonstrated that 18 potential B-cell epitopes were obtained from 318 total residues. Table S6 explains the respective amino acid, residue with contact number, propensity, and Discotope score of the predicted B-cell epitopes.

Investigation of the Interaction between SGD58 and TLR5
3.6.1. PP Docking Interaction From the above findings, the best-refined model of SGD58 (Robetta model 3) and TLR5 (Robetta model 5) was used to perform molecular docking using the HADDOCK server. The binding cavity of TLR5 and Flagellin was obtained from a previous report [76,77]. The input TLR5 receptor contains 858 amino acids and SGD58 contains 2923 amino acids. The human TLR5 sequence contains 21 different leucine-rich repeats (LRR) segments with 443 amino acids. Flagellin contains two D1/D0 TLR-binding domains in the N and C terminals of the sequence.
The HADDOCK method directly permits the integration of biophysical information about the protein-protein complex in order to constrain docking. In this study, we docked the target receptor (TLR5) chimeric vaccine candidate (SGD58) to observe the interaction between the complexes. The HADDOCK method clustered 116 structures into 12 clusters, which represents 58.0% of the water-refined models that HADDOCK generated. The TLR5-SGD58 complex shows the highest HADDOCK score of −62.5 ± 7.6, representing the good affinity level between the target and vaccine. The buried surface area (BSA) of cluster 4 of the TLR5-SGD58 complex is 1914.4 ± 124.4, which indicates close proximity and a less water-exposed protein surface. The desolvation energy (43.1 ± 7.9), restraints violation energy (1192.9 ± 96.93), and BSA have a high-quality association with the docking score of the complex. The HADDOCK score, interaction energy, Van der Waals energy, electrostatic energy, desolvation energy, restraints violation energy, and BSA of the top tenclusters are given in Table S7.  Asp 118, and His 143 act as interacting residues present in the best four clusters of the TLR5-SGD58 complex( Figure 3). Thus, the TLR5-SGD58 complex docking analysis and the intermolecular hydrogen bonding patterns confirm that the interaction of the chimeric vaccine candidate with the target TLR5 can induce both cellular and humoral immunity and inhibit HPV progression.

MD Simulation
MD simulation demonstrated the stability of the TLR5_SGD58 docked complex in the active site of TLR5. RMSD is the known parameter by which to determine the obtained structure from the MD trajectory. This parameter was evaluated as a preliminary analysis of the backbone atoms of TLR5. The structure and dynamic proprieties of TLR5 were determined using the backbone RMSDs during the simulation period ( Figure 4). The RMSD of TLR5 was gradually increased until 20 ns and a nearly 8-100 ns period with 3-4 nm of deviation, and after 30-80 ns, it was stable. The RMSD curve of SGD58 indicates an insignificant variation from 0-20 ns at 5.0 nm, and after 20 ns, it was stable. SGD58 has more a stable RMSD value compared to TLR5 ( Figure 4A). The flexibility of each residue in the TLR5-SGD58 complex is calculated based on root mean square fluctuations (RMSF). Figure 4B shows that TLR5 has an insignificant variation of residues, which indicates that this molecule was stable during the MD simulation time of 100 ns. These residues have well-known interactions with the vaccine candidate. The ND1b domain (100-200 residues) of the Flagellin fragments of SGD58 shows a low flexibility, which can be attributed to their interaction with the TLR5 protein ( Figure 4C). In addition, CD1 and CD0 domains have more fluctuations (200-250 residues). Figure 4D illustrates hydrogen bond interactions throughout the simulation period, to understand the binding efficiency of TLR5 with SGD58. The average number of hydrogen bond interactions was observed in 2.0 nm. Figure S3 illustrates that the potential energy (PE), temperature, total energy (TE), and pressure of SGD58 was stable during the simulation period. The average TE of SGD58 is −7207307343.324 with a standard deviation of 4279.598082. In addition, the average PE of SGD58 is −8985342.697 with a standard deviation of 3370.264894. PE and TE attained equilibrium at a temperature of 300K. The result of the radius of gyration (Rg) analysis is shown in Figure S4. The simultaneous changes in the Rg plots of the complex with TLR5 ( Figure S4a) and SGD58 ( Figure S4b) indicate that the substantial nature of the complex frequently increases. The Rg plots compression of SGD58 with TLR5 is similar to the RMSD parameter, which indicates the effort of SGD58 to reach internal configuration in TLR5.

Virtual Gene Expression and Cloning
For the given SGD58, the optimized reverse-translated coding sequences were obtained using the EMBOSS Backtranseq tool. The maximal protein expression of this optimized coding sequence in the host (E. coli) was analyzed by the GenScript's OptimumGene TM codon optimization tool. Figure  S5 illustrates the CAI, GC, and CFD of the gene transcript. The gene (reverse translated coding sequence of the vaccine construct) having an ideal CAI value of 1.00 (>0.8) is more suitable for the above expression (E. coli) in the host organism. Moreover, 59.92% of ideal GC content is presented in the gene (between 30% and 70%). However, values outside of these peak ranges would severely inhibit the transcriptional and translational efficiency of the gene products. The CFD value of the gene is 100%, representing their highest codon frequency distribution in the preferred expression organism.

Discussion
Immunomics is an integrative area of computer science and experimental immunology and plays a vital role in vaccine development. Immunomics tools and databases are used to forecast the target epitope segments to enhance CTL or B-cell immunity in a cost-effective manner and less experimental time [78][79][80][81][82]. The computational vaccine design involves the engineering of potential non-pathogenic epitope segments with adjuvants to enhance the function of the human immune system against dreadful diseases, including cervical cancer. Unlike conventional vaccines, peptide or epitope vaccines have several advantages; they are synthetic (pathogen-free), have less unwanted side effects, minimize accidental allergenic reactions, and design and predict peptides with self or non-self antigen to elicit and balance the immune responses [83,84]. HPV58 is considered as the most predominant genotype causing cervical cancer incidences in China. HPV has type-restriction, non-targeted delivery, and a high-cost of existing vaccines, which makescontinuing research on HPV vaccine development necessary. Therefore, this study aimed at the design of a chimeric vaccine via targeting HPV through immunomics, PP docking, and MD simulation approaches.
Earlier reports suggest that the L2 protein is majorly buried or hidden under the surface of native and matured virions [32,85,86]. The initial interactions between L1/L2 are hydrophobic with coverage of small stretches of amino acid sequences. It exhibits potential effects during in vitro assembly [87]. However, the structural relation of L2 minor capsid protein to L1 in the virion particles is not clearly known. In another study, Henio et al. [88] reported that the L1/L2 proteins of

Virtual Gene Expression and Cloning
For the given SGD58, the optimized reverse-translated coding sequences were obtained using the EMBOSS Backtranseq tool. The maximal protein expression of this optimized coding sequence in the host (E. coli) was analyzed by the GenScript's OptimumGene TM codon optimization tool. Figure  S5 illustrates the CAI, GC, and CFD of the gene transcript. The gene (reverse translated coding sequence of the vaccine construct) having an ideal CAI value of 1.00 (>0.8) is more suitable for the above expression (E. coli) in the host organism. Moreover, 59.92% of ideal GC content is presented in the gene (between 30% and 70%). However, values outside of these peak ranges would severely inhibit the transcriptional and translational efficiency of the gene products. The CFD value of the gene is 100%, representing their highest codon frequency distribution in the preferred expression organism.

Discussion
Immunomics is an integrative area of computer science and experimental immunology and plays a vital role in vaccine development. Immunomics tools and databases are used to forecast the target epitope segments to enhance CTL or B-cell immunity in a cost-effective manner and less experimental time [78][79][80][81][82]. The computational vaccine design involves the engineering of potential non-pathogenic epitope segments with adjuvants to enhance the function of the human immune system against dreadful diseases, including cervical cancer. Unlike conventional vaccines, peptide or epitope vaccines have several advantages; they are synthetic (pathogen-free), have less unwanted side effects, minimize accidental allergenic reactions, and design and predict peptides with self or non-self antigen to elicit and balance the immune responses [83,84]. HPV58 is considered as the most predominant genotype causing cervical cancer incidences in China. HPV has type-restriction, non-targeted delivery, and a high-cost of existing vaccines, which makescontinuing research on HPV vaccine development necessary. Therefore, this study aimed at the design of a chimeric vaccine via targeting HPV through immunomics, PP docking, and MD simulation approaches.
Earlier reports suggest that the L2 protein is majorly buried or hidden under the surface of native and matured virions [32,85,86]. The initial interactions between L1/L2 are hydrophobic with coverage of small stretches of amino acid sequences. It exhibits potential effects during in vitro assembly [87]. However, the structural relation of L2 minor capsid protein to L1 in the virion particles is not clearly known. In another study, Henio et al. [88] reported that the L1/L2 proteins of HPV have various antigenic epitope segments such as 32-81, 212-231, 272-291, and 347-381 amino acids, and these could be accessible on the surface of L1/L2 virus-like particles. In particular, the N-terminal region of the L2 protein is highly conserved and has diverse functions: it mainly participates in the attachment of the virus particle and its genome assembly in the host system. The N-terminal region 1-12 amino acids are in the DNA binding domain, the 9-12 amino acids are in the furin-cleavage site, and the 22-45 amino acids are in the cyclophilin-B and β-actin-binding domain [89][90][91]. The L2 protein can act as the prospective target to design the next-generation prophylactic vaccines for HPV [92]. This strategy is prominently supported by the early evidence, such as the production of cross-neutralizing antibodies RG1 [93], K4L2, and K18L2 [94] against the target sites (17-36 AA and 20-38 AA) of L2 in various experimental models. The weak immunogenicity of L2 is a significant obstacles in epitope-based vaccine development; to date, no L2-VLPs-based prophylactic vaccine has been approved for clinical application [30]. Therefore, in this study, we selected the L2 protein to predict the potential epitope candidate and enhance the immunogenicity using adjuvants to design a chimeric vaccine.
Adjuvants have different roles including up-regulation of the immune response, increased action of neutralizing antibodies, processing of cytosolic MHC class-I restricted peptides and target presentation to a specific receptor, acting as an immunogen, and application in the preparation of a single dose [95]. To increase the immunogenicity of L2-derived peptides, the selection of a suitable adjuvant plays a vital role in the vaccine design. Instead of mixing the appropriate adjuvant directly, designing the peptide vaccine candidate using suitable linkers and adjuvants could be highly effective. Alhydrogel ® adjuvant 2%, known as "alum," is frequently used as an adjuvant in diphtheria-tetanus-pertussis (DTP), HPV, and hepatitis vaccination [96]. Although the alum adjuvants induced the Th2-mediated immune response, they are ineffective to the pathogen, which is indeed of the Th1 immunity. Later, the emulsion-based incomplete Freund's adjuvant (IFA) was developed, which induces potential Th2, and little Th1-mediated immune responses [97]. However, the application of the emulsion-derived adjuvants is not supported well in the vaccination program due to the induction of autoimmune disorders and an unclear mode of action [98]. To overcome these issues TLR-based ligands were developed and achieved success in the generation of both Th1 and Th2 immune responses in the experimental models. Alphs et al. [99] achieved potential immunogenicity of the synthetic lipo-peptide (HPV16 L2) vaccine through fusion with the Th epitope and TLR ligand. Bacterial Flagellin is a potential TLR5 ligand, which can induce the production of both Th1 and Th2 immune responses.
It is frequently used as an adjuvant in the recombinant vaccine production when fusing with antigenic particles [100,101]. Flagellin, a TLR5 ligand, binds to the particular domain (Toll/interleukin-1 receptor) of the TLR5 receptor in humans. Notably, another newly developed and licensed adjuvant used for human vaccine development is the adjuvant system 04 (AS04). Moreover, AS04 was developed by a combination of 3-O-desacyl-4 -monophosphoryl lipid A (MPL), which is a prominent TLR4 agonist and aluminum salt. In the presence of Cervarix, the AS04 adjuvant induced the function of NF-kappa and cytokine synthesis in cancer cells and animal model. It leads to the appearance of increased antigen-loaded dendritic cells and monocytes followed by CD4 + and B-cells in the injection site [102]. Moreover, as a result of two-dose schedule trial in young girls (9-14 years), the HPV16/18 plus AS04 adjuvant vaccine (Cervarix) was highly immunogenic and has been approved clinically for the prevention of HPV infection, precancerous CIN (I/II/II), and cervical cancer [103]. In 18-25-year-old Chinese women, AS04 adjuvant vaccines were reported as having immunogenicity and an acceptable harmless profile from the randomized-controlled trial [104]. RS09 (short synthetic peptide segments "APPHALS"), is a ligand to TLR4. RS09 does not contain any toxicological effects, and is devoid of skin irritation, serious eye damage, and carcinogenic properties, etc. It successfully enhances the nuclear factor of kappa-light-chain-enhancer of activated B cell (NF-κB) translocation pathways and enhances the pro-inflammatory cytokine and antibody serum concentration in macrophage cells and animal models [105]. TLR adjuvants play an advanced role in commercial vaccines [106]. Two universal Th epitopes (PADRE and TpD) were added to the chimeric vaccine to resolve the deficiency of Th responses. The pan HLA DR-binding epitope is known as "PADRE". It has a binding affinity with more than 15 MHC class-II allotypes and induces proliferative CD4 + in peripheral blood mononuclear cells (PBMC) from humans [107]. In this manner, it explains the issue raised by HLA polymorphism in the human population [108]. It is extensively studied for synthetic peptide-based vaccine development in C57BL/6 cervical cancer models and Phase I/II clinical trials [109][110][111]. TpD is another universal memory T-cell helper peptide. The immunization of TpD produces a promising antibody and enhances long-term CD4 + immune response in mice, Rhesus macaques, cynomolgus monkeys, and PBMC in humans [112]. Therefore, we have chosen two different Th epitopes (PADRE and TpD) and two TLR adjuvants (Flagellin and RS04) to enhance the immunogenicity in the chimeric vaccine.
A small flexible linker sequence was employed to join various segments of epitopes, the TLR agonist, and the Th epitopes in the vaccine construct. In this study, GGS linker was used to join the various segments in the vaccine design. The GGS linker facilitates the natural rotation or movement of the epitope segments and adjuvants and ameliorates their free identification by the surface receptor molecules [113,114]. GGS linkers contain nonpolar glycine (Gly) and polar serine (Ser) amino acid residues, which prohibit unnecessary complex formations between the linked partners and retain the function of the chimeric vaccine. A GGS spacer was presented in both natural and artificial linkers, to either increase the stability of the binding domain partners or stabilize the PP complex [115].
The targeted delivery of a vaccine can improve the efficiency and achieve a better outcome. In this study, weselected TLR5 as the target for the chimeric vaccine. Innate immunity-inducing TLR5 are mainly expressed on antigen presenting monocytes and dendritic cells while they encounter the entry of pathogenic microbes [116]. Moreover, TLRs are a well-categorized pattern recognition receptor (PPR) family, which involves the sensing of invading virulent pathogens entry into the host [117]. Horseshoe-shaped LRR are present in each TLRs conserved fold for binding to their respective legends [118]. Numerous studies report that after the binding of TLR5 to the specific ligand, it induces the myeloid differentiation gene 88 (MYD88), which triggers activation of the tumor downstream signaling pathways including NF-κB, mitogen-associated protein kinase (MAPK), and interferon regulatory factors (IRFs) [119]. Once the TLRs recognize their ligand, they become active and induces the production of pro-inflammatory cytokines such as tumor necrosis factor (TNF), interleukins (IL), and INFs [117,118]. In this manner, the host cell increases the capacity to eliminate the invading pathogens. Kim et al. [120] reported that TLR5 is a potential biomarker for the malignant transformation of cervical squamous cells. Therefore, in this study, we selected TLR5 as a target for the chimeric vaccine and its efficiency was determined using PP docking and MD simulation.
TLR5 is an excellent receptor for Flagellin, which is the major component of the bacterial Flagella [116]. Flagellin adjuvant has been extensively used in experimental HPV vaccination. Interestingly, when the host cell responds to Flagellin, TLR5 induces B-cell differentiation into the plasma producing B-cells. Earlier reports demonstrate the significance of Flagellin fused L2-multimer vaccines in experimental rabbits and mice [121,122]. Flagellin is madeup of four important domains: D0, D1, D2, and D3. The D0 and D1 domains are composed of highly conserved N-terminal (1 to 200 amino acids) and C-terminal (405-494 amino acids) regions, which is important for TLR5 agonist action. In addition, the centered hypervariable D2 and D3 regions show the vast differences, by their size and composition, among the various bacterial microorganisms [123][124][125]. Owing to the higher antigenicity and toxicity caused by the central D2/D3 domain, this antigenic part is removed or replaced by the optimized epitope segments or different adjuvants in the vaccine design. D2/D3 antigen replacement in Flagellin enhances mucosa-immunoglobulin productions in the experimental animal models through intranasal immunization [126]. Therefore, we selected the N and C-terminal regions of Flagellin in the design of the chimeric vaccine. Forstneric et al. [125] reported the appropriate identification of Flagellin by the homology modeled hTLR5 and mTLR5. The crystallographic complex structure of zebra fish TLR5 with the domains (D1, D2 or D3) of Salmonella sp. were also studied [77]. However, there was a lack of availability of the crystal structure of TLR5 until now. Therefore, in this study, homology modeling, structural refinement, and validation were performed and found using the Robetta model 5 of TLR5 obtained using the Robetta, 3DGalaxyRefine, ProSA-web, RAMPAGE, and ERRAT. Moreover, TLR5 is greatly presented in vertebrates [77,125,127]. It facilitates the PP interaction analysis of TLR5 with Flagellin using HADDOCK, as shown in Figure 4b. It shows the interacting residues of this complex observed between the LRR region of TLR5 and the D0/D1 domain of Flagellin. This finding is significantly supported by earlier reports [125,127] concerning TLR5 recognition of the D0 of Flagellin by the inflammasome receptor, for preventing the immune escape of invading pathogenic strains. The MD simulation results depict a constant and stable interaction between SGD58 and TLR5. During the MD simulation study, TLR5 was stable after 15 ns, whereas SGD58 exhibited insignificant variations and was then stable after 10 ns. The structural changes were observed to have gained the optimal sustainability of SGD58 and TLR5. In addition, very slight changes were noticed in the D0/D1 domain regions of SGD58. Finally, the results of the codon optimization and virtual cloning confirms the translated chimeric vaccine sequence in E. coli to be capable of regulating the higher level of gene expression. The successful expression of the designed virus-like particles of HPV in E. coli is reported in earlier studies [128,129], which can enhance the production of the vaccine at a cheaper cost.
From this report, the new chimeric vaccine candidate was engineered using various immunomics tools, PP docking, and MD simulation, which can reduce the experimental cost and time. The designed chimeric vaccine SGD58, has appropriate structurally refined, physiochemical, and immunological properties that can produce humoral and cellular immune responses against HPV. The designed chimeric vaccine has cross-production with the 15 different hrHPV strains. Further experimental investigation is planned to determine the efficiency of the chimeric vaccine, especially allele-specific for the Chinese population.
Supplementary Materials: The following are available online at http://www.mdpi.com/1999-4915/11/1/63/s1. Figure S1: Conservation of two overlapped epitope segments in fifteen hrHPV strains. Figure S2: Refined 3D structure of the SGD58 and TLR5 by using UCSF Chimera. (a) The 3D structure of the SGD58 was obtained through homology modeling by using Robetta. (b) The 3D structure of the TLR5 was obtained through homology modeling by using Robetta. Figure S3: (a) The total energy, (b) potential energy, (c) temperature and (d) pressure plots of MD simulation for the TLR5-SGD58 complex in simulations of 100 ns. Figure S4: (a) Rg plot of the TLR5 and (b) The Rg plot of SGD58 complex. Figure S5: Codon optimization and in silico cloning of the gene. Table S1: The overlapped epitope segments of MHC-I, CTL and TCR from N-terminal region of HPV58 were predicted by using different servers. Table S2: The overlapped epitope segments of MHC-II, INF-gamma producing and B-cell epitopes N-terminal region of HPV58 by using different servers. Table S3: Conservation across-hrHPV strains by the overlapped HPV58 epitope segments. Table S4: Validation of 3D structures of the designed SGD58 obtained by the I-TASSER (I-T) and Robetta and its refinement by the Galaxy Refine (Gal) and 3Drefine (3DR). Table S5: Validation of 3D structures of the TLR5 obtained by the I-TASSER (I-T) and Robetta and its refinement by the GalaxyRefine (Gal) and 3Drefine (3DR). Table S6: Dis-continuous B-cell epitopes identified in the refined 3D structure of designed vaccine constructs of HPV58 by using Discotope 2.0. Table S7: Statistical analysis of the TLR5-SGD58 docking result obtained by HADDOCK.
Author Contributions: G.S., S.K., and D.-Q.W. conceived and designed the experiment. S.K and G.S. performed the immunoinformatics, vaccine design, and molecular docking studies. S.C., Q.W., and A.S.N. performed the molecular dynamics simulation. W.C.C., S.K., G.S., S.C., and D.-Q.W. wrote the main manuscript text. S.K. and A.S.N. formatted the manuscript and figures according to the instructions. G.S., W.C.C., G.K., and D.-Q.W. critically reviewed the manuscript. All the authors approved the final manuscript.