Peptide-Based Subunit Vaccine Design of T- and B-Cells Multi-Epitopes against Zika Virus Using Immunoinformatics Approaches

The Zika virus disease, also known as Zika fever is an arboviral disease that became epidemic in the Pacific Islands and had spread to 18 territories of the Americas in 2016. Zika virus disease has been linked to several health problems such as microcephaly and the Guillain–Barré syndrome, but to date, there has been no vaccine available for Zika. Problems related to the development of a vaccine include the vaccination target, which covers pregnant women and children, and the antibody dependent enhancement (ADE), which can be caused by non-neutralizing antibodies. The peptide vaccine was chosen as a focus of this study as a safer platform to develop the Zika vaccine. In this study, a collection of Zika proteomes was used to find the best candidates for T- and B-cell epitopes using the immunoinformatics approach. The most promising T-cell epitopes were mapped using the selected human leukocyte antigen (HLA) alleles, and further molecular docking and dynamics studies showed a good peptide-HLA interaction for the best major histocompatibility complex-II (MHC-II) epitope. The most promising B-cell epitopes include four linear peptides predicted to be cross-reactive with T-cells, and conformational epitopes from two proteins accessible by antibodies in their native biological assembly. It is believed that the use of immunoinformatics methods is a promising strategy against the Zika viral infection in designing an efficacious multiepitope vaccine.

Prediction for linear epitopes was done using BepiPred-2.0 [50] and LBtope [51]. BepiPred-2.0 was trained using the Random Forest algorithm to find epitope residues from a dataset of epitopes and non-epitopes in crystal structures [50], and LBtope was trained using various methods including the SVM and Nearest Neighbor on various datasets [51]. The dataset LBtope_confirm was used in the epitope prediction, all entries in this dataset had been reported by at least two studies [51]. For BepiPred-2.0, residues scoring above~0.5 were considered epitope, and for Lbtope, residues with %probability above~60 were considered epitope. The minimal length of a linear epitope was set to be six, following the typical minimum epitope length of Zika epitopes listed in IEDB (http://www.iedb.org) [52].
To predict conformational epitopes, three-dimensional Zika protein structures were obtained either directly from PDB or through modeling. The prediction was done using DiscoTope 2.0 [53] by uploading the protein PDB file and specifying the chain(s) to be used. DiscoTope 2.0 combines sequence information with spatial information in finding epitope residues [53].

B-cell Epitope Shortlisting
Similar to the T-cell, consensus epitopes were passed through toxicity, conservancy, and antigenicity tests, then their similarity to human proteins was checked using BLASTp [49]. In similarity checking, epitopes with identity above 90% with a part of human proteome were eliminated.

Modeling of Protein Structures
All modeling was done using MODELLER 9.19 [54]. For the homology modeling, template proteins were first obtained by the help of "build_profile()" command or searching in PDB. After obtaining a template structure, the target sequence was aligned to the template using the command "salign()", and the model was built using "automodel()". Generated 3D structures were then validated using Errat [55], ProQ [56], Verify3D [57] and MolProbity [58].
Modeling for proteins with minimal (1aa) difference from the target sequence was done by changing only the residue of interest while leaving the other residues rigid. MODELLER would then optimize the new residue.

Molecular Docking Simulation
PDB structures of both the epitope (ligand) and MHC were used in docking. Epitope 3D structures were generated using PEP-FOLD3 [59] by inputting their amino acid sequence. AutoDock Vina 1.1.2 [60] was employed to perform docking simulations, with the inputs of both PDB files and docking parameters.
Docking preparation was done in AutoDock Tools [61]. Firstly, protein orientation was adjusted such that minimum xyz dimensions for the peptide binding site could be achieved. Protein was given polar hydrogens, and ligand torsions were enabled for all rotatable bonds, except for the amide bonds. After that, a grid box was set to be large enough to accommodate the peptide ligand (at least 30 Å x 15 Å x 15 Å in size). Docking was done using default parameters, save for exhaustiveness = 16.

Molecular Dynamics Simulation
A molecular dynamics simulation was performed to study the stability of docked peptide-MHC complexes. Docked epitopes were complexed with their HLA receptors and became input for MD simulation performed by GROMACS 2018 [62,63].
The force field AMBER99sb-ILDN [64] was used with the TIP3P water model [65]. The simulation condition was 310 K, 1 bar, and 0.15 M NaCl to mimic the physiological condition. The simulation box was a dodecahedron with an additional 1.2 nm space from the edge of the protein. The atom neighbor list was searched using the Verlet cutoff method [66], and electrostatic forces were calculated using the smooth PME method [67] with a cutoff distance of 1 nm. Minimization was done using the steepest descent down to 1000 kJ/mol.nm. Bond lengths for all atoms were constrained using the LINCS algorithm [68]. Equilibration was done by restraining heavy atom (non-H) positions. Temperature equilibration was done using the v-rescale thermostat [69] for 100 ps, followed by the pressure equilibration for 500 ps using the Berendsen thermostat (2 fs timestep). Finally, position restraints were lifted; barostat was switched to Parrinello-Rahman [70,71]; and the production simulation was run for 20 ns (2 fs timestep).

Preparation of Zika Polyprotein Sequence for Epitope Screening
Metsky et al. [9] uploaded 110 genome sequences of circulating Zika viruses in the Americas (American's Zika lineage) alongside their translated protein sequences, under the BioProject accession PRJNA344504. A sequence pool was created from 39 polyprotein sequences that contained no long stretches of gap (Table S1, Supplementary file). These sequences came from six countries/states: Dominican Republic, Colombia, Florida, Puerto Rico, Brazil, and Honduras. From the sequence pool, one consensus sequence was generated by preferring Brazilian sequences, since the strains circulating in Brazil are the most established. Sporadic mutations (occurring random or in scattered mutations) were reverted to their corresponding consensus residues as new sequences, and localized mutations (regular mutation) were reverted to their corresponding Brazilian sequences. This sequence is 3242 amino acid long and available in Diagram S1 (Supplementary file).

Selection of HLA Alleles in T-cell Epitope Screening
The HLA alleles used in this study are summarized in Table 1. In total, a pool of 13 HLA class I alleles for HLA-A and B was created. Out of the 13 alleles, nine alleles have been commonly found in Indonesia and four alleles have been reported to have a protective effect against dengue. For class II, a pool of 12 alleles for HLA-DR and DQ was created, consisting of 10 common alleles and three protective alleles. In cases where the allele subtype was not known, 01 was used to represent that allele. To form the binding pocket of HLA class II molecules, chain A and B need to be combined. However, due to the monomorphic nature of HLA-DRA [72], the recombination of chain A and B was only done for the five HLA-DQ alleles, resulting in a set of 13 class II HLA molecules.

Prediction of T-cell Epitopes Recognized by MHC-I and MHC-II
Two predictors were used in both finding MHC-I epitopes (HLA-A and B) and MHC-II (HLA-DR and DQ) to generate consensus epitope-HLA pairs. Two hundred and seventy seven epitope-HLA pairs were discovered for MHC-I, and 95 pairs for MHC-II. By ignoring HLA, there were 175 unique peptides for class I and 67 for MHC-II. To ensure the epitopes would work well as a vaccine component, the quality of these epitopes was then further evaluated by running them through immunogenicity, toxicity, conservation and population coverage tests. Ten of the best epitopes for MHC-I and II each were obtained. After further antigenicity and autoimmunity testing, nine MHC-I epitopes (Table 2) and eight MHC-II epitopes (Table 3) were selected for molecular docking study.

Docking of T-cell Epitopes to MHC Molecules
Nine selected epitopes for MHC-I and eight selected epitopes for MHC-II were used for the molecular docking study. Each epitope was paired with an appropriate HLA molecule predicted to strongly bind the epitope. By looking at the NetMHC and NetMHCII results alongside the initial screening results, one HLA was selected to be paired with each epitope based on the binding quality, allele frequency in Southeast Asia, and prediction reliability (for NetMHCII). Some class II HLA structures were not available in PDB and had to be modeled. Structures for HLA-DQA1*05:01-DQB1*03:03, HLA-DQA1*02:01-DQB1*03:03 and HLA-DRA/DRB1*09:01 were generated using the homology modeling, while DRA/HLA-DRB1*15:02 that has only one residue difference from its reference structure was point mutated to its receptor. The PDB structures used to model these HLAs are listed in Tables 4 and 5, whereas validation of the resulting structures can be seen in Table S2 (Supplementary file). The receptor with an asterisk sign (*) indicates that the structure was used as a template in modeling.
The AutoDock Vina uses its scoring function to calculate inter-atomic interactions and the difference of the Gibbs free energy (∆G) value, which depicts the binding energy between the protein and ligand [60,82]. This value can be converted into an inhibition constant (K i ) by using the formula: where R and T are the gas constant and temperature, respectively. A lower binding energy and inhibition constant indicate a stronger bond between the epitope and HLA.
Peptide docked poses are displayed in Figure 1, and docking summaries are displayed in Tables 4 and 5. The bolded entries WFHDIPLPW for MHC-I and WAIYAALTT for MHC-II had the lowest binding energies and inhibition constants (Ki) in their respective groups, indicating the best binding interactions. Where R and T are the gas constant and temperature, respectively. A lower binding energy and inhibition constant indicate a stronger bond between the epitope and HLA. Peptide docked poses are displayed in Figure 1, and docking summaries are displayed in Tables 4 and 5. The bolded entries WFHDIPLPW for MHC-I and WAIYAALTT for MHC-II had the lowest binding energies and inhibition constants (Ki) in their respective groups, indicating the best binding interactions.

Molecular Dynamics Simulations
The stability of HLAs and peptide epitopes were tested by 20 ns MD simulation. As a comparison, the same simulation was also done for the unbound form of the receptors. The RMSD analysis for WFHDIPLPW-HLA-C*04:01 ( Figure 2a) showed that the stability was reached at about 10 ns for the complex structure, indicated by a plateauing RMSD value. The unbound HLA seemed to have not reached stability. The RMSD analysis for WAIYAALTT-HLA-DRA/DRB1*15:02 (Figure 2b) also showed a rather stable RMSD in the bound structure, whereas RMSD of the unbound structure appeared to increase slowly. structure appeared to increase slowly.
Root mean square fluctuation (RMSF) values depict the degree of movement of each residue. In both cases (WFHDIPLPW and WAIYAALTT), the overall RMSF appeared higher for the unbound structure, indicating a lower stability ( Figure 3). The last nine residues in the complex RMSF belong to the epitope ( Figure 4). WFHDIPLPW had a rather high RMSF (about 1.4-2.9 Å , average 2.04 Å ), while WAIYAALTT had a lower RMSF (average 1.11 Å ) with residues one to seven having RMSF lower than 1 Å ( Figure 4). It could be inferred that residues one to seven of WAIYAALTT had been docked well, resulting in minimum changes (movements) and low RMSF values.
Lastly, the radius of gyration (Rg) analysis depicts structure density by calculating the atomic root mean square deviation (RMSD) from a center of mass [83]. A protein changing the conformation would thus display a changing Rg value. Rg values for WFHDIPLPW-HLA-C*04:01 tended to stabilize around 2.3 Å while in the unbound structure Rg seemed to yet stabilize after 20 ns ( Figure  5). Rg values for WAIYAALTT-HLA-DRB1*05:02 both in the bound and unbound structure appeared rather stable with a slightly declining trend for the unbound structure. However, the unbound Rg was lower, implying a more compact structure ( Figure 5).  Root mean square fluctuation (RMSF) values depict the degree of movement of each residue. In both cases (WFHDIPLPW and WAIYAALTT), the overall RMSF appeared higher for the unbound structure, indicating a lower stability ( Figure 3). The last nine residues in the complex RMSF belong to the epitope ( Figure 4). WFHDIPLPW had a rather high RMSF (about 1.4-2.9 Å, average 2.04 Å), while WAIYAALTT had a lower RMSF (average 1.11 Å) with residues one to seven having RMSF lower than 1 Å (Figure 4). It could be inferred that residues one to seven of WAIYAALTT had been docked well, resulting in minimum changes (movements) and low RMSF values.    Lastly, the radius of gyration (Rg) analysis depicts structure density by calculating the atomic root mean square deviation (RMSD) from a center of mass [83]. A protein changing the conformation would thus display a changing Rg value. Rg values for WFHDIPLPW-HLA-C*04:01 tended to stabilize around 2.3 Å while in the unbound structure Rg seemed to yet stabilize after 20 ns ( Figure 5). Rg values for WAIYAALTT-HLA-DRB1*05:02 both in the bound and unbound structure appeared rather stable with a slightly declining trend for the unbound structure. However, the unbound Rg was lower, implying a more compact structure ( Figure 5).   The conformational analysis revealed several changes in the bound ligand WFHDIPLPW after 20 ns of MD simulation ( Figure 6). The docked structure shows residues two to three (FH), forming the kink in the ligand structure ( Figure 6b). After 20 ns of MD simulation, the kink was observed in residues seven to nine (LPW) (Figure 6c). MD simulation also opened the ligand -COOH end part of the receptor (mainly Lys80 and Lys146) and the -NH 2 end, which was closed by Arg62 and Trp167 [84] ( Figure 6c). The conformational analysis revealed several changes in the bound ligand WFHDIPLPW after 20 ns of MD simulation ( Figure 6). The docked structure shows residues two to three (FH), forming Fleischmann et al. [85] reported that strongly binding epitopes tend to tighten the gap in the HLA molecule. According to the atoms used by Fleischmann et al. [85], d1 was defined as the distance between CA of Tyr85 and Thr138 (HLA-C*04:01) and d2 as the distance between CA of Asp74 and Ala149 (HLA-C*04:01). Following 20 ns MD simulation on WFHDIPLPW-HLA-C*04:01, d1 increased from 9171 Å to 10,041 Å and d2 from 21,132 Å to 21,773 Å.
In the case of WAIYAALTT, the MD simulation for 20 ns slightly shifted the position of Trp1 outward from the pocket with both hydrophobic rings still inside, and Ala2 was now closer to chain B, resulting in closer residue one to two conformations to the crystal structure (Figure 7a,c). Residues four and seven were also still located in the lateral pocket, and residues five and eight still pointed upward. The main difference was observed for Thr9, which dissociated itself from the HLA pocket and Thr8 that helped in accommodating the dissociation of Thr9.
Microorganisms 2019, 7, x FOR PEER REVIEW 13 of 29 the kink in the ligand structure ( Figure 6b). After 20 ns of MD simulation, the kink was observed in residues seven to nine (LPW) (Figure 6c). MD simulation also opened the ligand -COOH end part of the receptor (mainly Lys80 and Lys146) and the -NH2 end, which was closed by Arg62 and Trp167 [84] (Figure 6c). Fleischmann et al. [85] reported that strongly binding epitopes tend to tighten the gap in the HLA molecule. According to the atoms used by Fleischmann et al. [85], d1 was defined as the distance between CA of Tyr85 and Thr138 (HLA-C*04:01) and d2 as the distance between CA of Asp74 and Ala149 (HLA-C*04:01). Following 20 ns MD simulation on WFHDIPLPW-HLA-C*04:01, d1 increased from 9171 Å to 10,041 Å and d2 from 21,132 Å to 21,773 Å .

Prediction of B-cell Linear Epitopes
An epitope search was performed for all Zika proteins: C, prM and M, E, Ns1, NS2A, NS2B, NS3, NS4A, 2K, NS4B, and NS5. Two programs were employed to search for B-cell linear epitopes. A list of consensus epitopes was first generated by combining the two prediction results. A total of 42 consensus epitopes were gathered. Among these, no epitopes were present for NS2A/B, NS4A, and 2K.
The highest epitope densities were observed in NS1, NS3, and NS5. The list of consensus epitopes is available in Table S5 (Supplementary file).
Similar to T-cell epitopes, toxicity, conservancy and antigenicity tests, were performed on consensus epitopes, then their similarity to human proteins was checked. The results for these tests can be seen in Table S5. After the elimination process, 22 epitopes were found to be non-toxic, conserved, having antigenic residues and non-autoimmunity-inducing are displayed in Table 6. Lastly, each epitope was checked for the presence of a T-cell epitope residue, which will increase its quality as a vaccine epitope. The filtered epitopes were compared against epitopes for the 13 HLA alleles (both MHC-I and II) that had passed immunogenicity (MHC-I only), toxicity and conservancy tests. Epitopes EWFHDIP (E), WRDRYKYHPDSPR (NS1), and DVPAPKE (NS3) were found to share epitopic residues with MHC-I, whereas epitopes LDPYWGDVKQD and WMDARVCSDHA from NS3 shared epitopic residues with MHC-II.

Prediction of B-cell Conformational Epitopes
Initially, a PDB search for tertiary Zika protein structures was performed. The only usable tertiary structures came from proteins C, E, prM and M, NS1, NS2B, NS3 protease, NS3 helicase and NS5. No tertiary structure that could be used as a template for proteins NS2A, NS4A and NS4B was found. To use the American consensus sequence, proteins prM, NS3 (protease and helicase domains) and NS5 were modeled using homology modeling and NS1 using point mutation H1R. The list of proteins, reference structures and modeling methods are presented in Table 7. For structures generated using homology modeling, their validations can be seen in Table S6 (Supplementary file).

Prediction of T-cell Epitopes Recognized by Class I and II MHC
After acquiring pathogenic gene mutations, the Zika virus reached America and caused an outbreak in the naïve population [5][6][7]. To start the screening of epitopes, an amino acid sequence is required (see Figure 1). In this study, sequences from the Zika virus currently circulating in America were used to find the sequence used in screening.

Prediction of T-cell Epitopes Recognized by Class I and II MHC
After acquiring pathogenic gene mutations, the Zika virus reached America and caused an outbreak in the naïve population [5][6][7]. To start the screening of epitopes, an amino acid sequence is required (see Figure 1). In this study, sequences from the Zika virus currently circulating in America were used to find the sequence used in screening.
The specificity of each HLA's epitope recognition [72] means that the selection of correct HLA molecules can affect the success of epitope screening. In this study, various HLAs common in Indonesia were used to screen for T-cell epitopes. Indonesia as a tropical country, has a very suitable climate for Zika mosquito vectors. Introduction of the more recent and dangerous American strains to this area can potentially trigger an outbreak. An example of this kind of introduction was in Singapore from a traveler returning from Brazil in 2016, although it did not trigger an outbreak [86].
Some HLAs have been associated with protection or vulnerability to certain diseases; examples include Dengue type 2 protective HLAs HLA-DRB1*04 and HLA-DRB1*07 that have been reported to have lower frequencies in the patient population than the healthy population [78]. Including protective HLAs in epitope screening are expected to return more essential epitopes for pathogen neutralization. Unfortunately, reports that associate certain HLAs with protection or vulnerability to Zika are not yet available.
Screening for T-cell epitopes was done for all 26 MHC-I and II HLA molecules. A peptide was said to be epitope for its respective HLA if it was predicted to bind strongly to the HLA molecule. Two predictors were used in both finding class I MHC epitopes (HLA-A and B) and class II (HLA-DR and DQ) to generate consensus epitope-HLA pairs.
Consensus epitopes were then passed through several tests to ensure their quality as a vaccine component. For class I MHC, epitope immunogenicity was firstly evaluated. In this case, immunogenicity is defined as the ability of a peptide-MHC-I complex to bind with a T-cell receptor [40]. Peptide bound to MHC has specific residues facing outwards, giving a larger contribution to the interaction with TCR [45,87]. Toxicity was then assessed. Certain peptides can give a toxic reaction when administered by inhibiting certain biological functions [88]. After that, the epitope conservation analysis was done by considering the high specificity of MHC molecules and continuous mutation of viruses. It means a peptide used in a vaccine needs to be conserved among the various strains of the virus. An extended polyprotein sequence pool covering wider geographical area was generated to evaluate epitope conservancy. Epitopes were then put through the IEDB population coverage test [47] for regions vulnerable to Zika. This was done because HLA molecules are highly polymorphic with high peptide specificities [72], but their coding allele frequencies differ across various populations. To ensure the effectiveness of a vaccine in the general population, epitopes must be able to activate the desired immune response for the majority of the targeted population.
When checking coverage for MHC-I, coverage numbers for Central America were very low (Table 4), and thus excluded when calculating averages. When checking for the MHC-II coverage, both chains were included and counted separately, resulting in bloated coverage numbers.
Epitopes were then screened for the presence of antigenic residues. While this method is usually aimed towards B-cell epitope screening, T-cell epitopes may also come from digested B-cell epitopes [72]. Therefore, the T-cell epitope in an antigenic region is expected to have better potential of becoming a good epitope. Finally, high similarity between the epitope and parts of human proteins may raise concerns of an autoimmune response [89]. To minimize the risk of an autoimmune response, epitopes were subjected to BLASTp [49] against a human proteome.
The remaining peptides have passed various tests to ensure their quality as a vaccine component. The peptides are predicted to be able to strongly interact with MHC molecules, and be physicochemically and biologically suitable for use in humans to safely elicit the desired immune response. The coverage analysis over various geographic regions also means that these peptides would be suitable for use in a larger population.

Molecular Docking of T-cell Epitopes to MHC Molecules
Mirza et al. [22] described the drawback in using rigid body docking, where both the receptor and ligand are not allowed to change their conformation. Since the peptides were modeled in unbound conformation, they tended to interact with themselves, forming folds or helices which shouldn't be present in the bound conformation. These secondary structures were preserved in the docked structures from Mirza et al. [22]. However, since ligand flexibility was allowed in this study, a linear conformation could be found, solving the inherent problem associated with rigid body docking.
The structure of HLA used will affect the epitope docking result. To obtain the desired docking conformation, holo structures of receptors were used, with the assumption that in real conditions, both the HLA and epitope will adopt conformations similar to the holo structure. Most bound MHC ligands also have a similar conformation [72,90]. Therefore, bound ligands in receptor/reference structures were used to evaluate the correctness of docking results by visual inspection of backbone and side chains.
From nine docked conformations returned by AutoDock Vina [60], one conformation closest to the reference ligand was selected for analysis. Unfortunately, AutoDock Vina [60] returned highly varying conformations with many having a folded structure. This was more evident in MHC-II, where on average only one out of nine conformations could be considered similar to the reference (data not shown). The overall lack of docking quality may be caused by the program itself that is more specialized for docking of small ligands [60]. On the other hand, the high flexibility of peptides makes it a great challenge to produce a correct docking result [91,92].
In the case of MHC-II, receptors from homology modeling had an overall lower quality when compared to their reference structures (Table S2, Supplementary file), and since they were not modeled together with the ligand, some features of their binding sites could have been lost. The prediction accuracy for MHC-II epitopes is also still low [93], while the high prediction accuracy for MHC-I has been reported [94]. The poorer results for the MHC-II docking could also come from falsely predicted epitopes. Two particular cases are IEMAGPMAA and ILLVAHYM, where no conformation close to the reference ligand could be found. ILLVAHYM was even docked to a non-modeled receptor.
Inter-protein interactions are dominated with non-covalent interactions [95]. Hydrogen bond plays a significant role in stabilizing the ligand in its receptor, and its strength among non-covalent interactions makes it a main parameter in observing receptor-ligand interactions. Interestingly, the amounts of hydrogen bonds for these epitopes are still lower than many other epitopes.
More detailed analysis of each interaction is required to obtain a more accurate interpretation. The complete list of hydrogen bonds for all epitopes is available in Tables S3 and S4. Some atoms shared hydrogen bonds with several other atoms. Further inspection on WFHDIPLPW revealed two hydrogen bonds shared by one atom, while AIYAALTTF with 12 hydrogen bonds had seven hydrogen bonds shared by three atoms. The same phenomenon could not be observed for LYFHRRDLR (MHC-II, 12 hydrogen bonds) that only had two hydrogen bonds shared by one atom. Another important parameter to consider is the interaction distance that affects its strength. Directions of hydrogen bonds also make for interesting considerations since they influence the strength of the hydrogen bond [96]. Overall, these factors, including forces other than hydrogen bonds, are considered in the calculation of binding energy. The two epitopes with lowest binding energies WFHDIPLPW and WAIYAALTT were thus selected for further analysis by molecular dynamics simulation.

Molecular Dynamics Simulations
In the context of epitope searching, molecular dynamics (MD) simulation can be used to observe the stability of a peptide-MHC complex and evaluate the quality of the epitope [85,97]. Overall, MHC molecules showed higher stability when bound to its ligand (epitope). Considering that the original crystal structures were of holo conformation, the higher degree of changes in unbound HLAs can be partially explained by the search of a new conformation, which would be their native apo structure.
When correctly bound to HLA, a typical nonameric peptide would have the residue in each position interact with the HLA molecule in a manner specific to that HLA (an example is shown in Figure 6a) [72,90]. A known HLA-peptide complex structure can thus be used to evaluate the experimental epitope structure in the same/very similar HLA based on the position and orientation of each residue. The analysis of the crystal structure of HLA-Cw4 and its epitope (1QQD) showed that the peptide ligand had been known to be buried in positions one to three, seven and nine, with residues one, two, seven, and nine buried deep inside four pockets formed by the HLA molecule, whereas residues four and eight are the most exposed for interaction with receptor KIR2DL1 [84]. Residues three to four are also known to form a kink on the ligand structure [84], whereas the docked structure revealed residues two to three (FH) forming the kink (Figure 6b), which moved to residues seven to nine (LPW) following the MD simulation (Figure 6c). One possible reason being His3 that was facing outward in the docked structure. The opening of the receptor structure was also unexpected, as the closed ends of the class I HLA function to restrict most epitopes to eight to ten aa in length [72]. The opening of the -COOH end part of the receptor might be explained by the high density of residues seven to nine (LPW). The opening of the receptor structure was also shown by increasing d1 and d2 values, indicating a non-optimally bound epitope.
Looking at the characteristics of each residue, Trp1 is a large nonpolar residue, while HLA-Cw4 is known to accommodate smaller polar residues like Serine and Glutamine at position 1 [84]. Phenylalanine at position 2 is very similar to Tyrosine (Figure 7a), which both contain a hydrophobic ring. As expected, Phe2 was also buried deep in the same pocket as Tyr2 (Figure 7b,c). A small lateral pocket can accommodate smaller amino acids for residue three [38], but the side chain of His3 pointed upward, both before and after MD (Figure 7b,c). Residue seven is supposed to be buried in a pocket, but Leu7 pointed outward and Pro8 inward (Figure 7b,c), contrary to the crystal structure (Figure 7a) where residue seven is buried and eight exposed [84]. The pocket for residue nine is hydrophobic and was filled by Tryptophan. Following the MD simulation, the HLA residues Lys80 and Lys146 separated, and Tyr9, despite still facing inward was positioned higher (closer to the surface) (Figure 7c). Overall, it appeared that the ligand tended to separate itself from the HLA molecule, most visibly in residues six to eight, leaving Phe2 as the main anchor residue after 20 ns of MD simulation. However, it is not yet known whether this phenomenon was caused by incorrect epitope residues or improper docking result.
As for WAIYAALTT, the receptor structure HLA-DRA/DRB1*15:02 was obtained by the V115G point mutation of chain B of HLA-DRA/DRB1*15:01 (PDB ID: 5V4M) [98]. The conformational analysis was done using the template structure. Amino acids of residues one, four, seven, and nine were reported to be buried in HLA pockets, working as main anchors [98]. In the docked structure, Trp1 and Trp9 were buried, and Tyr4 and Leu7 pointed toward the lateral pocket as described [98] (Figure 8b). The similar orientations of ligand residues indicate a good docking result.
The discharge of residue nine from its pocket possible indicates an incompatibility with interacting HLA residues. Threonine itself is a polar amino acid, which goes against the accepted residues in the ninth position, Phe, Leu, Val, and Ile, all nonpolar [98]. During the simulation, Thr9 might have gone up to interact with water molecules. The dramatically change in the Thr9 position also explains the high RMSF values in residues eight and nine (Figure 5b).
All things considered, the docked epitope WAIYAALTT is sufficiently good for residues one to eight when docked to HLA*C:04:02. However, residue nine should not have been polar. Whether threonine in the ninth position makes this epitope, a bad epitope is yet to be determined. In this respect, however, the simulated structures from the result of each 20 ns of MD simulation analyses cannot be conclusive but open the way to further research developments.

Prediction of B-cell Linear Epitopes
Several programs have been developed to predict B-cell epitopes from a protein sequence. Unfortunately, the prediction capabilities of these programs are still rather poor [50]. The linear B-cell epitope was searched for on all Zika proteins. Despite not all proteins being readily exposed for binding with the antibody, an epitope search for non-naturally-exposed proteins would still be useful for physiological laboratory testing, as was done by Saisawang et al. [99].
Interestingly, the removal of pr sequence altered BepiPred-2.0 scoring for the M protein, giving rise to a new epitope previously undetected in prM. Care should be taken regarding the completeness of the sequence given for BepiPred-2.0.
The elimination method used in this study had several shortcomings where longer epitopes tended to be eliminated in the conservancy test, and shorter epitopes tended to be eliminated in the autoimmunity test. Even so, mitigating these problems would require a variation of epitope lengths, such as trimming into several parts or inclusion of neighboring residues. While in vaccine/antibody design modifications may be required, each epitope was treated as one peptide for simplicity reasons.
Despite no epitope having both MHC-I and II epitope residues, five epitopes that share epitopic residues with MHC have better potential as a good linear epitope due to cross-recognition by T-cells. Of particular interest are those recognized by CD4+ T-cells that help in B-cell activation [72]. Among these epitopes, care should be taken before employing DVPAPKE for use in humans, as BLASTp found only one residue difference; lysine from its human counterpart arginine, which are both similar (hydrophilic and positively charged). A conservative mutation, as in this case, would result in minimal characteristic change [100]. Peptide extension might solve this issue.
In conclusion, all 22 filtered consensus epitopes have the potential to become a linear epitope for B-cells. Among them, five epitopes, EWFHDIP (E), WRDRYKYHPDSPR (NS1), DVPAPKE (NS3), LDPYWGDVKQD (NS3) and WMDARVCSDHA (NS3) are the most potential candidates owing to the presence of T-cell epitope residues. Care should, however, be taken for DVPAPKE for its high similarity to the human protein.

Prediction of B-cell Conformational Epitopes
Despite conformational epitopes making up 90% of B-cell epitopes [101], predicting these epitopes is much more difficult due to the need of the protein tertiary structure. To create a vaccine for humans, the main target proteins are those naturally exposed for interaction with B-cell antibodies. In addition, the organization of proteins can change depending on the condition. In virions, the E protein is an exposed protein that plays a role in cellular entry [102]. Before maturation, prM-E complexes form trimeric spikes [102,103]. In mature virions, following a successful prM cleavage, these M-E complexes take on a dimeric arrangement on the surface of the virus, embedded at their transmembrane domains (shown in Figure 8a) [104]. Another protein, NS1, is the only nonstructural protein secreted out of the cell [105], allowing antibody attachment. Inside cells, NS1 forms dimers but secreted NS1 forms hexamers in the configuration of three dimers [105]. In both forms, the beta-roll domain (shown in Figure 8b) is not exposed because it associates with the membrane or is hidden inside, forming a hydrophobic core [105]. This information means antibodies are unlikely to bind to the 'underside' of proteins E (which contains the transmembrane domain) and NS1 (which contains the beta-roll domain), particularly in humans. Aside from proteins E and NS1, C also forms a dimer, and its presence as a monomeric and trimeric protein has been reported [106]. Even though in laboratories, the research environment can be manipulated, in human applications, information regarding how the proteins associate becomes essential. Therefore, in epitope prediction, more focus was given to proteins E and NS1 that can be naturally bound by antibodies.
The search for E protein epitopes was done for both its monomeric and E-M heterodimer forms and reported in Figure 8a. Chain E was used to represent monomeric epitopes since it returned to more epitope positions. Glycosylation in Asn154 has been reported to play a role in vivo Zika infectivity by inhibiting the production of reactive oxygen, which plays a role in mosquito immunity [106]. Unfortunately, glycosylation data did not affect scoring. Thus, its effect could not be observed. In human applications, E proteins exposed for antibody binding will mostly be found on the virion surface in the form of an E-M heterodimer [104]. Hence these heterodimer epitopes can be focused on vaccine/antibody design. It should also be noted that E proteins are packed with the side containing the transmembrane domain (Figure 8a, pink) facing inward, hidden from the reach of antibodies [104].
Most known Zika antibody-antigen complex structures come from the E protein [107], so it is interesting to compare the epitopes predicted by DiscoTope with the empirical data available. Compared with available antibody-antigen structures, the E fusion loop region, which was predicted as the epitope only in the monomeric structure, is also a target for a flavivirus broadly neutralizing antibody 2A10G6 [108]. Some antibodies like Z20 and ZIKV-117 have been reported to work on the dimeric form of the E protein, attaching themselves across both monomers [109,110]. Predicted epitope residues Asp320-Gly232 are also part of the ZIKV-117 binding site [109], and residue Gly232 is a part of the Z20 binding site [110]. However, these antibodies still have extensive interactions with many surrounding residues. Lastly, domain III (labeled in Figure 8a) is also known to be a binding site for several antibodies [111,112]. Only one residue (Gln350) was predicted as the epitope in this domain, which is known to interact with antibody Z006 [111]. As is the case with Z20 and ZIKV-117, this residue is only a part of a much larger interaction site. Due to the large surface area of antibodies, antigens can be expected to interact with antibodies in several residues. In the latter three cases, the predicted epitopic residues are not located at the 'center' of the actual epitopes.
The NS1 protein was also checked in both its monomeric and dimeric form (Table 8, Figure 8b). As with the E protein, this protein can also be found outside of cells, in the form of hexamers, which model can be found in [104]. For the creation of the vaccine/antibody, the side containing beta-roll domain (Figure 8b, pink) will be hidden and unlikely for antibody binding [105]. Interestingly, two epitopes Asp157 and Glu258 were present in the dimeric but not monomeric structure. Asp157 is positioned close to the β-roll and β-ladder of the other chain, and the appearance of Glu258 may be caused by the wing of the other chain. In both cases, these epitopes appeared because of the presence of new neighboring residues previously absent in the monomeric structure.
Aside from the two proteins accessible extracellularly under natural conditions, the epitope search for other proteins, such as nonstructural proteins, is also interesting for its usefulness in laboratory testing. The epitope search for the C protein revealed several epitopes found only in its monomeric form, whereas no epitope was detected for its multimeric form (Table 8. The prM protein spans across the entire E protein and the consecutive epitopic residues form a long stretch (Table 8). This stretch might not be a true epitope since the E protein was not accounted for during prediction. The M protein in the form of the E-M heterodimer did not have any epitope, but by itself, a long stretch of epitopic residues was found (Table 8). This epitope is located at the end of the long stretch of the prM epitope. The structure for NS2B was obtained from a complexed structure with the protease domain of dengue NS3 [113]. NS2B is a cofactor of the NS3 protease and circles the protein [113], similar to the case of prM; the long 'stretch' was also identified as the epitope (Table 8). In the presence of NS3 it is also probable that this epitope will disappear. Epitope residues for the NS3 protease and helicase are available in Table 8. Unfortunately, the complete structure of NS3 has not been determined, forbidding the epitope search on the full protein structure. Epitopes for the last protein NS5 are displayed in Table 8. This protein gave the most epitopic residues, partly from its large size.
Epitope prediction using DiscoTope 2.0 [114] uses spatial information to predict epitopic residues. The predicted epitopes tend to be in the more exposed areas. The structural unit used will also influence prediction results since some residues will become available/unavailable for interaction with antibodies. For the application in humans, it is important to select the correct structural unit in predicting conformational epitopes. The epitopic regions potential as antibody targets, or can be said to activate B-cells for proteins E and NS1 in their extracellular forms have been predicted. Some predicted epitopic residues in the E protein have also been confirmed to interact with antibodies [108][109][110][111]. Additionally, B-cell epitope residues have been predicted for proteins C, prM, M, NS2B, NS3 protease and helicase, and NS5. A similar study of immunoinformatics approach in the development of the novel peptide vaccine using T-and B-cells highly conserved epitopes of Zika envelope glycoprotein showed positive results as a potential vaccine [53].
Further investigation on how the immunogenic peptide could induce T-cell responses by Paquin-Proulx et al. (2017). An interesting result showed that T-cell cross-reactivity could give a positive impact on the ZIKV infection in vitro. Paquin-Proulx et al. successfully demonstrated that the ZIKV CD4+ T-cell response induced by DENV vaccination could promote the appearance of cross-reactive antibodies mediating ADE. However, a better understanding of the interactions between the ZIKV and DENV immunity, such as induction by vaccination or by natural infection, will be necessary to provide safe and efficient vaccines [115]. Several studies have reported how the computational prediction of antigenic peptides were successfully validated experimentally to combat HIV-1, malaria parasites and leishmaniasis, a vector-borne parasitic disease [116][117][118]. In validating the predicted epitopes, it is suggested to integrate genomics with proteomics approaches to represent a valid strategy in refining the antigen characterizations as well as give the useful insights on abundance and subcellular localization of pathogenic antigens [119]. These predicted epitopes need further investigation by synthesizing and evaluating in vitro using the cell culture or rodent model to determine the impacts of immunity to ZIKV.

Conclusions
By only using necessary peptides, the peptide vaccine is expected to minimize risks associated with unnecessary antigens. Furthermore, using in silico methods to screen for the best candidates can significantly reduce the time and materials spent in laboratories. In this study, the in silico search has been performed to find peptides that can potentially trigger the immune system, and be a component of the Zika vaccine. For the T-cell, the search method was adjusted for certain regions and their surrounding populations. Potential epitopes for both MHC-I and MHC-II have been screened for from the complete Zika protein sequence. Docking and molecular dynamics simulations have also been performed on the most promising epitopes. Regrettably, MD results were unsatisfactory for the best MHC-I epitope, WFHDIPLPW, which was probably caused by an improper starting docked structure. MD results for the best MHC-II epitope WAIYAALTT were quite promising apart from the harmful effect of one residue's polarity.
Coverage analysis does not account for HLA allele pairing, whereas for some HLA both chains are required to confer their specificity. Finding the frequencies of the allele pairs (chain A and B) is the ideal solution, but it is difficult to do. In this case, the epitope search using globally common alleles might give a better result.
The linear and conformational epitopes of the B-cell have been successfully mapped. Several promising B-cell epitopes that have been verified by physicochemical modeling have been reported in this study. Most promising among them are EWFHDIP from the E protein, WRDRYKYHPDSPR from the NS1 protein, and DVPAPKE, LDPYWGDVKQD and WMDARVCSDHA from the NS3 protein, which are also predicted as T-cell epitopes. Of particular interest are LDPYWGDVKQD and WMDARVCSDHA since they would also be recognized by T helper cells which play a role in B-cell activation. Caution should be taken before using DVPAPKE for its similarity to a part of the human proteome.
Conformational epitopes for the B-cell have also been searched for, for proteins C, E, prM, M, NS1, NS2B, NS3 (protease and helicase domains) and NS5. Particular attention was given to the proteins E and NS1 that can be found as an extracellular matrix under native conditions. These proteins are the primary antibody targets in the human body. This study also considered the change in epitope residues when a protein is in its monomeric or multimeric form.
Supplementary Materials: Supplementary materials can be found at http://www.mdpi.com/2076-2607/7/8/226/s1. Table S1: Zika polyprotein sequences used to obtain consensus sequence and in epitope conservancy analysis, Table S2: Validation of homology modeled MHC structures and their template structures, Table S3: List of hydrogen bonds for docked MHC-I epitopes, Table S4: List of hydrogen bonds for docked MHC-II epitopes, Table  S5: List of consensus linear B cell epitopes, scores averaged across all residues, and their filtration results, Table S6: Validation of homology modeled MHC structures and their template structures, Diagram S1: Consensus Zika polyprotein sequence.