Modeling Novel Putative Drugs and Vaccine Candidates against Tick-Borne Pathogens: A Subtractive Proteomics Approach

Ticks and tick-borne pathogens (TBPs) continuously causing substantial losses to the public and veterinary health sectors. The identification of putative drug targets and vaccine candidates is crucial to control TBPs. No information has been recorded on designing novel drug targets and vaccine candidates based on proteins. Subtractive proteomics is an in silico approach that utilizes extensive screening for the identification of novel drug targets or vaccine candidates based on the determination of potential target proteins available in a pathogen proteome that may be used effectively to control diseases caused by these infectious agents. The present study aimed to investigate novel drug targets and vaccine candidates by utilizing subtractive proteomics to scan the available proteomes of TBPs and predict essential and non-host homologous proteins required for the survival of these diseases causing agents. Subtractive proteome analysis revealed a list of fifteen essential, non-host homologous, and unique metabolic proteins in the complete proteome of selected pathogens. Among these therapeutic target proteins, three were excluded due to the presence in host gut metagenome, eleven were found to be highly potential drug targets, while only one was found as a potential vaccine candidate against TBPs. The present study may provide a foundation to design potential drug targets and vaccine candidates for the effective control of infections caused by TBPs.


Introduction
Ticks are ectoparasites and notorious vectors for disease-causing pathogens that transmit various arboviruses, bacteria, and protozoans to vertebrate hosts adversely affecting the livestock industry and public health [1][2][3][4]. Some of the tick-borne pathogens (TBPs), such as bacteria (Rickettsia rickettsii, Francisella tularensis, Ehrlichia chaffeensis, Anaplasma phagocytophilum, Borrelia burgdorferi), Vet protozoans (Babesia spp., Theileria spp.), and viruses (Crimean-Congo hemorrhagic fever virus, tick-borne encephalitis virus), cause a variety of diseases in infected hosts [5][6][7][8][9][10][11]. Human and animal movements associated with environmental changes have favored the dispersal of ticks and TBPs [12,13]. Therefore, the emergence and re-emergence of several TBPs pose public and veterinary health risks. For instance, tick-borne diseases, such as borreliosis, ehrlichiosis, anaplasmosis, and rickettsiosis, are some of the diseases emerging in regions where they have not been reported previously [14][15][16][17]. Recent progress in the field of bioinformatics has generated various in silico strategies and drug designing approaches that reduce the time and cost associated with the trial and error experimentations for drug development [18,19]. These methods serve to shortlist the potential drug targets that may be used for experimental validation. Subtractive proteomics is an in silico method used for the identification of essential and non-host homologous proteins within a pathogen proteome [18,20,21]. By selecting essential proteins unique to pathogen survival and propagation, the subtractive proteomics approach allows the identification of novel drug targets within a pathogen. The Database of Essential Gene (DEG) server can be used for the identification of those proteins involved in central metabolic pathways required for the survival of a pathogen.
The identification of proteins homologous to the proteins in the host gut can be screened out during the prediction of computer-based drug targets or vaccine candidates to avoid potential adverse effects of a drug. Target proteins selected through this approach may be used as a promising tool to control the diseases caused by infectious agents [22]. Subtractive proteome analysis has already been utilized for the identification of novel drug targets and vaccine candidates against several life-threatening pathogens such as Pseudomonas aeruginosa [23], Streptococcus pneumonia [24], and Mycobacterium tuberculosis [25,26].
Vaccination is a promising and sustainable approach to controlling ticks and TBPs [27,28]. Various in silico and drug design approaches have generated a plethora of data by eliminating the time and cost involved in trial and error experimentations during a drug or vaccine development [18,[29][30][31][32][33][34]. Inceptive steps in the discovery of a novel drug target or vaccine candidates include the identification of target proteins [35]. To the best of our knowledge, limited studies have been reported using subtractive proteome analysis for the identification of drug targets or vaccine candidates against TBPs such as B. burgdorferi ZS7 [36] and Rickettsia rickettsii [37]. The purpose of this study is an in silico approach using subtractive proteomics for the prediction of potential drug targets and vaccine candidates against TBPs.

Retrieval of Pathogens Proteome
In this study, TBPs were selected which had not been previously reported in similar in silico studies, had available complete proteome in the National Center for Biotechnology Information (NCBI), their name availability in the KEGG (Kyoto Encyclopedia of Genes and Genomes pathway database), their KO (KEGG Orthology) list provided by KAAS (KEGG Automatic Annotation Server), and their KO list of proteins available in KEGG pathways. The analysis of other TBPs was excluded in this study due to the fact of their available published reports and unavailability of their complete proteome and KO number in the KEGG database. The complete proteomes of selected pathogens, including Borrelia burgdorferi B31, Ehrlichia chaffeensis str. Arkansas, Rickettsia rickettsii str. "Sheila Smith", Francisella tularensis SCHU S4, and Anaplasma phagocytophilum HZ, were retrieved in Fast Adaptive Shrinkage Threshold Algorithm (FASTA) format from NCBI.

Identification of Essential and Non-Host Homologous Proteins in Pathogens
To identify paralogous, duplicate or redundant sequences (when one or more homologous sequences are present in the same set of data) [38,39], the proteome of each pathogen was subjected to CD-HIT (cluster database at high identity with tolerance) with a sequence identity cut-off value of 0.4 (40%) [40,41]. Those proteins having more than a 40% identity were considered as paralogs in this analysis. The paralog protein sequences were excluded, and the non-paralog protein sets were subjected to the Basic Local Alignment Search Tool (BLASTp) at NCBI [42] against the host (Homo sapiens and Bos taurus) with threshold expected value (E-value) 10 −5 to identify the non-host homologous proteins in pathogens. To screen the essential proteins, the retrieved non-homologous protein sequences, which were not present in the host (H. sapiens and B. taurus), were subjected to BLASTp against DEG to obtain essential genes [43,44]. The cut-off for E-value, bit score, and percentage of identity were considered <E 10 −10 , ≥100, and >35%, respectively [43,45,46]. A minimum bit score of 100 was used to screen out proteins that represented essential genes. The resultant data set revealed the non-homologous essential proteins of pathogens.

Metabolic Pathways and Subcellular Localization Analysis
The pathogen-specific metabolic pathways were predicted by subjecting the non-homologous proteins to KAAS and KEGG [47][48][49][50]. The proteins were separated based on their role in pathogenspecific unique metabolic pathways. The online server subCELlular Localization (CELLO) V.2.5 [51] was used for the prediction of subcellular localization of these proteins.

Druggability, Virulency Antigenicity, and Allergenicity Analysis
The vital non-host homologous proteins of pathogens were BLASTp against the DrugBank database which contains the Food and Drug Administration (FDA) approved drugs. As previously reported [52], target proteins with a bit score > 100, E-value 10 −5 , and having more than 50% identity with the drug targets present in the DrugBank database were selected as druggable. Virulence factors (VFs) of selected pathogen proteins were identified by performing BLASTp searches against the Virulence Factors Database (VFDB) core data set (R1) with a cut-off bit score > 100, and the E-value was 10 −5 [53]. Vaxijen, an antigen alignment independent prediction tool was used for antigenicity analysis, and the AllerTOP v.2.0 server was used to predict allergenicity. The predicted antigenic score for each protein was categorized into a high antigenic and non-antigenic score. Proteins having antigenic scores more than 0.4 (default threshold value 40%) were considered highly antigenic, whereas those having less than 0.4 scores were considered as non-antigenic. Proteins with a high antigenic score were selected, and the NetCTL 1.2 server [54] was used for the prediction of potential T-cell epitopes. The Immune Epitopes Database was used to find the interaction between the T-cell epitope and MHC-I molecule (IEDB). To predict B-cell epitopes, a set of bioinformatics tools was used including the Kolaskar and Tongaonkar antigenicity scale [55], Emini surface accessibility prediction [56], Karplus and Schulz flexibility prediction [57], Bepipred linear epitope prediction analysis [54], and Chou and Fasman β-turn prediction analysis [58,59]. ProtParam [60] predicted the molecular weight, instability index, approximate half-life, isoelectric pH, GRAVY values, hydropathicity, and aliphatic index of the vaccine candidates.

Human Gut-Metagenomes Screening and Secondary Structure Prediction
To knock out pathogen proteins found in human gut flora, essential, non-homologous, and virulent proteins of B. burgdorferi B31, E. chaffeensis str. Arkansas, F. tularensis SCHU S4, and A. phagocytophilum HZ were scanned by BLASTp with an E-value cut-off score of 1 against proteins of the human gut flora using Human Microbiome Project database server [61].
The self-optimized prediction method by SOPMA alignment software [62] and Position-Specific Iterative Basic Local Alignment Search Tool (PSI-BLAST) based secondary structure prediction (PSIPRED) program were used to predict the secondary structure of the target proteins.

Phylogenetic Analysis
The amino acid sequence of the vaccine candidate (B. burgdorferi B31 FLiS protein) identified in this study was scanned for homologous sequences by BLASTp at NCBI. The homologous sequences were downloaded in FASTA format and were aligned using ClustalW in BioEdit Sequence Alignment Editor Vet. Sci. 2020, 7, 129 4 of 20 v.7.0.5 [63]. The evolutionary relationship of sequences was constructed using the neighbor-joining method in MEGA v. X [64] with bootstrapping at 1000 replications [65].

Homology Modeling and Molecular Dynamics Simulation
A swiss-model online database was used for the homology modeling of each target protein.
Subsequently, the predicted models were validated using the Ramachandran plot [66]. The COFACTOR [67] server was employed for the prediction of the binding site in the generated models. Moreover, the model was checked for stability by molecular dynamics (MD) simulation methodology using AMBER v2014 software package [68]. The LEaP module was used to add the missing polar/non-polar hydrogen atoms and counterions (Na + and Cl − ) were added to neutralize the overall system. Next, a solvated octahedral box of transferable intermolecular potential with 3 points (TIP3P) water model (10.0 Å buffer) was used to sandwich the system in a water environment. Bonds involving hydrogen atoms were constrained with the SHAKE algorithm [69]. All MD simulations were done by the CUDA version of PMEMD in GPU cores of NVIDIA ® Tesla K80 [68]. The NPT ensemble at 298 K, 1 bar, and an integration time step of 2 fs was used to integrate the equations of motion. An Anderson-like temperature coupling scheme was used to control the temperature and imaginary "collisions" were randomized by the velocities at a distribution corresponding to simulation temperature every 1000 steps. Pressure control was performed using Berendsen barostat with the pressure relaxation time set to 1.0 ps. A cut-off 8.0 Å was used for Lennard-Jones interactions and the short-range electrostatic interactions.

Identification of Essential and Non-Host Homologous Proteins in Pathogens
To our knowledge, the subtractive analysis performed in this study is the first computational report to characterize and identify novel therapeutic targets for the control of TBPs. To predict unique proteins as drug targets and vaccine candidates within the proteome of a pathogen, subtractive proteomics has been reported among the most powerful approaches for unique yet uncharacterized sequences as possible therapeutic targets [18,25,33,[70][71][72][73][74][75]. The objective of the current study was to predict novel drug targets and vaccine candidates based on subtractive proteomics approach against B. burgdorferi B31, E. chaffeensis str. Arkansas, R. rickettsii str. "Sheila Smith", A. phagocytophilum HZ, and F. tularensis SCHU S4. The entire proteomes of selected TBPs were scanned to obtain a group of essential and non-host homologous proteins. Among them, cytoplasmic proteins were predicted as putative drug targets and a membrane-bound protein as a vaccine candidate. This membrane-bound protein may be a capable vaccine candidate for controlling infections caused by TBPs. The entire model of this subtractive analysis is given in the flow chart below ( Figure 1).
Complete proteomes of selected pathogens, including B. burgdorferi B31 (1391 proteins), E. chaffeensis str. Arkansas (889 proteins), R. rickettsii str. "Sheila Smith" (1246 proteins), A. phagocytophilum HZ (1048 proteins), and F. tularensis SCHU S4 (1556 proteins), were retrieved and subjected to the CD-HIT algorithm to remove paralogous sequences [61]. A 40% similarity was chosen as a cut-off to maintain a very stringent selection criteria for the identification of the most effective targets. It has been widely accepted to set a 40% sequence identity as a cut-off to maintain a rigid criterion to remove duplicate proteins [31,45,71,76,77]. This is because protein sequence databases are incredibly redundant, and this redundancy occurs when several similar data are deposited from different regions [78]. The inclusion of similar sequences in individual-specific analyses mostly introduces undesirable biases [38,39]. Duplicate proteins and proteins with less than 100 amino acids were also excluded, and this has been previously documented [18,79,80]. A set of non-paralogous proteins was generated for further analysis based on the assumption that these proteins may be essential for pathogen survival [80,81]. The identified non-paralogous proteins were 1181 out of 1391 in B. burgdorferi B31, 846 out of 889 in E. chaffeensis str. Arkansas, 830 out of 1246 in R. rickettsii str. "Sheila Smith", 712 out of 1048 in A. phagocytophilum HZ, and 1295 out of 1556 in F. tularensis SCHU S4. The non-redundant data set was further filtered, and only those proteins which had a sequence similarity less than 30% or no significant similarity with the host (H. sapiens and B. taurus) proteome were targeted. Further, an NCBI BLASTp search with a threshold expectation value of (E-value) 10 −5 with the host (H. sapiens and B. taurus) was used, and sequences that showed no similarity with the host were selected. The resultant data set revealed non-host homologous proteins of pathogens. Non-host homologous proteins were 765 in B. burgdorferi B31, 793 in E. chaffeensis str. Arkansas, 409 in R. rickettsii str. "Sheila Smith", 105 in A. phagocytophilum HZ, and 185 in F. tularensis SCHU S4. Complete proteomes of selected pathogens, including B. burgdorferi B31 (1391 proteins), E. chaffeensis str. Arkansas (889 proteins), R. rickettsii str. "Sheila Smith" (1246 proteins), A. phagocytophilum HZ (1048 proteins), and F. tularensis SCHU S4 (1556 proteins), were retrieved and subjected to the CD-HIT algorithm to remove paralogous sequences [61]. A 40% similarity was chosen as a cut-off to maintain a very stringent selection criteria for the identification of the most effective targets. It has been widely accepted to set a 40% sequence identity as a cut-off to maintain a rigid criterion to remove duplicate proteins [31,45,71,76,77]. This is because protein sequence databases are incredibly redundant, and this redundancy occurs when several similar data are deposited from different regions [78]. The inclusion of similar sequences in individual-specific analyses mostly introduces undesirable biases [38,39]. Duplicate proteins and proteins with less than 100 amino acids were also excluded, and this has been previously documented [18,79,80]. A set of non-paralogous proteins was generated for further analysis based on the assumption that these proteins may be essential for pathogen survival [80,81]. The identified non-paralogous proteins were 1181 out of 1391 in B. burgdorferi B31, 846 out of 889 in E. chaffeensis str. Arkansas, 830 out of 1246 in R. rickettsii str. "Sheila Smith", 712 out of 1048 in A. phagocytophilum HZ, and 1295 out of 1556 in F. tularensis SCHU S4. The non-redundant data set was further filtered, and only those proteins which had a sequence similarity less than 30% or no significant similarity with the host (H. sapiens and B. taurus) proteome were targeted. Further, an NCBI BLASTp search with a threshold expectation value of (E-value) 10 −5 with the host (H. sapiens and B. taurus) was used, and sequences that showed no similarity with the host were selected. The resultant data set revealed non-host homologous proteins of pathogens. Non-host homologous proteins were 765 in B. burgdorferi B31, 793 in E. chaffeensis str. Arkansas, 409 in R. rickettsii str. "Sheila Smith", 105 in A. phagocytophilum HZ, and 185 in F. tularensis SCHU S4. Essential proteins are regularly required to support the basic cellular functions of micro-organisms and are essential for the survival of a pathogen [76,82]. A potent drug target must be an essential protein possessing features required for the survival and existence of a pathogen [75]. A BLASTp search for the non-homologous proteins of selected pathogens against the DEG database was done to screen out the essential proteins [43,83]. The queried proteins having a homologous hit in DEG was 34 in B. burgdorferi B31, 113 in E. chaffeensis str. Arkansas, 76 in R. rickettsii str. "Sheila Smith", 105 in A. phagocytophilum HZ, and 185 in F. tularensis SCHU S4. All these predicted sets of essential proteins were found to be involved in metabolic pathways (Table 1).

Pathogens Unique Metabolic Pathways and Subcellular Localization
The predicted novel metabolic pathways in all TBPs were 66, and among them, 14 were in R. rickettsii str. "Sheila Smith", 13 in B. burgdorferi B31, 8 in E. chaffeensis str. Arkansas, 6 in A. phagocytophilum HZ, and 25 in F. tularensis SCHU S4. A total of 61 proteins were found to be involved in metabolic pathways that are unique to TBPs and having no similarity with the host (H. sapiens and B. taurus) proteome. The unique metabolic pathways included the quorum-sensing metabolic pathway, two-component system, lysine biosynthesis, flagellar assembly, bacterial secretion system, monobactam biosynthesis, and the peptidoglycan biosynthesis (Table 2). These unique metabolic pathways contain essential proteins necessary for the survival, virulence, and pathogenicity of TBPs that can be used as drug targets and vaccine candidates.

Functional Analysis of Unique Pathways
Comparative analysis of the metabolic pathways of TBPs against the host (H. sapiens and B. taurus) revealed 66 unique pathways in TBPs having no similarities with the host. The KO list of TBPs proteins provided by the KAAS server was searched against each pathogen pathway to screen the unique essential proteins involved in unique pathways. Among them, 12 unique pathways-such as quorum-sensing, two-component system, and lysine biosynthesis in A. phagocytophilum HZ; flagellar assembly in B. burgdorferi B31; bacterial secretion system and monobactam biosynthesis in E. chaffeensis str. Arkansas; quorum-sensing and peptidoglycan biosynthesis in F. tularensis SCHU S4; and the two-component system in R. rickettsii str. "Sheila Smith"-have unique essential proteins having no similarities with host pathways (Table 2).
Proteins present in the quorum-sensing pathway are responsible for the bioluminescence, sporulation, competence, antibiotic production, biofilm formation, and virulence factors secretion [86][87][88]. Two of the target proteins, preprotein translocase subunit SecY and preprotein translocase subunit SecG protein, are present in the quorum-sensing pathway of A. phagocytophilum HZ and F. tularensis SCHU S4, respectively, which can be used as potential drug targets. The two-component system pathway, essential for the growth and survival in adverse environmental conditions, is ubiquitous in bacteria and has been reported to be involved in virulence [89,90]. The chromosomal replication initiator protein DnaA (dnaA) and cytochrome d ubiquinol oxidase subunit 1 protein are

Functional Analysis of Unique Pathways
Comparative analysis of the metabolic pathways of TBPs against the host (H. sapiens and B. taurus) revealed 66 unique pathways in TBPs having no similarities with the host. The KO list of TBPs proteins provided by the KAAS server was searched against each pathogen pathway to screen the unique essential proteins involved in unique pathways. Among them, 12 unique pathways-such as quorum-sensing, two-component system, and lysine biosynthesis in A. phagocytophilum HZ; flagellar assembly in B. burgdorferi B31; bacterial secretion system and monobactam biosynthesis in E. chaffeensis str. Arkansas; quorum-sensing and peptidoglycan biosynthesis in F. tularensis SCHU S4; and the two-component system in R. rickettsii str. "Sheila Smith"-have unique essential proteins having no similarities with host pathways (Table 2).
Proteins present in the quorum-sensing pathway are responsible for the bioluminescence, sporulation, competence, antibiotic production, biofilm formation, and virulence factors secretion [86][87][88]. Two of the target proteins, preprotein translocase subunit SecY and preprotein translocase subunit SecG protein, are present in the quorum-sensing pathway of A. phagocytophilum HZ and F. tularensis SCHU S4, respectively, which can be used as potential drug targets. The two-component system pathway, essential for the growth and survival in adverse environmental conditions, is ubiquitous in bacteria and has been reported to be involved in virulence [89,90]. The chromosomal replication initiator protein DnaA (dnaA) and cytochrome d ubiquinol oxidase subunit 1 protein are present in the two-component system pathway of the A. phagocytophilum HZ and R. rickettsii str. "Sheila Smith", respectively. The peptide cross-linking in the peptidoglycan layer of bacteria plays a central role in pathogenesis. Inhibitors of peptidoglycans form a significant class of antibiotics and have been demonstrated as probable drug targets [91,92]. The biosynthesis of peptidoglycan involves various ADP forming ligases, such as MurA, MurC, MurD, MurE, and MurF, which catalyze the successive additions of l-alanine, d-glutamate, a diamino acid, and d-alanine-d-alanine to UDP-N-acetylmuramic acid [93]. Both UDP-N-acetylmuramate-l-alanine ligase (murC) and phospho-N-acetylmuramoyl-pentapeptide-transferase (murE) are present in the peptidoglycan pathway of the F. tularensis SCHU S4. These drug targets, which inhibit peptidoglycan biosynthesis, have the potential to control pathogens and minimize microbe-generated pathogenicity [77]. The general secretion (Sec) and twin-arginine translocation (Tat) pathways are the bacterial secretion system, most used to transport proteins across the cytoplasmic membrane [94]. Pathogens require a functional Tat pathway for virulence during infection, survival, and other physiological functions [95][96][97]. Similarly, the twin-arginine translocase subunit TatC present in the bacterial secretion system pathway, and aspartate kinase in the monobactam biosynthesis pathway of E. chaffeensis str. Arkansas is required for survival and virulence. Aspartate-semialdehyde dehydrogenase is present in the lysine biosynthesis pathway of A. phagocytophilum HZ. Several proteins of the flagellar assembly pathway are involved in protein export, especially in the export of VFs [98]. The proteins UDP-N-acetylmuramoyl-tripeptide-d-alanyl-d-alanine ligase and Flagellar secretion chaperone FliS are present in the flagellar assembly pathway of B. burgdorferi B31.
All the predicted 12 target proteins present in the unique pathways of TBPs have no similarities with the host pathways (H. sapiens and B. taurus). Thus, proteins involved in these pathways are potential drug targets, and their inhibition will increase the susceptibility of TBPs to various drugs ( Table 2).

Druggability and Virulence Analysis for the Identification of Potential Drug Targets and Vaccine Candidates
To evaluate the druggability potential, the shortlisted essential proteins were subject to BLASTp against the FDA approved drugs. A total of fifteen proteins from all pathogens were predicted to be druggable. For instance, there were four protein targets (i.e., chitibiose transporter protein ChbA, FLiS, flagellar hook capping protein, UDP-N-acetylenolpyruvoyl glucosamine reductase) in B. burgdorferi B31, three-drug targets (i.e., twin-arginine translocase subunit TatC, preprotein translocase subunit SecA, and aspartate kinase) in E. chaffeensis str. Arkansas, one drug target (i.e., cytochrome d ubiquinol oxidase subunit I) in R. rickettsii str. "Sheila Smith", four drug targets (i.e., UDP-Nacetylmuramate-l-alanine ligase, preprotein translocase subunit SecG, preprotein translocase subunit SecY, and UDP-N-acetylmuramoylalanyl-d-glutamate-2,6-diaminopimelate ligase) in F. tularensis SCHU S4, and three-drug targets (i.e., preprotein translocase subunit SecY, chromosomal replication initiator protein DnaA, and aspartate-semialdehyde dehydrogenase) in A. phagocytophilum HZ. Screening of VFs has been a promising option for the prediction of therapeutic targets [99]. To find virulency, the fifteen predicted protein targets of all pathogens were subject to BLASTp against the core data set (R1) of the VFDB. All target proteins were virulent except twin-arginine translocase subunit (TatC) from E. chaffeensis str. Arkansas ( Table 2). The VFs inherited properties required for bacteria to adhere, colonize, invade, and conquer the host defense system and, thus, are considered as potential drug targets and vaccine candidates [100].

Screening of Essential, Non-Homologous Target Proteins Versus Gut Metagenome and Secondary Structure Analysis
The beneficial microbes that reside in the human digestive tract constitute gut microbiota. There are trillions of microbes that reside symbiotically in a human intestine [101,102]. These microbes contribute to ferment undigested carbohydrates and produce energy, preventing harmful species growth, and enhance the functions of the immune system in the residing host [101]. To exclude those proteins found in human gut flora, TPBs proteins were subject to BLASTp against the human microbiome project database. After metagenomics, eleven protein targets were found to have no similarity with the gut metagenome of the host and were considered as final target proteins. The eleven target proteins included: twin-arginine translocase subunit TatC, aspartate kinase, UDP-N-acetyl muramate-l-alanine ligase, preprotein translocase subunit SecG, preprotein translocase subunit SecY, UDP-N-acetylmuramoylalanyl-d-glutamate-2,6-diaminopimelate ligase, preprotein translocase subunit SecY, chromosomal replication initiator protein DnaA, aspartate-semialdehyde dehydrogenase, UDP-N-acetylmuramoyl-tripeptide-d-alanyl-d-alanine ligase, and flagellar protein FLiS (Table 2). These essential, non-host homologous, and virulent target proteins can be used as potential drug targets and vaccine candidates.
The predicted secondary structure drawn using SOPMA revealed the percentage of the α-helix, extended strand, β-turn, and random coil in each target protein ( Table 3). The confidence of prediction observed throughout the predicted secondary structures was high, and a high percentage of α-helices was found in most of the target proteins. For instance, the α-helices contents were found to be 59.31% in FLiS protein. Most of the transmembrane proteins, especially those present in the cytoplasmic membrane, are solely constituted by α-helices. The extended strands or beta-sheets linked to the α-helices may construct the external transmembrane regions, thus providing stability to these proteins [103][104][105][106].

Phylogenetic Analysis
The phylogenetic relationship is crucial for understanding the evolution and background history of various proteins. Nearly all proteins have structural similarities with other proteins and in some cases, share a common evolutionary origin. To determine the evolutionary relationship of the predicted vaccine candidate (FLiS protein), a neighbor-joining tree was constructed [65] which showed a 91% bootstrapping support value (Figure 3). All sequences were clustered together which suggested that this protein is highly conserved among various strains of B. burgdorferi and may play a functional role in pathogen survival, propagation, transmission, and pathogenesis. Further, the FLiS protein is present in TBP (B. burgdorferi B31) and other pathogens; it may serve as a universal vaccine by eliciting an immune response against several infectious agents [107,108].

Phylogenetic Analysis
The phylogenetic relationship is crucial for understanding the evolution and background history of various proteins. Nearly all proteins have structural similarities with other proteins and in some cases, share a common evolutionary origin. To determine the evolutionary relationship of the predicted vaccine candidate (FLiS protein), a neighbor-joining tree was constructed [65] which showed a 91% bootstrapping support value (Figure 3). All sequences were clustered together which suggested that this protein is highly conserved among various strains of B. burgdorferi and may play a functional role in pathogen survival, propagation, transmission, and pathogenesis. Further, the FLiS protein is present in TBP (B. burgdorferi B31) and other pathogens; it may serve as a universal vaccine by eliciting an immune response against several infectious agents [107,108].

Characterization of Drug Targets and Vaccine Candidates
Among the twelve targets after cellular localization, five proteins were cytoplasmic, five inner membranes and only one protein was each outer membrane and extracellular. In the adhesion and invasion mechanism during the host-pathogen interaction, outer membrane proteins played a significant role in invading a host cell and entering the tissue [21]. Comparatively, it was evident from previous reports that outer membrane proteins are vaccine candidates and that cytoplasmic proteins are drug targets [21,109]. It is well known that exported proteins are the prominent molecules of interaction with cells infected by pathogens; therefore, they are potential candidates for vaccine targets [110][111][112][113][114]. The antigenicity and allergenicity analysis of target proteins revealed that eight among them were antigenic while the remaining four proteins were non-antigenic and all the target proteins were non-allergen ( Table 2). The FLiS protein in B. burgdorferi B31 has several antigenic epitopes having the potential as a vaccine candidate. The extracellular protein (FLiS; UniProt ID:

Characterization of Drug Targets and Vaccine Candidates
Among the twelve targets after cellular localization, five proteins were cytoplasmic, five inner membranes and only one protein was each outer membrane and extracellular. In the adhesion and invasion mechanism during the host-pathogen interaction, outer membrane proteins played a significant role in invading a host cell and entering the tissue [21]. Comparatively, it was evident from previous reports that outer membrane proteins are vaccine candidates and that cytoplasmic proteins are drug targets [21,109]. It is well known that exported proteins are the prominent molecules of interaction with cells infected by pathogens; therefore, they are potential candidates for vaccine targets [110][111][112][113][114]. The antigenicity and allergenicity analysis of target proteins revealed that eight among them were antigenic while the remaining four proteins were non-antigenic and all the target proteins were non-allergen ( Table 2). The FLiS protein in B. burgdorferi B31 has several antigenic epitopes having the potential as a vaccine candidate. The extracellular protein (FLiS; UniProt ID: O51500_BORBU, accession no, NP_212684.1, KEGG ID: BBU02040) was found with a high antigenic score of 0.42 as well as human non-allergen. Potential T-cell epitopes were predicted within the FLiS protein for the prediction of an epitope-based subunit vaccine. The molecular weight of the FliS protein was 16.45 kDa, while the theoretical pI was measured as 9.20 which indicated that this protein should have a negative charge. The half-life of the vaccine was expected to be more than 10 h in E. coli in vivo. The estimated rate of extinction and an aliphatic index were 15,470 and 115.66, respectively. The protein's computed GRAVY value was −0.253 while the index of instability (34.07) classified the protein as stable. The results of predicted B-cell epitopes of FLiS protein are shown in (Figure 4).

Homology Modeling and Molecular Dynamic Simulation
Structure similarity search for B. burgdorferi B31 protein UDP-n-acetylmuramoyl-tripeptide-D-alanyl-D-alanine ligase showed a 27% identity with crystal structure of unliganded CH59UA, the inferred unmutated ancestor of the RV144 anti-HIV antibody lineage producing CH59 Protein Data Bank (PDB ID 4QF5.1) ( Figure 5I) and FLiS showed a 33.3% identity with a flagellar export chaperone in complex with its cognate binding partner from Aquifex aeolicus (PDB ID 1ORY.1) ( Figure 6E). The A. phagocytophilum HZ protein preprotein translocase subunit SecY showed a 43.36% identity with crystal structure of the TEPC15-Vk45.1 anti-2-phenyl-5-oxazolone NQ16-113.8 scFv in complex with phOxGABA (PDB ID 3J45.1) ( Figure 5H), chromosomal replication initiator protein DnaA showed 35.53% identity with AMPPCP-bound DnaA from A. aeolicus (PDB ID 2HCB.1) ( Figure 5D), and aspartate-semialdehyde dehydrogenase showed 31.68% identity with aspartate semialdehyde dehydrogenase complexed with glycerol and sulfate from Mycobacterium tuberculosis H37Rv (PDB ID 3VOS) ( Figure 5F). The E. chaffeensis str. Arkansas protein twin-arginine translocase subunit TatC showed 34.62% identity with twin arginine translocase receptor-Tatc In DDM from A. aeolicus (PDB ID 4HTT.1) ( Figure 5C) and aspartate kinase showed 44.31% identity with aspartate kinase from Synechocystis species (PDB ID 3L76.1) ( Figure 5K). The R. rickettsii str. "Sheila Smith" protein cytochrome d ubiquinol oxidase subunit II showed 12% identity with alternative complex iii from Rhodothermus marinus (PDB ID 6F0K.1) ( Figure 5E). The F. tularensis SCHU S4 protein preprotein translocase subunit SecG showed a 52.11% identity with quaternary complex between SRP, SR, and SecYEG bound to the translating ribosome from E. coli (PDB ID 5NCO.1) ( Figure 5A Phi dihedral angles for FLiS model revealed that high residues lie in favored regions as compared to allowed regions ( Figure 6C). The details of the binding site and binding site residues of vaccine candidate (FliS protein) are shown in (Table 4).   Karplus and Schulz flexibility prediction (E). Kolaskar and Tongaonkar antigenicity (F). The x-axis and y-axis represent the sequence position and corresponding antigenic properties score, respectively. The threshold level was set as default parameter of the server. The regions shown in yellow color above the threshold value were predicted as B-cell epitope. and UDP-N-acetylmuramoylalanyl-D-glutamate-2, 6-diaminopimelate ligase showed a 29.94% identity with Staphylococcus aureus MurE with UDP-MurNAc-Ala-Glu-Lys and ADP (PDB ID 4C12) ( Figure 5G). A total of five models were generated for every drug target and vaccine candidate. However, molecular dynamics simulation was done only for the predicted vaccine candidate. The generated model of the vaccine candidate, FLiS protein (B. burgdorferi B31), was validated, and the evaluation of Psi and Phi dihedral angles for FLiS model revealed that high residues lie in favored regions as compared to allowed regions ( Figure 6C). The details of the binding site and binding site residues of vaccine candidate (FliS protein) are shown in (Table 4).  With constant improvement in algorithm design for simulations, MD simulations have played an essential role in the development of novel therapeutics [115]. The FLiS structure was simulated in an explicit water environment for 20 ns. The deviation of the backbone atoms was examined by the root-mean-square deviation (RMSD). Consequently, the results of the backbone deviation relative to the original structures revealed that the simulation time of 20 ns is enough to reach the equilibration at temperature 298 K. It was observed from the RMSD graphs that the FLiS system initially behaves systematically steady (~3 Å) till 6 ns. However, this steady behavior dramatically increased and then With constant improvement in algorithm design for simulations, MD simulations have played an essential role in the development of novel therapeutics [115]. The FLiS structure was simulated in an explicit water environment for 20 ns. The deviation of the backbone atoms was examined by the root-mean-square deviation (RMSD). Consequently, the results of the backbone deviation relative to the original structures revealed that the simulation time of 20 ns is enough to reach the equilibration at temperature 298 K. It was observed from the RMSD graphs that the FLiS system initially behaves systematically steady (~3 Å) till 6 ns. However, this steady behavior dramatically increased and then oscillated around (~4 Å) until 20 ns ( Figure 6A). To understand the effect of specific residues in the FLiS system, we analyzed the root-mean-square fluctuations (RMSF) ( Figure 6D). The results revealed a high fluctuation in some residues (residues 38-45 and 65-78) which suggested that these residues might play a crucial role in flagellin recognition [116]. The compactness of the system was analyzed through radius of gyration (RoG) during MD simulation which showed high compactness during 7 ns, while showing local compactness afterward ( Figure 6B). These results delineate that the FLiS protein possesses a highly dynamic N-terminal region, which is appended to the standard four-helix bundle structure, and further indicates that the FLiS could be used as a potential vaccine candidate against TBPs.

Conclusions and Future Directions
Subtractive proteomics is a rapid approach for the screening of drug targets and vaccine candidates against a pathogen provided both the pathogen and host proteomes are available. We applied a subtractive proteomics approach to find essential and non-host homologous protein targets in the proteome of TBPs which can be used as potential drug targets and vaccine candidates. Further analysis of shortlisted targets, such as different metabolic pathways proteins, subcellular localization of targets, antigenicity, allergenicity, and druggable properties, revealed eleven drug targets (cytoplasmic proteins) and one vaccine candidate (membrane-bound protein). Inhibiting proteins involved in these metabolic pathways will increase the susceptibility of TBPs to various drugs. The identified FliS protein has immunogenic and allergenic potential, and further studies on various aspects of this protein will help in understanding its diverse functions, development of a suitable vaccine against TBPs, and treatment of allergenic diseases caused by TBPs. This study will facilitate the development of drug targets and vaccine candidates against TBPs and may play a role in the prediction of targets against other pathogens. Furthermore, the proposed vaccine needs to be validated experimentally in an animal model by effective immunological methods to ensure the control of TBPs.