Identification of Potential Druggable Targets and Structure-Based Virtual Screening for Drug-like Molecules against the Shrimp Pathogen Enterocytozoon hepatopenaei

Enterocytozoon hepatopenaei (EHP) causes slow growth syndrome in shrimp, resulting in huge economic losses for the global shrimp industry. Despite worldwide reports, there are no effective therapeutics for controlling EHP infections. In this study, five potential druggable targets of EHP, namely, aquaporin (AQP), cytidine triphosphate (CTP) synthase, thymidine kinase (TK), methionine aminopeptidase2 (MetAP2), and dihydrofolate reductase (DHFR), were identified via functional classification of the whole EHP proteome. The three-dimensional structures of the proteins were constructed using the artificial-intelligence-based program AlphaFold 2. Following the prediction of druggable sites, the ZINC15 and ChEMBL databases were screened against targets using docking-based virtual screening. Molecules with affinity scores ≥ 7.5 and numbers of interactions ≥ 9 were initially selected and subsequently enriched based on their ADMET properties and electrostatic complementarities. Five compounds were finally selected against each target based on their complex stabilities and binding energies. The compounds CHEMBL3703838, CHEMBL2132563, and CHEMBL133039 were selected against AQP; CHEMBL1091856, CHEMBL1162979, and CHEMBL525202 against CTP synthase; CHEMBL4078273, CHEMBL1683320, and CHEMBL3674540 against TK; CHEMBL340488, CHEMBL1966988, and ZINC000828645375 against DHFR; and CHEMBL3913373, ZINC000016682972, and CHEMBL3142997 against MetAP2.The compounds exhibited high stabilities and low binding free energies, indicating their abilities to suppress EHP infections; however, further validation is necessary for determining their efficacy.


Introduction
Enterocytozoon hepatopenaei (EHP) infections in Pacific white shrimp, Litopenaeusvannamei, have become a pandemic in recent years [1]. The pathogen was first discovered and isolated from the hepatopancreatic tissues of black tiger shrimp, Penaeus monodon, in Thailand in 2009 [2]. EHP infections were subsequently reported in China, India, Vietnam, Indonesia, Malaysia, Venezuela, and Australasia [1,3,4]. EHP is an intracellular spore-forming parasite that germinates within the epithelial cells of the hepatopancreas and is transmitted horizontally [1,5]. The presence of EHP spores in the hepatopancreas and midgut indicates an oral route of transmission, which could either result from immersion, swallowing pathogeninfected water, active cannibalism, or the consumption of infected shrimp [3,5]. A recent analysis of the 3. 26 Mbp EHP genome revealed that the pathogen could not generate ATP either viaglycolysis or oxidative phosphorylation [6,7]. EHP, therefore, obtains ATP and other purine nucleotides for energy and biosynthesis from their host by using nucleotide transport (NTT) proteins, which are acquired via lateral gene transfer [8].
Although EHP infections do not cause significant mortality in shrimp, the pathogen dramatically retards the growth of penaeid shrimp and is therefore emerging as a critical

Identification of Pockets Based on Conserved Amino Acid Residues in Active Sites
The sequence homologs of the query proteins were determined with BLASTp against the Protein Data Bank (PDB). EHP AQP (A0A1W0E445) had 95% query coverage and 29% identity with the AQP protein of Arabidopsis thaliana (PDB ID: 5I32), and 98% query coverage and 26% identity with human AQP (PDB ID: 1FQY). Residues Phe22, Asn69, Pro70, Glu138, Gly197, and Asn201 were found to be conserved in EHP AQP ( Figure S1). Similarly, the CTP synthase of EHP (A0A1W0E736) showed 97% and 96% query coverage with the CTP synthase proteins of Drosophila melanogaster (PDB ID: 6L6Z) and human (PDB ID: 5U03), respectively, and 29% and 27% identity, respectively. Residues Lys24, Tyr295, Cys377, Phe351, and Arg453 were conserved in the CTP synthase of EHP ( Figure S2). The   Table 1. Identification of binding pockets in the five druggable protein targets of EHP and generation of grid box for molecular docking studies. Table 1. Identification of binding pockets in the five druggable protein targets of EHP and generation of grid box for molecular docking studies. Table 1. Identification of binding pockets in the five druggable protein targets of EHP and generation of grid box for molecular docking studies.

Analysis of Protein-Ligand Interactions
The intermolecular interactions between the selected compounds and the binding sites of the target proteins are depicted in Figure 2. The results demonstrated that hydrogen bonds, van der Waals interactions, and carbon-hydrogen bonds played significant roles in maintaining complex stability. The intermolecular interactions were analyzed from the best docked pose for each molecule. Analysis of the ligand interactions of AQP revealed that all the five compounds, namely, CHEMBL3703838, ZINC000002243083, CHEMBL133039, CHEMBL3140193, and CHEMBL2132563, were stabilized via hydrogen bonds with different amino acids atthe binding site ( Table 3). The calculated hydrogen bond distances ranged from 2.3 to 2.9 Å. ZINC000002243083 and CHEMBL3140193 formed the highest numbers of hydrogen bonds with the binding site. Further analysis revealed that three compounds, CHEMBL3703838, CHEMBL133039, and CHEMBL3140193, formed π-π stacking interactions with Thr121, Lys34, and Thr33, respectively (Figure 2A-E). The compounds were stabilized by the surrounding residues via various protein-ligand interactions, including hydrogen bonds, van der Waals interactions, and electrostatic interactions. The binding affinities of the 5 compounds (CHEMBL48494, CHEMBL1162979, CHEMBL133039, CHEMBL525202, and CHEMBL1091856) selected against CTP synthase ranged from−9.6 to −7.6 kcal/mol ( Table 3). Analysis of the protein-ligand interactions revealed that all the compounds were stabilized in the binding pocket of CTP synthase via hydrogen bonds. The calculated hydrogen bond distances ranged from 1.7 to 2.9 Å. CHEMBL1091856 formed the highest number of hydrogen bonds, followed by CHEMBL1162979 and CHEMBL525202. With the exception ofCHEMBL525202, all the compounds formed π-π stacking interactions with the receptors. CHEMBL48494, CHEMBL1091856, and ZINC000219968783 formed π-π interactions with Phe351, while CHEMBL133039 formed three π-π interactions with Phe351, Arg453, and Tyr302 ( Figure 2F-J).
A total of 5 compounds, CHEMBL3674540, CHEMBL1683320, CHEMBL391279, ZINC000031750813, and CHEMBL4078273, were similarly selected against TK, with binding affinities ranging from −9.5 to −8.1 kcal/mol (Table 3). All the compounds were stabilized in the binding pocket via hydrogen bonds, and the calculated hydrogen bond distances ranged from 1.8 to 3.0 Å. CHEMBL1683320 formed the highest number of hydrogen bonds, followed by CHEMBL391279 and ZINC000031750813. Analysis of the docked poses of compounds CHEMBL391279, ZINC000031750813, and CHEMBL4078273 revealed additional π-π stacking interactions with Arg45 and Arg46 ( Figure 2K-O).
A total of five compounds, namely, CHEMBL56533, ZINC000016682862, ZINC000828645375, CHEMBL3901573, and CHEMBL108166, were selected against EHP DHFR. All the compounds formed hydrogen bonds with different amino acids and were stabilized in the binding pocket (Table 3). The calculated hydrogen bond distances ranged from 1.7 to 3.0 Å. CHEMBL3901573, ZINC000828645375, CHEMBL108166, and ZINC000016682862 formed additional π-π stacking interactions with different amino acids. CHEMBL3901573 and CHEMBL108166 formed π-π interactions with Met55 and Arg150, respectively. ZINC000828645375 formed three π-π interactions with Met55, Phe52, and Tyr79. ZINC000016682862 formed two π-π stacking interactions with Tyr79 and Phe52. CHEMBL56533 formed the highest number of hydrogen bonds, followed by CHEMBL108166 ( Figure 2P-O). Similarly, a total of five compounds, namely, CHEMBL3913373, CHEMBL1962731, CHEMBL3142997, ZINC000199197855, and ZINC000016682972, were selected against MetAP2 using docking-based VS (Table 3). All the compounds formed hydrogen bonds with different amino acids and were stabilized in the binding pocket of the receptor. The calculated hydrogen bond distances ranged from 1.8 to 2.9 Å. CHEMBL3913373 and ZINC000199197855 formed the highest number of hydrogen bonds. CHEMBL3913373, CHEMBL1962731, ZINC000199197855, and ZINC000016682972 formed π-π interactions with different amino acids in the binding pocket. CHEMBL3913373 formed a single π-π stacking interaction with His92. However, CHEMBL1962731, ZINC000199197855, and ZINC000016682972 formed two π-π interactions with His92 as well as His202, Tyr304, and Phe80, respectively ( Figure 2U-Y). A total of 5 compounds, CHEMBL3674540, CHEMBL1683320, CHEMBL391279, ZINC000031750813, and CHEMBL4078273, were similarly selected against TK, with binding affinities ranging from −9.5 to −8.1 kcal/mol (Table 3). All the compounds were stabilized in the binding pocket via hydrogen bonds, and the calculated hydrogen bond distances ranged from 1.8 to 3.0 Å. CHEMBL1683320 formed the highest number of hydrogen bonds, followed by CHEMBL391279 and ZINC000031750813. Analysis of the docked poses of compounds CHEMBL391279, ZINC000031750813, and CHEMBL4078273 revealed additional π-π stacking interactions with Arg45 and Arg46 ( Figure 2K-O).

MD Simulations of Protein-Ligand Complexes
MD is a powerful computational method for predicting and analyzing the stabilities of protein-ligand complexes and for studying atomic movements with respect to a macromolecule. The stabilities and behaviors of the protein-ligand complexes were analyzed in a dynamic environment based onroot-mean-square deviation (RMSD), root-mean-square fluctuation (RMSF), the radius of gyration (Rg), and molecular mechanics/generalized Born surface area (MM/GBSA) energy. The simulation trajectories were subjected to a principal component analysis (PCA), and a cross-correlation matrix of the resultant MD trajectories was also constructed and analyzed.

RMSD Values of the Cα Backbone of Target Proteins
The RMSD values of the protein-ligand complexes were plotted graphically for understanding the structural stability and integrity of the complexes. The results demonstrated that the average RMSD values of the 5 compounds selected against AQP ranged from 1.84 to 3.68 Å. Compounds CHEMBL3703838, ZINC000002243083, and CHEMBL3140193 showed high stabilities during the simulation, with an RMSD fluctuation of 0.5 Å ( Figure 3A, Table S5). Similarly, CHEMBL48494, CHEMBL133039, CHEMBL525202, and CHEMBL1091856 formed stable complexes with CTP synthase, with the average RMSD values ranging from 1.90 to 2.49 Å ( Figure 4A, Table S5). The average RMSD values for the Cα backbone of TK complexed with the 5 compounds ranged from 2.13 to 6.96 Å ( Figure 5A, Table S5). The average RMSD value of ZINC000031750813 (6.96 Å) was higher than that of the four other compounds selected against TK (Table S5). With the exceptions of CHEMBL3901573 and ZINC000016682862, the average RMSD values of the three other compounds selected against DHFR indicated stable bindings ( Figure 6A, Table S5). The average RMSD fluctuation of these compounds ranged from 2.97 to 4.22 Å. The trajectory analysis revealed that all the 5 compounds complexed with MetAP2 remained stable throughout the 100 ns simulation, with the average RMSD values of the compounds ranging from 1.21 to 1.37 Å ( Figure 7A, Table S5).

RgValues of the Cα Backbone of Target Proteins
The compactness of the protein-ligand complexes during the simulation was determined by measuring the values of Rg. The average values of Rg for the 5 compounds complexed with AQP ranged from 17.89 to 18.30 Å ( Figure 3B, Table S6). Similarly, the average Rg values of the compounds complexed with CTP synthase ranged from 25.11 to 25.42 Å ( Figure 4B, Table S6). With the exception of ZINC000031750813 (19.66 Å), all the 4 compounds selected against TK formed stable complexes with the receptor, with the average Rg values ranging from 17.89 to 18.05 Å ( Figure 5B, Table S6). The average Rg values of the 5 compounds selected against DHFR ranged from 16.98 to 17.48 Å. With the exceptions of ZINC000016682862 and CHEMBL3901573, the three compounds selected against DHFR formed stable complexes with the receptor ( Figure 6B, Table S6). All five of the compounds complexed with MetAP2 formed tightly packed, stable structures with the receptor ( Figure 7B, Table S6).

RMSF Values of the Cα Backbone of Target Proteins
The average atomic mobility of the protein backbone during the MD simulations was measured using the values of RMSF. The average RMSF values of the 5 compounds complexed with AQP ranged from 0.92 to 1.53 Å ( Figure 3C, Table S7). Further analysis revealed that residues 160-170 of AQP underwent fluctuations when complexed with CHEMBL3703838, CHEMBL133039, and CHEMBL2132563. Residues Thr33, Lys34, Gly46, Leu119, Ile120, Ala123, Ser193, Ser196, Gly197, Gly198, and Ala 199 of AQP, which were primarily involved in the formations of ligand-protein hydrogen bonds, underwent minimal fluctuations for all the five compounds (Table S7). Similarly, the average RMSF values of CTP synthase when complexed with the 5 compounds ranged from 0.92 to 1.32 Å ( Figure 4C, Table S7). Residues Ala59, Lys63, Glu64, Ile65, Val301, Tyr295, Tyr302, Gly349, Phe351, Arg453, and Glu501, which were crucial for the formations of protein-ligand hydrogen bonds and π-π stacking interactions, remained stable throughout the simulation (Table S7). The average RMSF values of the 5 compounds complexed with TK ranged from 1.03 to 2.85 Å ( Figure 5C, Table S7). However, the average RMSF value of ZINC000031750813 (2.85 Å) was higher than that of the 4 other compounds selected against TK. For all five of the compounds, residues 205-215 of TK underwent fluctuations during the simulation. However, the residues that were crucial for the formations of ligand-receptor hydrogen bonds and π-π stacking interactions, including Pro10, Ser12, Cys13, Gly14, Lys15, Thr16, Phe42, Arg45, Tyr46, Glu86, Phe89, and Gly184, remained stable throughout the simulation (Table S7). The average RMSF values of the 5 compounds complexed with DHFR ranged from 1.21 to 1.67 Å ( Figure 6C, Table S7), while the average RMSF values of MetAP2 complexed with the selected 5 compounds ranged from 0.68 to 0.79 Å (Table S7). Additionally, residues 16-23, 175-184, and 248-258 underwent fluctuations during the simulation. However, the amino acid residues that mediated the formations of protein-ligand hydrogen bonds and π-π stacking interactions remained stable during the simulation ( Figure 7C, Table S7).

Dynamic Cross-Correlation and PCA
The cross-correlation matrix of the resulting trajectories was analyzed for understanding the dynamical correlations of conformational motion of the protein-ligand complexes. The positive regions of the matrix are associated with the strongly correlated motions of residues moving in the same direction, while negative regions are linked with anticorrelated movements. The correlations of motion of the residues of AQP complexed with the five ligands were generated and displayed as a correlation matrix, depicted in Figure S6K-O. Deeper colors indicate more positively or negatively correlated motions between the structural patterns. The pink blocks displayed in the figure indicate residues with highly correlated movements, while the green blocks indicate the least correlation. Residues 10-100, 125-150, and 175-200 of AQP complexed with CHEMBL133039, CHEMBL3140193, and CHEMBL3703838 exhibited concerted movements. These residues included amino acids that formed the binding pocket of AQP. The MD simulation trajectories of AQP complexed with the five compounds were subjected to a PCA, and the results are depicted in Figure S6P-T. A dynamic cross-correlation matrix of CTP synthase complexed with the five compounds was similarly generated ( Figure S7K-O). Analysis of the resulting matrix revealed that residues 1-100, 150-250, and 275-400 of CTP synthase complexed with CHEMBL525202, CHEMBL1091856, and CHEMBL1162979 exhibited considerable correlated movements. The results further indicated that 72% of the amino acids in the 275-400 residue stretch were involved in the formation of the ligand-binding pocket. The dynamical cross-correlated maps of TK complexed with the five different compounds are graphically presented in Figure S8K-O. Residues 1-50, 60-140, and 150-200 of TK exhibited a strong positive correlation when complexed with ZINC000031750813, CHEMBL3674540, and CHEMBL4078273, but the correlation decreased when bound to CHEMBL391279 and CHEMBL1683320. The cross-correlation matrix of DHFR complexed with five compounds are depicted in Figure S9K-O. The cross-correlation matrix of MetAP2 complexed with the five compounds revealed a strong correlation for residues 1-150 ( Figure S10K-O). However, residues 150-225 and 240-300 of MetAP2 exhibited reduced correlation when complexed with CHEMBL3913373. Notably, these regions comprised amino acids that formed the ligand-binding pocket of MetAP2.

Determination of Binding Free Energies of Protein-Ligand Complexes
The binding free energy represents the sum total of all the interaction energies, including the van der Waals energy, polar solvation energy, electrostatic energy, and solventaccessible surface area SASA energy. The binding free energies of all the complexes were estimated using the MM/GBSA approach ( Table 4). The binding free energies of the 5 compounds complexed with AQP ranged from −2.8214 to −36.0654 kcal/mol, of which CHEMBL3703838 (−36.0654 ± 2.6122 kcal/mol) had the lowest free energy of binding. Further analysis revealed that van der Waals interactions played a major role in stabilizing the docked complexes ( Figure S6A-E). The intermolecular protein-ligand interactions after the MD simulation are depicted in Figure S6F-J. The free energies of binding of the 5compounds complexed with CTP synthase ranged from −37.1662 to −56.5194 kcal/mol (Table 4), of which CHEMBL1091856 had the lowest binding free energy of −56.5194 ± 5.0206 kcal/mol. Further analysis revealed that van der Waals and electrostatic interactions played a major role in stabilizing the docked complexes ( Figure S7A-E). The intermolecular protein-ligand interactions of CTP synthase at the end of the production run are depicted in Figure S7F-J. The free energies of binding of the 5 compounds selected against TK ranged from −9.1083 to −33.5752 kcal/mol, of which CHEMBL4078273 had the lowest free energy of binding of −33.5752 ± 4.6211 kcal/mol ( Table 4). The interaction energies of all the five compounds selected against TK are depicted in Figure S8A-E, and the results demonstrated that van der Waals and electrostatic interactions played a major role in stabilizing the docked complexes. The intermolecular protein-ligand interactions of TK at the end of the production run are depicted in a 2D plot ( Figure S8F-J). The free energies of binding of the 5 compounds complexed with DHFR ranged from −28.6175 to −35.9711 kcal/mol, of which CHEMBL340488 had the lowest free energy of binding of −35.9711 ± 3.1254 kcal/mol ( Table 4). The van der Waals interactions between the compounds and DHFR played a major role in stabilizing the protein-ligand complexes ( Figure S9A-E). The intermolecular protein-ligand interactions of DHFR are depicted in a 2D plot ( Figure S9F-J).The free energies of binding of the 5 compounds selected against MetAP2 ranged from −28.6175 to −43.5796 kcal/mol (Table 4), of which CHEMBL3913373 had the lowest free energy of binding of −43.5796 ± 5.1314. The interaction energies of the five compounds complexed with MetAP2 are depicted in Figure S10A-E, and the intermolecular protein-ligand interactions of MetAP2 at the end of the production run are depicted in Figure S10F-J.

Discussion
The slow growth syndrome of shrimp caused by EHP infections is a major threat to sustainable shrimp farming in Asia at present. The disease has caused significant economic losses for shrimp farmers not only in Thailand but also in other parts of Southeast Asia since 2009. Shinn (2019) [15] calculated that EHP infections resulted in a loss of USD 387.9 to USD 555.8 million in 2019 owing to the reduction in harvest size from 18 g to 13 g and an increase in production costs of 23.2%. EHP infections have been recently spreading to new geographical locations in Korea [16], Venezuela [4], and Mexico, and there are no effective methods for limiting the negative effect of EHP infections on shrimp cultivation to date. In this study, we identified drug-like molecules against five potential druggable target proteins of EHP, namely, AQP, CTP synthase, DHFR, MetAP2, and TK, by employing structurebased VS and MD simulations. The selected molecules are biologically active compounds listed in the National Center for Advancing Translational Sciences (NCATS). Detailed information on these drug-like compounds was included in Supplementary Table S8. CTP synthase catalyzes the synthesis of cytidine 5 -triphosphate (CTP) from uridine 5 -triphosphate (UTP) in the final step of the production of cytidine nucleotides from both the de novo and uridine salvage pathways [17]. CTP synthase is precisely regulated by intracellular concentrations of CTP and UTP. As a consequence, CTP synthase activity regulates the intracellular rates of RNA, DNA, and phospholipid synthesis [18]; therefore, it has been selected as a target for the development of drugs. CTP is not only a building block for nucleic acids but is also required for protein glycosylation [19], lipid biogenesis [20], and cellular communication [21]. Inhibition of the CTP synthase enzyme of T. gondii disrupts the lytic cycle [17]. CTP synthase is also a potential target for drug discovery against T. brucei [22], malaria [23], and giardiasis [24]. A structural analysis revealed that CTP synthase comprises an ammonia ligase (AL) domain and a glutamine amidotransferase (GAT) domain and also catalyzes the final step of de novo CTP biosynthesis [25]. The GAT domain catalyzes the hydrolysis of glutamine to yield NH 3 , which is subsequently utilized for the ATP-dependent conversion of UTP to CTP. A structural analysis of Drosophila CTP synthase (dmCTP synthase) revealed that the guanine base of GTP interacts with the Leu107 and Leu444 of another monomer. Further studies revealed that GTP forms three hydrogen bonds with Arg481. In addition, the π-π interaction between the guanine ring and Phe373 also contributes to the binding. The ribose ring of GTP interacts with Phe50 and Arg479 via hydrogen bonds. The triphosphate moiety of GTP also forms three hydrogen bonds with Lys306, Tyr307, and Arg376. The binding of glutamine via interactions with Phe37 and Cys399 further stabilizes the binding of GTP [25]. A comparison of the sequence and structure of EHP CTP synthase with dmCTP synthase resulted in the identification of similar important amino acid residues, including Tyr295, Phe351, Cys377, and Arg453, in the binding pocket of EHP CTP synthase. The results of the molecular docking revealed that all five of the compounds interacted with Phe351, Arg453, and Tyr295 via hydrogen bonds and π-π interactions in the binding pocket, which are equivalent to residues Phe373, Arg481, and Tyr307, respectively, of dmCTP synthase. Further analysis revealed that three compounds, namely, CHEMBL1091856, CHEMBL1162979, and CHEMBL525202, had the lowest free energies of binding with EHP CTP synthase. The compound 6-diazo-5-oxo-Lnorleucine (DON) is a glutamine analog that inhibits the functions of CTP synthase. DON interacts with Phe373 and Cys399 and covalently binds to the active site of glutaminase [25]. Potent inhibitors of the CTP synthase enzymes of Plasmodium sp., Trypanosoma sp., and Toxoplasma sp., includingα-amino-3-chloro-4,5-dihydro-5-isoxazoleacetic acid, glutamate γ-semialdehyde, azaserine, and 2,4-diaminopentanedioic acid, have been identified and reported in previous studies [22,26,27].
The DHFR catalyzes the NADPH-dependent reduction of dihydrofolate (DHF) to tetrahydrofolate (THF). Tetrahydrofolates serve as co-factors for the synthesis of purines, pyrimidine (thymidylate), and for the re-methylation of homocysteine to methionine. Reduction in DHFR enzymatic activity diminishes the THF pool inside the cell, which slows DNA synthesis and cell proliferation, eventually leading to cell death [28,29]. DHFR is a potential druggable target owing to its metabolic importance. DHFR inhibitors are commonly used for the treatment of malaria [11] and other protozoal infections, including T. cruzi [30] and Cryptosporidium hominis [30]. Several classes of compounds with potential antifolate activity are used in the treatment of cancer and rheumatoid arthritis (methotrexate, MTX), bacterial DHFR enzyme (trimethoprim, TMP), and the DHFR of P. falciparum (pyrimethamine, PYR). Although pyrimidine biosynthesis occurs ubiquitously in EHP and shrimp, the DHFR enzyme is divergent enough to allow the design of inhibitors specific to EHP DHFR. In this study, we identified five compounds with high binding affinity to the folate binding pocket of EHP DHFR. Of these compounds, ZINC000016682972 had the lowest free energy of binding and exhibited high stability with EHP DHFR. Further analysis revealed that MTX and TMP bind to the same region of DHFR, which was selected as the druggable site for docking-based VS against EHP DHFR in this study. DHFR inhibitors, including riluzole, 6-(trifluoromethoxy)-1,3-benzothiazol-2-ylamine, 7-((2-thiazol-2-yl)benzimidazol-1-yl)-2,4 diamino quinazoline, and proguanil, which specifically bind to the folate-binding pocket of DHFR, have been identified for suppressing infections caused by L. major, T. brucei, Staphylococcus aureus, and the malarial parasite, respectively [31][32][33].
MetAP2 catalyzes the removal of N-terminal methionine residues from nascent proteins [34]. Inhibition of MetAP activity could therefore affect protein biological activity and proper cellular localization and turnover, and result in interference with cell signal transduction and cell-cycle progression [35]. The microsporidia contain only MetAP2 in their genome which makes microsporidia MetAP2 an essential target for designing therapeutic agents for microsporidiosis [36,37]. The MetAP2 protein has been identified as a potential target for the treatment of infections caused by E. bieneusi, E. cuniculi, Entamoeba histolytica, and Vittaforma corneae [37][38][39]. A gene-encoding MetAP2 is also present in EHP (EhpMetAP2), and EhpMetAP2 shares 51% identity with the MetAP2 protein of E. cuniculi (EcMetAP2). A structural analysis of EcMETAP2 revealed that it has a metal-binding domain and a methionine-binding domain. Residues Asp130, Asp141, His210, Glu243, and Glu339, which are responsible for coordinating Fe 2+ ions, were found to be completely conserved in all the MetAP protein sequences, including EcMetAP2.
The residues comprising the methionine-binding pocket include Phe97, Pro98, His109, Ile217, and His218 inthe peripheral loops and His261, Val263, Pro292, and Tyr324 in the subdomain. Only two residues, His109 and His218, are completely conserved across the entire MetAP protein family [38]. The methionine-binding domain identified in EhpMetAP2 comprised similar conserved amino acid residues, including Phe80, Pro81, His92, Ile201, His202, His244, Pro275, and Tyr304, which formed a druggable pocket that was targeted with docking-based VS in this study. The results of molecular docking revealed that all five of the compounds formed hydrogen bonds with His92 and His202, which are equivalent to His109 and His218 of EcMetAP2. Further analysis revealed that CHEMBL3913373, CHEMBL1962731, and ZINC000199197855 formed π-π interactions with His92, His202, and Tyr304. The free energies of binding of ZINC000016682972, CHEMBL3142997, and CHEMBL1962731 with EhpMetAP2 were the lowest among all the selected compounds.
The TK key enzyme catalyzes the transfer of the γ-phosphate of ATP to 2 -deoxythymidine (dThd), forming thymidine monophosphate (dTMP) via the salvage pathway in microsporidia [42]. Inhibition of TK causes severe dTTP depletion that leads to the massive incorporation of uracil into DNA and contributes to the phenomenon called "thymineless death" [43]. TK is a potential druggable target of L. major [44] and T. brucei [45]. The gene encoding the TK enzyme is also present in EHP. Human TK1 (hTK1) is the most studied type II enzyme that consists of a Zn-binding domain, an ATP-binding domain, a 20-dThdthymidine-binding domain, and a lasso domain. The primary sequence of EHP TK (EhpTK) significantly differs from the human TK (hTK) enzyme. However, a sequence-based comparison of the active sites of EhpTK and HuTK1 revealed that the majority of amino acids surrounding the deoxyribonucleoside moiety are identical in both enzymes. Tyr163 of HuTK1 is part of a hydrophobic pocket that surrounds the 5-methyl group of thymine. The 5-methyl group is also surrounded by residues Leu124, Tyr181, and Met28, which are equivalent to Thr153, Leu114, Tyr189, and Val11 of EhpTK. The ribose moiety of dThd is stabilized via interactions with Asp88 and Glu98, which are equivalent to residues Asp43 and Glu86 of EhpTK [46]. Glu98, which is essential for catalysis, forms a hydrogen bond with the 5 -oxygen of the ribose ring, and is well-placed to act as the catalytic base for abstracting a proton from the oxygen. This enables the oxygen atom to perform a nucleophilic attack on the γ-phosphate of the phosphate donor [44,47]. The results of molecular docking revealed that all the compounds bound stably to the binding pocket via hydrogen bonds with Glu86 and Arg45. However, CHEMBL4078273, CHEMBL1683320, and CHEMBL3674540 had the lowest free energies of binding and formed highly stable complexes with EhpTK. Several thymidine analogs, including aurantiamide acetate, zidovudine, stavudine, azidothymidine, and 3-trifluoromethyl-4-chloro-phenyl-urea-α-thymidine, have been previously identified for the treatment of infections caused by P. falciparum, Giardia intestinalis, and T. cruzi [48][49][50].
AQPs are transmembrane channels that transport water and/or small solutes, such as glycerol, nitrates, and urea, across cellular membranes. A total of 13 AQP isoforms (AQP0 to AQP12) have been identified in humans [51]. Of these, AQP3, AQP7, AQP9, and AQP10 are aquaglyceroporins that facilitate the transport of glycerol and other small neutral solutes such as urea, ammonia, and carbon dioxide [52]. The solute selectivity of AQPs is determined by two-channel sections, namely, the conserved asparagine-prolinealanine (NPA) region and the aromatic/arginine (ar/R) constriction [53]. A comparison of the sequences and structures of the AQP protein of EHP and human AQP1 revealed the presence of a conserved NPA region in the AQP protein of EHP. However, the alignment revealed that an arginine (Arg195) was substituted for an isoleucine (Ile204) in the AQP protein of EHP, which clearly indicated that it belongs to the aquaglyceroporin subset. Arg195 is conserved in all members of the AQP superfamily that are selective for water transport [54]. It was postulated that aquaporin transports water inside spores, and as a result, the osmotic pressure quickly increases inside the spore which triggers the shooting out of its polar tube and transferring its sporoplasm into a host cell [55,56]. The inhibition of AQPs with HgCl 2 effectively inhibits the germination of Anncaliia algerae spores [57]. The AQP protein of A. algerae was, therefore, identified as a potential druggable target in an earlier study. [57]. In this study, we identified the conserved NPA region and the residues in the selective filter region of the AQP protein of EHP by comparing the sequence with those of human AQP1, AQP3, and AQP4 proteins. The binding pocket formed by these residues was subsequently targeted for screening drug molecules [58,59]. A similar region was selected in a study by Yadav et al. in 2020 [59] for screening drug molecules against the AQP3 protein of humans. The results demonstrated that all the docked poses formed several direct hydrogen bonds with important residues-including Asn60 and Arg218-while the backbone atoms of Gly145, Ala148, Gly207, Gly211, and Phe208 were involved in the formation of protein-ligand hydrogen bonds. Additionally, the majority of compounds formed π-π stacking interactions with the aromatic rings of Tyr150 and Phe208. Another study demonstrated that acetazolamide, a carbonic anhydrase inhibitor, reduces the water permeability of the AQP1 protein from the oocytes of Xenopus laevis by binding to a region similar to that identified in the present study [60].

Identification of Potential Druggable Target Proteins
The potential druggable protein targets of EHP were identified using two different approaches. The complete protein information of EHP was retrieved from the UniProt Knowledgebase and subjected to a functional enrichment analysis for identifying the candidate proteins that are involved in major metabolic pathways, using the online tools in DAVID for a KEGG enrichment analysis. The protein targets were finally selected using an approach similar to that used in earlier studies [61,62], which involved a thorough review of the literature published on microsporidians [17,22,26,37,38,45].The amino acid sequences of the five druggable protein targets of EHP, including AQP, CTP synthase, MetAP, DHFR, and TK, were retrieved from the UniProt Knowledgebase (https://www.uniprot.org/ accessed on 5 January 2022). The three-dimensional structures of all the five proteins were generated using AlphaFold 2 [63]. The stereochemical qualities of the protein models were further validated with the ProSA and PROCHECK modules of the PDBsum server [64].

Identification of Druggable Pockets
The druggable pockets in the five protein targets were identified using two separate approaches. The amino acid sequences of the five proteins were first used for identifying all the sequence homologs in the PDB with the NCBI BLAST server [65], using default parameters. The sequences of the closely related protein homologs were retrieved, and a multiple sequence alignment was performed for identifying the consensus and conserved residues in the ligand-binding sites of the proteins across the different members using CLC Workbench v8.5. software (Qiagen). The ligand-binding pockets were subsequently predicted using the CASTp server [66]. The appropriate target sites were finally selected based on previous knowledge of the conserved residues lining the pockets.

Structure-Based VS
The three-dimensional structures of the five proteins of EHP were used as targets for screening drugs from the ZINC15 [67] and ChEMBL [68] databases with docking-based VS using the EasyVS web-based VS tool (http://biosig.unimelb.edu.au/easyvs/ accessed on 25 January 2022) [69]. A sequential screening strategy was employed for screening the drug molecules. The drug molecules were initially filtered via Lipinski's rule of five, and the selected molecules were docked using the web-based EasyVS tool. A chemical space was subsequently prepared around the druggable site that was selected as the potential target site (Table 1).

Prediction of ADMET Properties
All the screened molecules were subjected to an ADMET analysis for predicting the pharmacokinetics and toxicity properties using the SwissADME server (https:// admetmesh.scbdd.com/ accessed on 10 August 2022) [70]. ADMET studies provide insights into various pharmacokinetic properties, including absorption, distribution, metabolism, excretion, and toxicity.

Screening Based on EC
The compounds were initially screened based on the following criteria: affinity scores ≥ 7.5 and numbers of interactions ≥ 9. These compounds were re-screened based on their EC [71]. The results were further enriched via re-docking the selected compounds with Flare v5.0.0 (Cresset Inc., Cambridge, UK), and compounds with high EC values were selected for further studies. The selected compounds were additionally screened using the PatchDock server [72].

MD Simulations of Protein-Ligand Complexes
The stabilities of the protein-ligand complexes were verified with 100 ns MD simulations using the Amber ff19SB [73] force field and the general AMBER force field (GAFF) [74]. A dodecahedral box of 12 Å was constructed around the protein-ligand complexes, and the box was solvated with TIP3P water. The excess charges were neutralized via the addition of either Na + or Cl − ions at a molar concentration of 0.15 M. The systems were subjected to energy minimization for relaxing the water molecules and intramolecular steric clashes at a temperature of 300 K under 1 bar pressure. The systems were subsequently equilibrated for 20,000 ps while imposing positional restraints of 700 kJ/mol. All the simulations were performed under the NPT ensemble by maintaining the temperature at 300 K using the Langevin thermostat [75], with a collision frequency of γ = 1/ps. Pressure control was achieved by coupling the system to a Monte Carlo barostat [76] at a reference pressure of 1 atm and a relaxation time of 2 ps. The simulations were performed using the GPUaccelerated version of the OpenMM 7.6 engine [77] and the 'Making it rain' [78] cloud-based molecular simulations notebook environment. The trajectories generated during the MD simulations of the protein-ligand complexes were analyzed for calculating the values of RMSD, RMSF, and hydrogen bonds using scripts included in AMBER. The trajectories were subjected to a PCA, and the cross-correlation maps of the entire trajectories were analyzed.

Determination of Free Energies of Protein-Ligand Complexes
The binding free energies of the docked complexes were calculated using the mechanics/generalized Born surface area (MM/GBSA) approach [79]. The binding free energies (∆G bind ) were calculated using the following equations [80,81]: where ∆G complex , ∆G receptor , and ∆G ligand represent the free energy of the complex, receptor, and ligand, respectively. ∆G = ∆E gas + ∆G sol − T∆S gas (2) ∆E gas = ∆E int + ∆E ELE + ∆E VDW (3) where ∆G represents the free energy. The energy in the gas phase (∆E gas ) comprises the internal energy (∆E int ), electrostatic interactions (∆E ele ), and van der Waals interactions (∆E vdw ) energy terms. The solvation free energy (∆G sol ) comprises the polar energy (∆G GB ) and non-polar energy (∆G Surf ) terms. T∆S gas represents the contribution of conformational entropy.

Conclusions
EHP is an intracellular parasite that is responsible for the slow growth syndrome of the Penaeid shrimps L. vannamei and P. monodon. Shrimp production in Asia has declined by 10-20% owing to EHP infections, which has resulted in significant economic losses. In addition, the infection is rapidly spreading to new geographical locations. The shrimp farming industry will suffer substantial economic losses if the scenario of EHP infections remains unaltered, which will have serious effects on the global socio-economic structure. In this study, a total of fifteen compounds (CHEMBL3703838, CHEMBL2132563, CHEMBL133039, CHEMBL1091856, CHEMBL1162979, CHEMBL525202, CHEMBL4078273, CHEMBL1683320, CHEMBL3674540, ZINC000016682972, CHEMBL3142997, CHEMBL340488, CHEMBL1966988, ZINC000828645375, and CHEMBL1962731) were identified against five potential druggable protein targets of EHP. The compounds had high binding affinities and low free binding energies, as indicated by the results of extensive the insilico analyses. The compounds formed stable complexes with the respective protein targets and were predicted to have insilico inhibitory potentials. The results of the computational analyses obtained in this study will be experimentally validated by in vitro and in vivo studies in future.