Computer-Aided Prediction of the Interactions of Viral Proteases with Antiviral Drugs: Antiviral Potential of Broad-Spectrum Drugs

Human society is facing the threat of various viruses. Proteases are promising targets for the treatment of viral infections. In this study, we collected and profiled 170 protease sequences from 125 viruses that infect humans. Of these, 73 are viral 3-chymotrypsin-like proteases (3CLpro), and 11 are pepsin-like aspartic proteases (PAPs). Their sequences, structures, and substrate characteristics were analyzed to identify conserved features that could support a pan-3CLpro or pan-PAP inhibitor design strategy. To this end, we used computational prediction and modeling methods to predict the binding complex structures of the 73 3CLpro with 4 protease inhibitors of SARS-CoV-2 and 11 protease inhibitors of HCV. Similarly, the complex structures of the 11 viral PAPs with 9 protease inhibitors of HIV were obtained. The binding affinities between these compounds and proteins were estimated via MM-GBSA to assess their pan-protease inhibition. Based on the drugs targeting viral 3CLpro and PAPs, repositioning of the active compounds identified several potential new uses for these drug molecules. In addition, Compounds 1–2, modified from the structures of RAY1216 and Asunaprevir, show potential inhibition of the DENV protease according to our computational simulation results. These studies offer ideas and insights for future research in the design of broad-spectrum antiviral drugs.


Introduction
The outbreak of Coronavirus Disease 2019 (COVID-19), caused by SARS-CoV-2, has had a significant impact on human health and lives [1,2]. In May 2023, the World Health Organization announced that the three-year epidemic is no longer considered a Public Health Emergency of International Concern (PHEIC) [3]. Although the most severe phase of the COVID-19 pandemic is over, there is still a long way to go in combating virus-driven crises. A lack of understanding of viruses underlines the global vulnerability to emerging diseases. The Global Virome Project report estimates that there are 1.67 million unknown viruses in mammalian and bird hosts [4]. Among these viruses, an estimated 650,000 to 840,000 can infect the human body and cause disease [4]. To date, only about 270 viruses are known to infect humans [1,2,4–10].
Since the 1960s, at least 95 small-molecule inhibitors against viruses have been approved (Table S1) [11]. However, these small-molecule drugs target only 11 types of viruses (HSV, VZV, HCMV, VARV, HBV, HCV, SARS-CoV-2, FluV, RSV, HDV, and HIV), which account for only 4.07% of the 270 known viruses that can infect humans. In addition to targets related to viral DNA and RNA synthesis, e.g., DNA polymerase, RNA polymerase, and reverse transcriptase, proteases are also frequently targeted in the development of antiviral drugs. A few antiviral drugs targeting proteases such as the 3CLpro of SARS-CoV-2, NS3/4A of HCV, and the protease of HIV have been approved (details are below). The 3CLpro of SARS-CoV-2, also known as the main protease (Mpro), cleaves two polyproteins expressed from the virus genome into multiple non-structural functional proteins. This process plays a crucial role in virus translation and replication [12]. NS3/4A of HCV is embedded in the endoplasmic reticulum membrane of host cells [13,14]. Its primary function is similar to that of the 3CLpro of SARS-CoV-2: it cleaves the polyprotein translated from the HCV genome into functional viral proteins [13,14]. The HIV protease cleaves the polyprotein Gag-Pol at multiple sites to produce mature viral components of HIV [15].
As shown in Figure 1, the approved drugs targeting the 3CLpro of SARS-CoV-2 are three covalent inhibitors, Nirmatrelvir [16,17], SIM0417 [18], and RAY1216 [19], as well as a non-covalent inhibitor, Ensitrelvir [20]. Nirmatrelvir and SIM0417 need to be combined with Ritonavir to increase the drug's half-life in the human body. Three generations of drugs have been developed for HCV NS3/4A. Boceprevir [21], Telaprevir [22], and Narlaprevir [23] are first-generation protease inhibitors (PIs) that all contain covalent warheads (α,β-unsaturated aldehydes and ketones) capable of forming covalent bonds with the catalytic residue Ser. The second-generation PIs, including Asunaprevir [24], Danoprevir [25], Paritaprevir [26], and Simeprevir [27], do not contain functional groups that form covalent bonds and are equipped with hydrophobic aromatic P2 groups, which significantly increase their inhibitory effect on proteases. The third-generation PIs, including Vaniprevir [28], Grazoprevir [29], Voxilaprevir [30], and Glecaprevir [31,32], have transformed the P1-P3 macrocycle of the second-generation PIs into a P2-P4 macrocycle. Their advantages are that they are effective against many genotypes, require only a short treatment duration and infrequent oral administration, do not need to be combined with interferon, and are not affected by multiple drug-resistant mutations.
In the early 1980s, AIDS emerged as a new viral disease in the public eye, marking the beginning of a prolonged battle against the illness [33]. Two generations of drugs targeting HIV protease have been developed. Saquinavir was the first HIV protease inhibitor [34]. Subsequently, the other first-generation PIs, Ritonavir [35], Indinavir [36], Nelfinavir [37], and Amprenavir [38], were also approved. The second-generation PIs, Lopinavir [39], Atazanavir [40], Tipranavir [41], and Darunavir [42], are characterized by a greater emphasis on pharmacokinetic properties, efficacy, and safety. These drugs targeting HIV proteases all contain hydroxyl groups that form hydrogen bonds with the catalytic residue Asp. This interaction is crucial for inhibiting the activity of proteases.
Numerous other viral proteases play crucial roles in the life cycle of viruses. For example, the 3CLpro of viruses in the order Picornavirales, such as FMDV [43], EV-A71 [44], HRV [45], and NV [46], and the proteases of viruses in the order Amarillovirales, such as DENV [47], cleave the viral polyproteins. Based on the design process of approved drugs, we can develop drugs for similar targets.
Understanding the development process and the mechanisms of antiviral drugs is important for designing more efficient drugs. In this article, we collected protease sequences from viruses that infect humans and depicted the protease profiles to illustrate the distribution of viral proteases among viruses. We focused on two types of proteases, 3CLpro and PAPs, analyzed their sequences, structures, and substrate features, and performed binding predictions for these proteins and some approved antiviral drugs. The initial drug-protein binding poses were constructed by referring to the experimentally determined binding complexes of the proteases and ligands. The studied drugs were positioned through structure alignment, and the residues around the small molecules were treated as flexible in order to refine the complex structures.

The Protease Profiles of Human-Infective Viruses
We initially analyzed which viruses related to humans are capable of expressing proteases and the types of proteases that can be expressed. There are approximately 270 known viruses that infect humans [1,2,4–9,48]. According to the order classification, these viruses include Reovirales and Durnavirales of the dsRNA virus class; Mononegavirales, Bunyavirales, and Articulavirales of the (−) ssRNA virus class; Picornavirales, Nidovirales, Amarillovirales, Martellivirales, Hepelivirales, and Stellavirales of the (+) ssRNA virus class; Retrovirales of the ssRNA (RT) virus class; Piccovirales and Anelloviridae (family) of the ssDNA virus class; Rowavirales, Herpesvirales, Zurhausenvirales, Sepolyvirales, and Chitovirales of the dsDNA virus class; and Hepadnaviridae of the dsDNA (RT) virus class. Among these viruses, those capable of expressing proteases are mainly distributed in six orders of the (+) ssRNA virus class, Retrovirales of the ssRNA (RT) virus class, Bunyavirales of the (−) ssRNA virus class, and Rowavirales, Herpesvirales, and Chitovirales of the dsDNA virus class. Based on this classification, we extracted and collected the protease sequences of these viruses and generated protease profiles of human-infective viruses to visualize the distribution of these targets on the constructed evolutionary trees (Figure 2). These proteases are classified into seven clans (Table 1), including a Cys/Ser protease (clan PA), three Cys proteases (clan CA, clan CN, and clan CE), a Ser protease (clan SH), an Asp protease (clan AA), and a metalloprotease (clan ME). Proteases in the same clan share the same fold. The proteins in clan PA are 3-chymotrypsin-like proteases (3CLpro). The proteins in clan AA are pepsin-like aspartic proteases (PAPs). The proteins in clan CA are papain-like proteases (PLpro).
As shown in Table 1, we have summarized the catalytic residues and functions of these proteases. Except for HEV and RV of Hepelivirales, the (+) ssRNA viruses can express 3CLpro. The capsid of alphaviruses (order Martellivirales) exhibits 3CLpro activity, which is used for self-cleavage [49–51]. The capsid of an alphavirus is released from the polyprotein through a one-step enzymatic cleavage, after which its protease activity is inactivated [49–51]. The 3CLpro of Picornavirales, Nidovirales, Amarillovirales, and Stellavirales cleave the polyproteins expressed from the viral genome and are not components of the viral particles [12–14,52,53]. Alphaviruses cleave the polyprotein using nsP2 proteins (clan CE) [54].
Several viruses can produce proteins with PLpro activities. In addition to 3CLpro, the coronaviruses (Nidovirales) express a PLpro, which has multiple cleavage sites on the viral polyprotein and acts as a deubiquitinase (DUB) [55–57]. HCoV-OC43, HCoV-HKU1, HCoV-229E, HCoV-NL63, and CCoV-HuPn-2018 express two PLpro (PLP1 and PLP2) [58]. PLP2 has the same function as the PLpro of other coronaviruses [55–57]. PLP1 and PLP2 collaborate to promote virus replication. Lpro (the leader protein) is the first protein encoded on the FMDV (Picornavirales) polyprotein and belongs to the PLpro type. It self-cleaves from the polyprotein precursor, cleaves the host translation initiation factor eIF-4G, which decreases host cap-dependent mRNA translation, and also exhibits DUB activity [56]. The PLpro expressed by RV (Hepelivirales) cleaves viral polyproteins into nonstructural proteins [56,59]. The large tegument protein of Herpesvirales and the large (L) protein of CCHFV (Bunyavirales) are both components of the viral particle and contain a PLpro domain, which has been shown to have DUB activity [56,60].
The assemblins (clan SH proteases) of herpesviruses (Herpesvirales) are crucial for the formation of the viral nucleocapsid and for virus replication. During viral assembly, the natural substrates of the assemblins are the viral protease precursor and the viral assembly protein [61,62].
In the family Retroviridae, 11 human-infective viruses, including HIV-1, HIV-2, WMSV, GaLV, XMRV, SFV, BLV, HTLV-1, HTLV-2, HTLV-3, and HTLV-4, express PAPs (clan AA) to cleave viral polyproteins. These PAPs are also structural proteins of the viruses and are distributed in the viral matrix. The protease of SFV is located on the same protein as the reverse transcriptase and ribonuclease H [63].
The core protease (I7) of the Poxviridae (Chitovirales) and adenain (the adenovirus endoprotease) of the Adenoviridae (Rowavirales) belong to clan CE. The core protease is necessary for cleaving the viral membrane and core proteins during virus assembly [64,65]. Adenain, which is responsible for the uncoating and maturation of virions, converts immature viral particles into mature, infectious viral particles and releases them from infected cells [66]. In addition, the Poxviridae also express a metalloprotease (clan ME), which plays a role in the maturation of viral proteins [67].
The classification of viral proteases is important for developing broad-spectrum antiviral inhibitors, establishing precise antiviral treatment systems, and increasing the global stock of antiviral drugs.

3-Chymotrypsin-Like Proteases

The Sequences, Structures, and Substrates of the Class of 3CLpro
Chymotrypsin comprises two β-barrel-like domains (Figure 3A). Each β-barrel consists of six β-sheets. The catalytic active center is located between the two domains. In Figure 3B, we present the 3CLpro structures of typical viruses from five viral orders. The catalytic domains of SARS-CoV-2, EV-A71, HAtV, HCV, and CHIKV 3CLpro exhibit the same structural fold as chymotrypsin. The active center of Coronaviridae 3CLpro consists of the catalytic dyad His-Cys. The active center of Picornavirales 3CLpro consists of the catalytic triad His-Asp/Glu-Cys. The active center of astrovirus, Flaviviridae, and alphavirus 3CLpro consists of the catalytic triad His-Asp/Glu-Ser.
A total of 73 sequences of viral 3CLpro proteins were collected (Figure 2). We calculated the sequence identities between these proteins and show the identity matrix in Figure 3C. Identity values can be found in the Supplementary Materials. Conserved sequences are concentrated within the same order. We clustered these proteins into 13 classes based on identity values: sequences with identity values greater than 30% were grouped into one class. The identity between the 3CLpro of coronaviruses is greater than 30% (class 1). HAtV 3CLpro of Stellavirales was classified as class 2. The viruses in the order Picornavirales were classified into 8 classes: class 3 (EV-A71, EV-B93, EV-D68, EV-C, PV, HRV-B, and HRV-A), class 4 (PeV and LjV), class 5 (NV and HuNoV), class 6 (EMCV), class 7 (AiV), class 8 (FMDV), class 9 (HAV), and class 10 (SaV). In the Flaviviridae family, the proteases of 25 viruses, such as ZIKV, JEV, and DENV, belong to the NS3/2B type (class 11), while the proteases of HCV and HGV belong to the NS3/4A type (class 12). NS4A and NS2B are both cofactors of the proteases. The identity values between the 3CLpro of alphaviruses are all greater than 30% (class 13). We performed sequence alignment with structural constraints to align the 3CLpro sequences and depicted a sequence logo for the two domains. As shown in Figures 3D and S1, there are 6 β-sheets in each of the domains, and they are arranged in the same order. The positions of the catalytic residues His and Cys/Ser are highly conserved.
We also collected the substrates of these proteases. The cleavage sites of some viral proteases are not annotated, but they share similarities with homologous sequences in terms of cleavage sites and substrate sequences. We then created sequence logo diagrams for the substrate sequences of the 13 classes of viruses (Figure S2). Because the P1 residues are conserved, we further divided the substrate sequences into four categories based on the P1 position. The sequence logo diagram is shown in Figure 3E. Substrates of Coronaviridae, Picornavirales, and HAtV 3CLpro have a high proportion of glutamine (Q) or glutamic acid (E) at the P1 position and were classified into a single category. The substrates of Flaviviridae NS3/2B have a high proportion of arginine (R) or lysine (K) at the P1 position and were classified as a single type. Substrates of the Flaviviridae NS3/4A protease have a high proportion of cysteine (C) or threonine (T) at the P1 position and were classified as a single type. Substrates of the alphavirus capsid have a high proportion of tryptophan (W) at the P1 position and were classified as one type.
The substrate sequence preference indicates the binding preference of the protein sites. In Figure S3A, the S1 position of the protein pocket corresponds to the P1 residue. By combining the residues at the S1 position with reported complex structures, we can summarize the patterns of the functional groups of inhibitors at the S1 site (Figure S3B). The high proportion of Q/E at the P1 position in Coronaviridae, Picornavirales, and HAtV is attributed to the presence of a histidine (His) residue at the S1 position, which acts as a hydrogen bond donor. Suitable fragments for designing inhibitors at this site include amide groups, lactam rings, carboxyl groups, and nitrogen heterocycles. Due to the presence of aspartic acid (Asp) at the S1 site of Flaviviridae NS3/2B, positively charged groups, such as guanidine and amino groups, are suitable for this position. Hydrophobic groups are suitable for the S1 site of Flaviviridae NS3/4A. A tryptophan (Trp) residue inserts into the S1 site of the alphavirus capsid. In addition, hydrophobic groups are suitable for the S2 site of Coronaviridae 3CLpro, as in the P2 position of Nirmatrelvir, SIM0417, RAY1216, and Ensitrelvir. For the S2 site of Flaviviridae NS3/2B, positively charged groups, such as guanidine and amino groups, are suitable.
The analysis and comparison of the sequence, structure, and substrate characteristics of viral 3CLpro are highly significant for designing inhibitors against these targets.
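The identity-based classification described above (sequences with identity greater than 30% grouped into one class) can be sketched as follows. This is a minimal illustration, not the procedure actually used in the study: it assumes the sequences are already aligned to equal length, defines identity over columns where neither sequence has a gap, and uses single-linkage grouping. The function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def pairwise_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def cluster_by_identity(aligned: dict[str, str], cutoff: float = 30.0) -> list[set[str]]:
    """Single-linkage grouping: two sequences are linked when identity > cutoff."""
    parent = {name: name for name in aligned}

    def find(n: str) -> str:
        # Follow parent pointers to the class representative (with path halving).
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(aligned, 2):
        if pairwise_identity(aligned[a], aligned[b]) > cutoff:
            parent[find(a)] = find(b)

    classes: dict[str, set[str]] = {}
    for name in aligned:
        classes.setdefault(find(name), set()).add(name)
    return list(classes.values())


# Toy aligned sequences (hypothetical, not real protease sequences).
toy = {"A": "MKTAYIAK", "B": "MKTAYLAK", "C": "GGSPWQRR", "D": "GGSPWQRH"}
for members in cluster_by_identity(toy):
    print(sorted(members))
```

With single linkage, a sequence joins a class as soon as it exceeds the cutoff with any one member, so the choice of linkage affects the class boundaries and should be checked against the identity matrix.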

The Drug-Protein Complex Structure Prediction and Modeling Method
If two protein sequences differ significantly, i.e., their identity is lower than 30%, the two structures cannot be aligned using sequence-based 3D structural alignment methods. Sequence-independent alignment enables researchers to identify similarities and differences between proteins from a 3D perspective. The structures of Nirmatrelvir and SIM0417 closely resemble that of Boceprevir (Figure 4A). The identity between HCV NS3/4A and SARS-CoV-2 3CLpro is only 4.87%. We superimposed the 3D complex structures of SARS-CoV-2 3CLpro-Nirmatrelvir and SARS-CoV-2 3CLpro-SIM0417 on the complex structure of HCV NS3/4A-Boceprevir using cealign [68,69], a sequence-independent alignment method (Figure 4B). The root mean square deviations (RMSD) are 4.395 Å and 4.901 Å, respectively, and the P1-P4 segments of these molecules have matching 3D structures. These results suggest that inhibitors may have similar binding modes when the proteins have highly conserved active sites. In molecular docking, small molecules are flexible, while proteins are rigid or semi-flexible. Docking therefore cannot accurately predict the binding mode between small molecules and homologous proteins that differ significantly. Constructing complex structures through structural alignment and local refinement is suitable for this situation.
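As a reminder of what the reported RMSD values measure, the sketch below computes the RMSD over paired atoms of two structures that have already been superimposed. It is an illustration only: it assumes the atom pairing and the superposition were produced beforehand (for example by a structural alignment tool such as cealign), and the function name and coordinates are hypothetical.

```python
import math
from typing import Sequence

Coord = tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root mean square deviation (same units as the input, e.g. Å) over paired atoms.

    Assumes a[i] and b[i] are matched atoms and that both coordinate sets
    are already in the same (superimposed) frame.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(a, b)
    )
    return math.sqrt(squared / len(a))


# Hypothetical paired coordinates in Å.
ref = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
mob = [(0.1, 0.0, 0.0), (1.5, 0.2, 0.0), (1.4, 1.5, 0.3)]
print(f"RMSD = {rmsd(ref, mob):.3f} Å")
```

Because the value depends on which atoms are paired, RMSDs from a sequence-independent alignment are only comparable when the same pairing criterion is used throughout.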
Of the 73 sequences that we collected, the structures of 34 have already been determined experimentally. Another 35 sequences have no experimental structures, but they have templates with an identity of more than 30% in the PDB database. Four sequences have neither structures nor templates with similar sequences. The emergence of AlphaFold enables researchers to accurately predict protein structures [70]. ColabFold simplifies and accelerates the prediction of the 3D structure of proteins [71]. We used ColabFold to predict the 3D structures of sequences without experimental structures, superimposed these structures, and calculated the RMSD between them (Figure 4C). The RMSD of these 3D structures ranges from 0.122 Å to 7.611 Å. The structural comparison results indicate that there may be slight spatial differences in the positions of protease pockets S1, S2, S3, and S4, but the overall arrangement remains consistent.
The computational binding complex prediction strategy is based on the hypothesis that a ligand will bind to the same binding site with similar binding poses when it interacts with similar proteins. As shown in Figure 4D, the main idea is to build the complex structures of target proteins and ligands according to known experimental complex structures. For small molecules without experimental complex structures, we aligned them to the experimentally determined bound conformations and binding positions of structurally similar ligands by using Ligand Alignment in the Schrödinger suite to generate initial binding poses, then merged these conformations with the target protein structures to create complex structures, and performed substantial local optimization to obtain the final complex structures. For proteins without experimental complex structures, we superimposed their structures on proteins with reported complex structures, extracted the ligands from those complex structures, merged these conformations, and conducted optimization in the local region of the binding sites to build the complex structures.
The approved drugs have been tested for their bioactivities, pharmacokinetics, and toxicities before being administered to the human body. Repositioning such drugs can expedite the development of inhibitors. We constructed complex structures of 4 protease inhibitors of SARS-CoV-2 and 11 protease inhibitors of HCV against 73 viral 3CLpro proteins using this method and then estimated the binding affinities via the molecular mechanics generalized Born surface area (MM-GBSA) method [72]. The binding energy matrix is shown in Figure 4E. A concrete application of this strategy is described below.
Binding energy is one of the indicators of the binding affinity between a ligand and a protein. The binding mode of a ligand-protein pair can provide insight into molecular interactions and is crucial for drug design. RAY1216 exhibits strong binding energy with the 3CLpro of multiple viruses. We predicted the binding modes of RAY1216 with representative 3CLpro from the thirteen classes of viruses (Figures 5 and S5). The binding modes between RAY1216 and eight 3CLpro proteins of viruses in the order Picornavirales (EMCV, AiV, FMDV, EV-A71, HAV, PeV, SaV, and NV) are similar to the binding mode between RAY1216 and the 3CLpro of SARS-CoV-2 (PDB code: 8IGN). According to our predicted binding complex structures, RAY1216 fits snugly into the active pockets of these 3CLpro proteins and forms hydrogen bond interactions with the conserved residue His in the S1 pocket. The α,β-unsaturated aldehyde and ketone of RAY1216 are in close proximity to the catalytic residue cysteine and have the potential to form covalent bonds. RAY1216 also forms hydrogen bonds with the main chain of some surrounding residues. It is noteworthy that the 3CLpro of AiV, PeV, and SaV have neither experimentally determined structures nor templates with similar sequences. For HAtV, the binding mode suggests a potential interaction between RAY1216 and H566 of S1. However, the presence of N569 may contribute an additional hydrogen bond with the ligand, leading to an observed lower binding energy between RAY1216 and HAtV 3CLpro (Figure S5). RAY1216 also exhibits strong binding with DENV NS3/2B, HCV NS3/4A, and the CHIKV capsid. Further modifications of RAY1216 are required to accommodate the preferences of the S1 and S2 pockets of DENV NS3/2B and HCV NS3/4A. The CHIKV capsid self-cleaves, and the C-terminal residue Trp occupies the active site after cleavage (Figure S3A). Whether this site is suitable for designing inhibitors requires further study.
The above results indicate that RAY1216 has therapeutic potential against various Picornavirales and can be treated as a lead inhibitor for Flaviviridae and alphaviruses.
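For reference, the binding energies discussed in this section follow the usual MM-GBSA end-point scheme, which in its general textbook form can be written as below. This is a generic statement of the method, not the exact protocol or settings used here; in particular, the entropy term is frequently omitted in practice, and whether it was included in this study is not stated.

```latex
\Delta G_{\mathrm{bind}}
  = G_{\mathrm{complex}} - \left( G_{\mathrm{protein}} + G_{\mathrm{ligand}} \right),
\qquad
G = E_{\mathrm{MM}} + G_{\mathrm{GB}} + G_{\mathrm{SA}} - T S
```

Here $E_{\mathrm{MM}}$ is the molecular mechanics energy, $G_{\mathrm{GB}}$ the polar solvation term from the generalized Born model, $G_{\mathrm{SA}}$ the nonpolar surface-area term, and $TS$ the optional entropy contribution. A more negative $\Delta G_{\mathrm{bind}}$ indicates stronger predicted binding.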

Proposal for a Drug Optimization Strategy
We can thus develop inhibitors with a broad spectrum of activity for virus targets with similar sequences. For targets with significant differences in sequence but similar folds and structures, we can design and modify inhibitors based on existing antiviral drugs. To date, there are no specific drugs for dengue, which is caused by the dengue virus (DENV). Hence, we propose some modified molecules for the DENV protease (NS3/2B), starting from RAY1216 and Asunaprevir. Based on the preferences of the S1 and S2 sites, i.e., the substrate P1 and P2 characteristics, we replaced the P1 and P2 groups of RAY1216 and Asunaprevir with guanidine and amino groups (Figure 6A). The complex structures of the DENV protease with compounds 1 and 2 were constructed, and ΔG was calculated by the above method. ΔG between compound 1 and DENV NS3/2B (−62.49 kcal/mol) was lower than ΔG between RAY1216 and DENV NS3/2B (−59.47 kcal/mol), indicating stronger binding. ΔG between compound 2 and DENV NS3/2B (−77.43 kcal/mol) was likewise lower than that between its parent compound, Asunaprevir, and DENV NS3/2B.
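The comparison above amounts to a simple difference of predicted binding energies. The short sketch below makes the arithmetic explicit for compound 1, using only the two values quoted in the text; the function name is hypothetical.

```python
def delta_delta_g(dg_modified: float, dg_parent: float) -> float:
    """ΔΔG = ΔG(modified) − ΔG(parent) in kcal/mol.

    A negative value means the modified compound is predicted to bind
    more strongly than its parent.
    """
    return dg_modified - dg_parent


# Values quoted in the text for DENV NS3/2B (kcal/mol).
ddg_compound1 = delta_delta_g(-62.49, -59.47)
print(f"{ddg_compound1:.2f} kcal/mol")  # prints: -3.02 kcal/mol
```

A difference of this size is within the range where end-point methods such as MM-GBSA are usually treated as qualitative, so it is best read as a ranking hint rather than a quantitative affinity gain.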

Pepsin-Like Aspartic Proteases
Pepsin-like aspartic proteases (PAPs) include two domains with similar folds. Each domain is composed of multi-stranded β-sheets and a few helices (Figure 7A). Retroviral PAPs consist of two identical catalytic subunits (Figure 7B). The catalytic site is located between the two subunits. Each subunit has a catalytic residue (Asp) in the same position.
We collected the sequences of 11 Retrovirales proteases and calculated their identities (Figure 7C). Classification was performed according to a cutoff of identity > 30%. The Retrovirales PAPs include five classes: class 1 (HIV-1 and HIV-2), class 2 (WMSV, GaLV, and XMRV), class 3 (HTLV-1, HTLV-2, HTLV-3, and HTLV-4), class 4 (BLV), and class 5 (SFV). We performed sequence alignment for the 11 PAPs (Figure S6). The sequence logo of the alignment is shown in Figure S7A. The β-sheets and α-helices are arranged in the same order. The positions of the catalytic residue Asp are highly conserved. We also collected substrates for the PAPs (Figure S7B). The substrate-cleaving characteristics of the SFV protease are uncertain. The cleaving features of the other viral proteases are similar. The P1 residue is often a hydrophobic residue. Six PAPs have no experimentally reported structures. We predicted the dimer structures with ColabFold and superimposed the 3D structures of the 11 PAPs. The RMSD matrix diagram is shown in Figure S7C. The RMSD ranges from 0.787 to 6.221 Å.
We constructed complex structures of the 11 PAPs with 9 HIV protease inhibitors through the local optimization strategy and depicted the ΔG matrix diagram between ligands and proteins (Figure 7D). All ΔG values are below −39.33 kcal/mol. Taking Saquinavir as an example, we show its binding modes with the HIV-1, WMSV, HTLV-1, BLV, and SFV PAPs. The complex structures of the WMSV, HTLV-1, BLV, and SFV PAPs with Saquinavir are similar to the complex structure of the HIV-1 PAP with Saquinavir (Figure 7E,F). To provide a more intuitive analysis of the binding modes, we drew ligand interaction diagrams. As shown in Figure 7F, the hydroxyl group of Saquinavir forms hydrogen bonding interactions with the catalytic residue Asp in all of these proteases, which is essential to the binding of PAP inhibitors to the protein. Inhibitors can also form salt bridges with the catalytic residue Asp. In addition, inhibitors have various interactions with other parts of the proteases, such as hydrophobic interactions, π-π interactions, and hydrogen bonds. Overall, HIV protease inhibitors have the potential to inhibit the PAPs of the other nine retroviruses.

Discussion and Conclusions
In this study, we collected and clustered 170 protease sequences from 125 viruses that infect humans. The sequences, structures, and substrate characteristics of the viral 3CLpro and PAPs were analyzed. We employed a ligand-protein complex structure prediction method to reposition approved drugs for viral 3CLpro and PAPs. Taking Ray1216 and Asunaprevir as examples, we designed modified inhibitors of the DENV protease and evaluated them by computational simulation. We also classified proteases from various viruses, offering insights for systematic research and the design of antiviral drugs.
There is no specific drug available to treat infections caused by viruses in the order Picornavirales. For example, HRV belongs to the order Picornavirales, and the clinical trial of the HRV inhibitor Rupintrivir was terminated due to low oral bioavailability [45]. Our results indicate that four approved drugs targeting the 3CLpro of SARS-CoV-2 show potential for inhibiting the proteases of multiple viruses in the order Picornavirales (Figures 1 and 4E). Expanding the indications of these drugs could help address the current shortage of specific medications for Picornavirales.
In addition, Ensitrelvir is a non-peptidomimetic, non-covalent inhibitor that targets the SARS-CoV-2 3CLpro. As depicted in Figure S8, the predicted complex structure of DENV NS3/2B-Ensitrelvir is quite similar to the complex structure of SARS-CoV-2 3CLpro-Ensitrelvir. The P1', P1, and P2 segments of Ensitrelvir bind well to the S1', S1, and S2 sites in both complex structures. Ensitrelvir is therefore a potential inhibitor of DENV NS3/2B.
Furthermore, our study has demonstrated that the strategy is suitable for homologous proteins with significant structural differences. The RMSD between SARS-CoV-2 3CLpro with SIM0417 (PDB code: 8IGX) and MODV NS3/2B is 6.226 Å. The RMSD between HCV NS3/4A with Boceprevir (PDB code: 3LOX) and PeV 3CLpro is 7.539 Å. The RMSD between HIV-1 Protease with Saquinavir (PDB code: 4QGI) and SFV Protease is 4.739 Å. These RMSD values are large. As shown in Figure S9, we displayed and compared the complex structures. Many unreasonable interactions exist between the small molecules extracted from the experimentally determined structures and the target proteins (Figure S9B). After refining the complex structures in Figure S9B using our complex structure prediction strategy, we obtained the predicted complex structures of MODV NS3/2B-SIM0417, PeV 3CLpro-Boceprevir, and SFV Protease-Saquinavir (Figure S9C). In these refined structures, the P4-P4' segments of the ligands fit well into the S4-S4' sites on the protein surfaces.
This study proposes repurposing antiviral drugs for other known viruses that contain similar proteins and provides modification strategies for further optimizing these drugs. For example, inhibitors of the VEEV nsP2 protease can be used to find inhibitors of other alphavirus nsP2 proteins [78]. The inhibitors of HCMV and HHV6 assemblins can be applied to finding inhibitors of other Herpesvirus assemblins [79]. The fold of Poxviridae core proteases is similar to that of Adenoviridae adenains, for which inhibitors have been reported [80][81][82]. Using these as a reference, inhibitors of Poxviridae core proteases can be designed. However, our strategy also has some limitations. Compared to molecular dynamics simulations with larger sampling [83], this strategy is faster but provides less information. This limitation needs to be considered in the future. Typically, active sites are highly conserved, whereas allosteric sites are relatively unique, so our strategy may be less applicable to allosteric sites. In addition, we selected only one sequence for each virus, although each virus exists in multiple mutated forms. Research on viral resistance is also especially important.
With the increasing availability of information about viruses, there is a growing need for systematic research on viruses and improved treatment of viral infections. Exploiting broad-spectrum inhibitors for similar viral targets and summarizing inhibitor design strategies for the same type of target will advance the development of antiviral drugs.

The Protease Profiles of Human-Infective Viruses
The clan classification was based on the MEROPS database [101]. The proteases in the same clan were clustered by phylogenetic analysis in MEGA 11 [102]. The sequences were first aligned by ClustalW [103].

Sequence Alignment
The identity matrices of the sequences aligned by ClustalW were calculated with the BioAider tool [105]. The heat map was drawn with GraphPad Prism 8.0 software. The structure-based multiple sequence alignment was conducted with PROMALS3D [106]. The sequence logo of the aligned sequences was generated with WebLogo [87].
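To make the identity calculation concrete, the following Python sketch derives a pairwise percent-identity matrix from pre-aligned sequences. It assumes that identity is the number of matching residues divided by the number of columns in which neither sequence has a gap; the exact definition used by BioAider may differ, and the toy sequences are hypothetical.

```python
from itertools import combinations

def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == gap or y == gap:
            continue  # skip columns containing a gap (an assumption of this sketch)
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

def identity_matrix(aligned):
    """Symmetric identity matrix, stored as {(name_a, name_b): percent}."""
    names = list(aligned)
    matrix = {(name, name): 100.0 for name in names}
    for a, b in combinations(names, 2):
        matrix[(a, b)] = matrix[(b, a)] = percent_identity(aligned[a], aligned[b])
    return matrix

# Hypothetical aligned fragments, used only to exercise the functions.
toy_alignment = {"seqA": "GSC-GHV", "seqB": "GTCAGH-", "seqC": "GSC-GHV"}
toy_identity = identity_matrix(toy_alignment)
```

The resulting dictionary can be written out as a square table and passed to any plotting tool to draw the heat map.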

Three-Dimensional Structure Prediction
Protein structures and complexes were predicted with ColabFold [71]. The structures with the highest rank were chosen. The structures of Flaviviridae NS3/4A, NS3/2B, and Retrovirales proteases are dimers. The structures of the other sequences are monomeric. Because the predicted structures of HEV PLpro, BToV 3CLpro, and BToV PLpro had low confidence, these proteins were removed from the protease list. The PDB IDs of proteins with reported structures are listed in the Supplementary Materials.

The Computational Ligand-Protein Complex Structure Prediction
The protein structures were first prepared. Solvents and ligands were removed from the experimentally determined structures. For proteins with multiple chains, one chain was retained. We used the Protein Preparation Wizard in Maestro 12.8 (Schrödinger, LLC, New York, NY, USA, 2021) to assign correct protonation states and formal charges and to add missing residues [107].
The pipeline for complex structure prediction is shown in Figure 4D. For proteins without experimental complex structures, the complexes were constructed by aligning the protein structures, extracting the ligands into the target protein structures, and performing local refinement of the complex structures. For ligands without experimental complex structures, the steps were modified as follows: aligning the target ligands to the reference ligands, extracting the target proteins, merging the proteins and target ligands, and performing local refinement of the complex structures.
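The two branches of this pipeline can be summarized in a short Python sketch. The function name and the step labels are illustrative assumptions that paraphrase the text; the sketch only lists the order of operations and does not call any modeling software.

```python
def plan_complex_prediction(ligand_has_experimental_complex):
    """Return the ordered steps for building one ligand-protein complex.

    ligand_has_experimental_complex: True if the ligand already appears in an
    experimentally determined complex with a reference protein.
    """
    if ligand_has_experimental_complex:
        return [
            "align target protein to the reference protein structure",
            "extract the reference ligand into the target protein structure",
            "locally refine the complex structure",
        ]
    return [
        "align target ligand to the reference ligand",
        "extract the target protein",
        "merge the target protein and the target ligand",
        "locally refine the complex structure",
    ]

# Example: a ligand with no experimental complex follows the four-step branch.
steps_for_new_ligand = plan_complex_prediction(False)
steps_for_known_ligand = plan_complex_prediction(True)
```

Both branches end with the same local refinement step, which is where the initial placement is relaxed.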
For example, the complex structure of HCV 3CLpro with Paritaprevir has not been experimentally determined, but the 2D structure of Paritaprevir is similar to that of Danoprevir. Paritaprevir was preprocessed with the LigPrep module in Maestro 12.8 to predict the protonation state and generate a low-energy 3D conformation. The structure of Paritaprevir was aligned to the ligand Danoprevir in PDB 5EQR by Ligand Alignment in Maestro 12.8. Then, the complex structure of HCV 3CLpro with Paritaprevir was constructed by Prime MM-GBSA in Maestro 12.8. Compounds 1-2, which we designed, were simulated using the same methods.
The 3CLpro or PAPs of multiple viruses were aligned to the above complex structures by Cealign in PyMOL [68,69]. Then, the complex structures of proteins with ligands were constructed, and the binding energies were calculated by Prime MM-GBSA. Residues within 5 Å of the ligand were treated as flexible. The VSGB solvation model and the OPLS4 force field were used in the simulations. Heat maps of the energy matrices were drawn with GraphPad Prism 8.0 software.
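The 5 Å criterion can be illustrated with a short Python sketch that selects every residue having at least one atom within a cutoff distance of any ligand atom. This is an assumed reading of the criterion written in plain Python; it does not reproduce the internal selection logic of Prime, and the residue names and coordinates are hypothetical.

```python
import math

def residues_near_ligand(protein_atoms, ligand_atoms, cutoff=5.0):
    """Return sorted residue IDs with any atom within `cutoff` Å of the ligand.

    protein_atoms: iterable of (residue_id, (x, y, z)) tuples
    ligand_atoms:  iterable of (x, y, z) tuples
    """
    ligand_atoms = list(ligand_atoms)
    selected = set()
    for residue_id, xyz in protein_atoms:
        if residue_id in selected:
            continue  # residue already selected through another atom
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_atoms):
            selected.add(residue_id)
    return sorted(selected)

# Hypothetical coordinates, used only to exercise the function.
toy_protein = [
    ("ASP25", (0.0, 0.0, 0.0)),
    ("ASP25", (1.0, 0.0, 0.0)),
    ("GLY27", (4.0, 3.0, 0.0)),
    ("ILE50", (10.0, 0.0, 0.0)),
]
toy_ligand = [(0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]
flexible = residues_near_ligand(toy_protein, toy_ligand)
```

In this toy case GLY27 is farther than 5 Å from the first ligand atom but within 5 Å of the second, so it is still selected.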

Funding:
We acknowledge funding support from the National Natural Science Foundation of China (82341093), the National Key R&D Program of China (No. 2022YFC3400501), the Science and Technology Commission of Shanghai Municipality grants (20QA1406400 and 2043078030), the start-up package from ShanghaiTech University, and the Shanghai Frontiers Science Center for Biomacromolecules and Precision Medicine at ShanghaiTech University. We are also grateful for the support from the HPC Platform.

Figure 2. The protease profiles of human-infective viruses. Proteases in the same group of clans are shown in one tree. Viruses in the same order are marked with the same color.

Figure 3. The folding patterns, sequences, structures, and substrates of viral 3CLpro. (A) General folding topology diagram for the class of 3CLpro. Arrows represent β-sheets. (B) The 3D structures of SARS-CoV-2, EV-A71, HAtV, HCV, and CHIKV 3CLpro. The proteins are represented as cartoon models. The catalytic domains are colored in cyan. (C) The heat map shows the percentage of sequence identity for 3CLpro proteins between different viruses. (D) Sequence logo of multiple sequence alignment. Arrows represent β-sheets. Asterisks represent catalytic residues. (E) Sequence logo of 3CLpro cleavage sites. Residues are scaled according to their frequencies at each position.

Figure 4. Computational prediction and analysis of the ligand-protein binding complex structures. (A) The 2D chemical structures of Boceprevir, Nirmatrelvir, and SIM0417. (B) The superimposition of HCV NS3/4A with Boceprevir (PDB code: 3LOX) and SARS-CoV-2 3CLpro with Nirmatrelvir (PDB code: 7RFS) and SIM0417 (PDB code: 8IGX). Boceprevir (green), Nirmatrelvir (cyan), and SIM0417 (magenta) are shown as sticks. The proteins are shown as cartoon models. (C) The heat map shows the RMSD for those viral 3CLpro proteins. (D) The pipeline of the computational ligand-protein complex structure prediction and optimization strategy. (E) The heat map shows the predicted binding affinities (measured using MM-GBSA) for the 15 studied SARS-CoV-2 or HCV drugs against the 73 3CLpro proteins of different viruses.

Figure 5. Predicted binding modes of Ray1216 against the 3CLpro proteins of Picornavirales. The proteins are shown as cartoon models. Ray1216 and the interacting residues are depicted as sticks. The red dashed lines are hydrogen bonds between the ligand and protein. The Picornavirales viruses shown are EMCV, AiV, FMDV, EV-A71, HAV, PeV, SaV, and NV. The binding mode of Ray1216 with the 3CLpro of SARS-CoV-2 is used as a control (PDB code: 8IGN) [19]. The structures of FMDV (PDB code: 2WV5), EV-A71 (PDB code: 3SJO), HAV (PDB code: 2HAL), and NV (PDB code: 5T6D) 3CLpro were retrieved from the PDB database [74-77] and used in this study. The other protein structures were predicted by ColabFold [71].

ΔG between compound 1 and DENV NS3/2B (−62.49 kcal/mol) was lower than ΔG between Ray1216 and DENV NS3/2B (−59.47 kcal/mol), indicating stronger binding. ΔG between compound 2 and DENV NS3/2B (−77.43 kcal/mol) was lower than that between Asunaprevir and DENV NS3/2B (−64.19 kcal/mol). As shown in Figure 6B,C, the guanidino groups of compounds 1-2 are more suitable for binding to the S1 site than the γ-lactam group of Ray1216 or the alkene group of Asunaprevir. The electrostatic potential surface of the protein also shows that the electronegative S1 and S2 sites pair with the electropositive guanidino and amino groups. Taken together, the results indicate that compounds 1-2, which we designed, may have anti-DENV activity.
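The comparison above amounts to a difference of predicted binding energies, ΔΔG = ΔG(designed compound) − ΔG(parent drug). The short Python sketch below reproduces that arithmetic from the four values quoted in the text; the function and variable names are illustrative only.

```python
def delta_delta_g(dg_designed, dg_parent):
    """Difference in predicted binding energy (kcal/mol), rounded to 2 decimals.

    A negative value means the designed compound is predicted to bind
    more favorably than its parent drug.
    """
    return round(dg_designed - dg_parent, 2)

# MM-GBSA ΔG values against DENV NS3/2B, as quoted in the text (kcal/mol).
pairs = {
    "compound 1 vs Ray1216": (-62.49, -59.47),
    "compound 2 vs Asunaprevir": (-77.43, -64.19),
}
ddg = {label: delta_delta_g(*values) for label, values in pairs.items()}
```

Under this definition, compound 1 differs from Ray1216 by −3.02 kcal/mol and compound 2 differs from Asunaprevir by −13.24 kcal/mol.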

Figure 6. Inhibitors proposed against DENV protease are based on Ray1216 and Asunaprevir. (A) Schematic representation of the inhibitor design of DENV protease. (B) The binding modes of DENV NS3/2B with Ray1216 and compound 1. (C) The binding modes of DENV NS3/2B with Asunaprevir and compound 2. DENV proteases are represented as cartoon models and the surface of the electrostatic potential. Ray1216 (green), compound 1 (cyan), Asunaprevir (blue), compound 2 (magenta), and interaction residues are shown as sticks.

Figure 7. The predicted binding complexes of HIV protease inhibitors against PAPs of Retrovirales. (A) Topology diagram of PAPs. Arrows represent β-sheets. Cylinders represent α-helices. (B) The 3D structure of HIV PAPs. The proteins are represented as cartoon models. The catalytic domains are colored cyan and green, respectively. (C) The heat map shows the percentage of sequence identity for PAPs between different viruses. (D) The heat map shows the predicted binding affinities (measured using MM-GBSA) for the nine HIV drugs against the 11 PAPs of different viruses. (E) The complex structure and binding mode of HIV-1 PAPs with Saquinavir. Saquinavir is depicted as sticks. (F) The binding modes between WMSV, HTLV-1, BLV, or SFV PAPs and Saquinavir.
For the phylogenetic analysis, model selection was performed with the Find Best DNA/Protein Models (ML) option in MEGA 11. The model with the lowest Bayesian Information Criterion (BIC) score was selected. The best-fit model for clans PA, CA, and SH is WAG + G. The best-fit model for clan AA is rtREV + G + I. The best-fit model for clans CN, CE, and ME is LG + G. The phylogenetic analysis was conducted using the Maximum Likelihood method. The dendrogram was generated with Evolview V3 [104].
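The lowest-BIC rule can be sketched in a few lines of Python using the standard definition BIC = k ln(n) − 2 ln(L), where k is the number of free parameters, n the number of sites, and L the maximized likelihood. The log-likelihoods, parameter counts, and site count below are made-up numbers for illustration, not values from this study, and MEGA's own parameter counting may differ.

```python
import math

def bic(log_likelihood, n_params, n_sites):
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood

def best_model(candidates, n_sites):
    """Return (name, score) of the candidate with the lowest BIC.

    candidates: {model_name: (log_likelihood, n_params)}
    """
    scores = {
        name: bic(log_l, k, n_sites) for name, (log_l, k) in candidates.items()
    }
    name = min(scores, key=scores.get)
    return name, scores[name]

# Hypothetical fits for three substitution models on a 300-site alignment.
toy_candidates = {
    "WAG+G": (-12000.0, 40),
    "LG+G": (-12010.0, 40),
    "rtREV+G+I": (-11998.0, 41),
}
chosen_name, chosen_score = best_model(toy_candidates, n_sites=300)
```

In this toy case the extra parameter of rtREV+G+I improves the likelihood slightly, but the BIC penalty outweighs the gain, so WAG+G is chosen.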

Table 1. The grouped clans, catalytic residues, and the description of the function in the viral life cycle of viral proteases.

Institutional Review Board Statement: Not applicable.