Computer-Aided Screening for Potential Coronavirus 3-Chymotrypsin-like Protease (3CLpro) Inhibitory Peptides from Putative Hemp Seed Trypsinized Peptidome

To control the COVID-19 pandemic, antivirals that specifically target the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) are urgently required. The 3-chymotrypsin-like protease (3CLpro) is a promising drug target since it uses a catalytic dyad to hydrolyze the viral polyprotein during the viral life cycle. Bioactive peptides, especially food-derived peptides, have a variety of functional activities, including antiviral activity, and also have a potential therapeutic effect against COVID-19. In this study, the hemp seed trypsinized peptidome was subjected to computer-aided screening against the 3CLpro of SARS-CoV-2. Using the predicted trypsinized products of the five major proteins in hemp seed (i.e., edestin 1, edestin 2, edestin 3, albumin, and vicilin), the putative hydrolyzed peptidome was established and used as the input dataset. To select the Cannabis sativa antiviral peptides (csAVPs), a predictive bioinformatic analysis was performed with three webserver screening programs: iAMPpred, AVPpred, and Meta-iAVP. The amino acid composition profile comparison was performed by COPid, and non-toxic and non-allergenic candidates were screened with ToxinPred and with AllerTOP and AllergenFP, respectively. GalaxyPepDock and HPEPDOCK were employed to perform the molecular docking of all selected csAVPs to the 3CLpro of SARS-CoV-2. Only the top docking-scored candidate (csAVP4) was further analyzed by molecular dynamics simulation for 150 nanoseconds. Molecular docking and molecular dynamics revealed the potential ability and stability of csAVP4 to inhibit the 3CLpro catalytic domain with hydrogen bond formation in domain 2 with short bonding distances. In addition, the top ten candidate bioactive peptides contained hydrophilic amino acid residues and exhibited a positive net charge. We hope that our results may guide the future development of alternative therapeutics against COVID-19.


Introduction
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which first emerged in late 2019 and causes the disease known as COVID-19, can cause severe pneumonia in humans. This coronavirus spreads rapidly and has had tremendous impacts on multiple facets of society, human health, and economics. Although there have been several efforts to rapidly develop vaccines or repurpose small-molecule inhibitors against SARS-CoV-2 [1,2], there are no generally proven effective therapies for this particular disease [3]. Recently, peptides and peptide-based inhibitors have been considered as compelling alternatives to small

Hempseed Putative Antiviral Peptides Screening Using Computational Method
The input dataset of hemp seed (Cannabis sativa) peptide sequences, obtained as the putative trypsinized products of five main proteins (edestin 1-3, albumin, and vicilin), is listed in the Supplementary file (Table S1). Cannabis sativa antiviral peptides (csAVPs) were selected based on the cut-off criterion that the predicted score had to exceed half of the maximum score of each predictor. Specifically, the support vector machine (SVM) score must be over 50 in AVPpred (in either the amino acid composition-based or the physicochemical property-based model), and the probability must be greater than 0.5 in Meta-iAVP and iAMPpred (focusing on the antiviral classification). Based on these constraints, 16, 27, and 14 AVPs were predicted by AVPpred, Meta-iAVP, and iAMPpred, respectively. From the dataset of 127 input hemp seed peptide sequences, 45 sequences passed at least one AVP predictor, and only 10 of these peptides passed at least two prediction servers (Figure 1 and Table 1). The major source of csAVP candidates was the trypsinized products of the vicilin protein (5 selected csAVPs), while the rest were from edestin 2, edestin 3, and albumin. Notably, no csAVP was selected from the edestin 1 protein. The majority (more than 60%) of these 45 putative csAVPs were categorized as short (6-15 amino acids) and hydrophilic peptides based on their overall characteristics and abundance (Figure 2). Notably, the peptide net charge of the AVPs was quite equally distributed among negatively charged, positively charged, and non-charged groups, as shown in Figure 2.
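The selection rule described above can be sketched in a few lines of Python. This is a minimal illustration only: the dictionary keys and the example scores are hypothetical placeholders, not values from Table 1, and only the cut-offs (SVM score over 50 for AVPpred in either model, probability over 0.5 for Meta-iAVP and iAMPpred, at least two predictors passed) are taken from the text.

```python
# Minimal sketch of the cut-off rule; score keys and example values are hypothetical.

def predictors_passed(scores):
    """Return the names of the predictors whose cut-off a peptide passes."""
    passed = []
    # AVPpred: either the composition-based (M3) or the physicochemical (M4) SVM score must exceed 50.
    if max(scores.get("AVPpred_M3", 0.0), scores.get("AVPpred_M4", 0.0)) > 50.0:
        passed.append("AVPpred")
    # Meta-iAVP and iAMPpred: antiviral probability must exceed 0.5.
    if scores.get("Meta-iAVP", 0.0) > 0.5:
        passed.append("Meta-iAVP")
    if scores.get("iAMPpred", 0.0) > 0.5:
        passed.append("iAMPpred")
    return passed


def select_candidates(peptides, min_predictors=2):
    """Keep peptide IDs that pass at least `min_predictors` prediction servers."""
    return [pid for pid, scores in peptides.items()
            if len(predictors_passed(scores)) >= min_predictors]


# Hypothetical example input (not data from this study).
example = {
    "pep_A": {"AVPpred_M3": 55.1, "AVPpred_M4": 40.0, "Meta-iAVP": 0.72, "iAMPpred": 0.31},
    "pep_B": {"AVPpred_M3": 30.0, "AVPpred_M4": 20.0, "Meta-iAVP": 0.90, "iAMPpred": 0.10},
}
print(select_candidates(example))
```

Treating AVPpred as a single vote, regardless of which of its two models passes, is an assumption of this sketch that mirrors the "either ... or" wording of the criterion.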
Regarding the length distribution of the 45 csAVP candidates, 27 (60%) and 14 (31%) peptides were classified as short AVPs (6-15 residues) and medium AVPs (16-30 residues), respectively. Other reported AVPs are quite variable in length, ranging from short to medium (8-40 amino acid residues), and contain positively charged side chain amino acids [36-41]. In general, the recommended length for AVPs is between 10 and 40 amino acid residues, which could be used as a guideline for AVP length optimization to improve antiviral peptide properties [42,43]. According to the calculated physicochemical properties (Table 2), only five csAVPs (csAVP5 and csAVP7-10) are cationic peptides based on the net charge value. However, all csAVPs are considered amphipathic peptides containing positively charged side chain amino acids, namely arginine (R) and/or lysine (K). The amphipathic property of AVPs has been suggested to be significantly influenced by the charge distribution profile around the polar-nonpolar interface, which can inhibit virus-induced cell fusion [44]. Even though the molecular function and mechanism of amphipathic antiviral peptides on coronavirus 3CL protease (3CLpro) have not been elucidated before, several amphipathic peptides have been investigated through in silico and in vitro experiments as potential 3CLpro inhibitors [31,32,45-47].
The compositional analysis of csAVP amino acid sequences is shown in Figure 3. According to the analyzed result, the hydrophobic side chain amino acids (i.e., Ala, Leu, Ile, Lys, Cys, and Trp) were found in higher percentages compared to the non-csAVPs. Notably, the positively charged side chain amino acids (Lys and Arg) are still found in moderate abundance. This is reasonable because these functional antimicrobial peptides are usually amphipathic, that is, composed of hydrophobic side chains and cationic residues at the same time [36,48,49]. In particular, the basic side chain residue (Lys) plays an important role in the electrostatic properties of antiviral peptides and is commonly found as a preferential residue in therapeutic peptides [36]. Moreover, Lys provides the cationic property of the AVPs, enhancing the antiviral activity by facilitating the cell surface-peptide interaction and leading to insertion into microorganisms, either through the anionic cell walls or phospholipid membranes [36,48,50,51]. Notably, in our case, Lys was not found to be a preferential amino acid of csAVPs, and the molecular mechanism of Lys in the inhibition of viral enzymes has never been clearly revealed [36,52]. These putative csAVPs might not be as specifically involved in membrane-peptide interaction as other AVPs. The aliphatic and medium-sized hydrophobic side chain amino acids (Ala, Cys, Leu, and Ile) were also found to have higher percentage compositions in csAVPs. These non-polar residues have been reported to play a significant role in the amphipathic characteristics of antimicrobial peptides [36,53].
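For readers who want to reproduce this kind of comparison, the percentage composition of a peptide set can be computed as sketched below. This is not the COPid implementation; it is a minimal stand-in, and the function names and example sequences are hypothetical.

```python
from collections import Counter


def aa_composition(sequences):
    """Percentage of each amino acid over all residues in a set of peptide sequences."""
    counts = Counter("".join(sequences))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aa: 100.0 * n / total for aa, n in counts.items()}


def composition_difference(group_a, group_b):
    """Per-residue difference in percentage composition (group_a minus group_b)."""
    comp_a, comp_b = aa_composition(group_a), aa_composition(group_b)
    residues = sorted(set(comp_a) | set(comp_b))
    return {aa: comp_a.get(aa, 0.0) - comp_b.get(aa, 0.0) for aa in residues}


# Hypothetical sequences, for illustration only.
print(composition_difference(["AALK", "LLIK"], ["DDEG", "GSTK"]))
```

A positive difference marks a residue that is enriched in the first group relative to the second, which is the quantity discussed for csAVPs versus non-csAVPs.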

Predictive Antiviral Scores, IC50, and Physicochemical Properties of the Selected csAVPs
All ten selected csAVPs, classified as probable antivirals by at least two prediction programs with the cut-offs mentioned in the previous section, are listed in Table 1. Details about each candidate, including secondary structures, peptide sequences, and protein origin, are provided together with the predictive scores from four machine learning-based predictive programs. Because the predicted secondary structure is one of the most important peptide sequence features for predicting AVPs, peptide folding prediction of all selected csAVPs was performed by the PEP-FOLD3 webserver. Our chosen peptides can be divided into three structural groups: helix-containing loop (csAVP1, 4, 5, 6, and 8), random coil (csAVP2, 3, 7, and 10), and β-sheet-containing loop (csAVP9). This finding is consistent with previous reports that random coils and α-helices are the two major classes of AVP secondary structures, as opposed to the β-sheet structure [31,36,54].
Briefly, the red numbers indicate the significant predictive scores that pass our cut-off criteria, as mentioned before (over 50 in SVM scores for AVPpred and over 0.5 probability for Meta-iAVP and iAMPpred). Compared to the prediction result from the AMPfun web server (http://fdblab.csie.ncu.edu.tw/AMPfun/index.html, accessed on 10 February 2022) in the previous work by our group [31], iAMPpred tended to be stricter as an AVP screening tool. Since the first two models of the AVPpred program (antiviral peptide motif-based and sequence alignment-based models) could only provide classification results (either AVPs or non-AVPs), we only considered the predictive SVM scores from Model 3 (M3) and Model 4 (M4) in the csAVP candidate selection procedure. The algorithms of these two models were based on amino acid compositions (M3) and physicochemical properties (M4), respectively. For the ENNAVIA program, both modes of the neural network prediction models (antiviral and anti-coronavirus) were used. The first two models (ENNAVIA-A and B) were used for antiviral property classification, while the other two models (ENNAVIA-C and D) were specifically used for anti-coronavirus property prediction [55].
To assess the antiviral activity of our selected csAVPs, the AVP-IC50Pred server was used to predict the half-maximal inhibitory concentration (IC50), which is the major inhibition profile. Since SARS-CoV-2 was not listed on the "virus specific" prediction platform, the RSV/INFV/HSV prediction model was selected as the closest virus choice. The other two hybrid models were also selected to predict the inhibitory doses for viruses in general, and the average predicted IC50 (µM) is shown in the last column of Table 2. The calculated value relied on a regression-based algorithm, and various peptide features (i.e., amino acid composition, binary profile, physicochemical properties, and solvent accessibility) were considered to predict the IC50 value of the AVP sequences [56]. In the virus-specific model prediction, all csAVPs were found to have quite comparable predicted IC50 values (45-48 µM). The higher predicted IC50 values of csAVP1 and csAVP5 (above 46 µM) indicated lower antiviral efficacy compared to the other peptide candidates. However, csAVP8 and csAVP4 were found to have the highest efficacy among the putative antivirals based on the average IC50 across the three prediction models, with only 16.22 and 20.22 µM, respectively.
In a previous study on AVP screening for the human angiotensin-converting enzyme 2 (ACE-2) inhibitor, the authors divided the antiviral efficacy into four levels based on the estimated IC50 values: highly effective (<1 µM), effective (1-10 µM), moderately effective (11-100 µM), and least effective (>100 µM) [30]. Our selected csAVPs could be classified as moderately effective to effective based on the predicted virus-specific SVM model and the average IC50 values. However, some csAVPs could also be classified as highly effective if only the virus-specific RF, Hybrid Model I and K Star, and Hybrid Model II-IBk and K Star were considered. Therefore, the predicted antiviral efficacy of our selected csAVPs is comparable to that of the selected fruit bromelain-derived peptide for ACE-2 inhibition, at 40.67 and 6.85 µM based on the virus-specific SVM and RF models, respectively [30]. It should be noted that these IC50 values are only predictions; in vitro testing is still required to validate the real efficacy of these candidates. Furthermore, Table 3 provides the calculated physicochemical property scores of all selected csAVP candidates (csAVP1 to csAVP10) obtained from the ToxinPred webserver. The smallest peptide candidate was csAVP10, with a molecular weight of 835.99 g/mol, while csAVP4 was the largest candidate, with a molecular weight of 4131.18 g/mol. Consistent with the previous study, all selected putative csAVPs were characterized as amphipathic functional peptides with high steric hindrance and side bulk scores (over 0.5) [31].
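The four efficacy levels quoted from [30] can be applied programmatically, as in the minimal sketch below. The published bands leave the interval between 10 and 11 µM unassigned; assigning it to the "moderately effective" level is an assumption of this sketch, and the function name is hypothetical.

```python
def classify_ic50(ic50_um):
    """Map a predicted IC50 (in µM) onto the four efficacy levels quoted from [30].

    Assumption: values between 10 and 11 µM, which the published bands leave
    unassigned, are treated as 'moderately effective'.
    """
    if ic50_um < 0:
        raise ValueError("IC50 must be non-negative")
    if ic50_um < 1:
        return "highly effective"
    if ic50_um <= 10:
        return "effective"
    if ic50_um <= 100:
        return "moderately effective"
    return "least effective"


# Average predicted IC50 values reported in the text for csAVP8 and csAVP4.
for name, value in [("csAVP8", 16.22), ("csAVP4", 20.22)]:
    print(name, classify_ic50(value))
```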

In Silico Toxicity and Allergenicity Analysis of the Selected csAVPs
To obtain high specificity and low side effects of the bioactive peptides, both toxicity and allergenicity are usually considered when designing effective and safe therapeutics. Aside from laboratory-based approaches, in silico screening of non-toxic peptides has recently been found to be an efficient way to further verify the specificity and selectivity of functional peptides [57]. The estimation of probable toxic peptides was performed with the computational predictive tool ToxinPred to estimate the possible side effects of csAVPs on the host cell. In particular, ToxinPred analysis was based on eight  Table 4. The negative values indicate the non-toxic classified results, while the positive values indicate the possible toxicity of the analyzed peptides (highlighted in red). Only one selected peptide (csAVP3) was classified as a probable toxic peptide in some predictive models (models A and B). The ToxinPred prediction values were negative for the remaining candidates, indicating a very low toxigenic potential for host cells. In the previous work of our group [31], only the toxicity side effects of the bioactive peptide candidates were considered in the potential AVP screening. To determine another possible side effect, bioinformatic tools for allergenicity prediction were employed in this study. Allergenic peptide identification is a crucial parameter for the development of vaccines and therapeutic peptides. Both the AllerTOP v.2.0 and AllergenFP v.1.0 prediction tools can estimate the allergenicity of polypeptides using amino acid sequence-based predictive algorithms. The prediction by the AllerTOP v.2.0 server was based on an amino acid sequence and a physicochemical property-based machine learning model developed for allergenic peptide/protein classification [58]. In contrast, AllergenFP can distinguish allergenic from non-allergenic peptides by an alignment-free, descriptor-based fingerprint method [59].
The accuracies of the AllerTOP and AllergenFP programs were reported as 85.3% and 88%, respectively [58,59]. According to the allergenicity prediction results in Table 4, only four csAVP candidates (csAVP4, 5, 6, and 8) were classified as non-allergenic peptides, while the rest were identified as probable allergenic peptides by either the AllerTOP or the AllergenFP server (highlighted in red). It should be noted that the estimated value of a peptide property does not guarantee the activity of the candidate peptide in real use. The efficacy of functional peptides can be influenced by several factors in real biological systems (e.g., ions, digestive enzymes, pH, temperature, and solvent tonicity). Therefore, laboratory experiments are still required for further validation of all selected peptide candidates to ensure their non-toxic and non-allergenic effects. Moreover, the redesign and optimization of some residues can be considered to improve the efficiency, safety, and specificity of csAVP candidates in the future.

Molecular Docking of csAVPs with SARS-CoV-2 3CL Protease
The process of bioinformatics prediction and screening for coronavirus 3CL protease (3CLpro) inhibitory peptides is quite challenging due to the limitations of the specific in silico screening tools available. Since no single docking program gives the highest accuracy and best performance in all cases, we employed the best-performing program for each of the two docking approaches. According to the comparative study of 14 docking programs on protein-peptide complexes by Weng et al. [60], GalaxyPepDock (a template-based docking approach) performs the best among template-based docking programs and significantly better than any template-free docking program. HPEPDOCK, on the other hand, performs the best and is more computationally efficient for global docking compared to other programs. To estimate and rank the binding energy and affinity of all csAVP candidates on 3CLpro, both programs were selected to perform the molecular docking in this study.
The molecular docking simulation of the ten selected csAVPs to the apo state of the coronavirus 3CLpro crystal structure suggested that csAVP1 to csAVP10 can bind near the substrate binding groove of the SARS-CoV-2 3CLpro structure (Figure 4). A comparison of the docking areas of the different AVPs on the SARS-CoV-2 3CLpro structure showed that all csAVP candidates shared common binding positions on the active site groove (Figure 4). Interactions between the side chains of the selected csAVPs and the side chains near the active site of the SARS-CoV-2 3CLpro, with hydrogen bond formation, were observed (Figure 5). All hydrogen bonds observed between the candidate peptides (csAVP1 to csAVP10) and the binding pocket of coronavirus 3CL protease in the protein-peptide docking simulation are listed in Table 5. In particular, csAVP6 and csAVP4 (the largest peptide candidates) could form ten and nine hydrogen bonds (mainly with the Arg123 and Glu134 residues) near the 3CLpro binding pocket, respectively. The docking positions of all selected csAVPs, close to the catalytic site area of the SARS-CoV-2 3CLpro structure, were very similar to the previously reported docking positions of the known 3CLpro inhibitory marine polyketides [61], antiviral drugs [62], and rice bran AVP candidates [31]. Regarding bond strength, hydrogen bonds with donor-acceptor distances of 2.2-2.5 Å are considered "strong, mostly covalent", while those of 2.5-3.2 Å and 3.2-4.0 Å are considered "moderate, mostly electrostatic" and "weak, electrostatic", respectively [63]. For interactions between an enzyme (here, 3CLpro) and its inhibitors (here, csAVPs), the optimal hydrogen bond acceptor-donor distance is usually found between 2.7 and 3.3 Å [64], and most hydrogen bonds between peptide molecules and protein backbones begin to be considered strong binding at a bonding distance of about 3 Å [65].
Based on our molecular docking simulation experiments, the distance in the hydrogen bond between selected csAVPs and SARS-CoV-2 3CLpro structures was found to range from 1.8 to 2.7 Å. In particular, the selected peptides tend to interact with Glu134, Gln177, Thr178, and Gln180 residues of SARS-CoV-2 3CLpro by hydrogen bonding. These hydrogen bonds were observed in relatively short bonding distances (1.8-2.5 Å), indicating the strong, mostly covalent interactions, and representing the high binding affinity.
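The distance bands quoted from [63] can be written as a small lookup, sketched below. The bands share their boundary values (2.5 and 3.2 Å); assigning a boundary value to the stronger class is an assumption of this sketch, as are the labels for distances outside 2.2-4.0 Å, which the quoted classification does not cover.

```python
def classify_hbond(distance_angstrom):
    """Classify a hydrogen bond by distance using the bands quoted from [63].

    Assumptions: boundary values (2.5, 3.2 Å) go to the stronger class, and
    distances outside 2.2-4.0 Å get descriptive labels that are not part of [63].
    """
    d = distance_angstrom
    if d <= 0:
        raise ValueError("distance must be positive")
    if d < 2.2:
        return "shorter than the tabulated range"
    if d <= 2.5:
        return "strong, mostly covalent"
    if d <= 3.2:
        return "moderate, mostly electrostatic"
    if d <= 4.0:
        return "weak, electrostatic"
    return "beyond the tabulated range"


# The docking results report distances spanning 1.8-2.7 Å.
for d in (1.8, 2.4, 2.7):
    print(d, classify_hbond(d))
```

Distances below 2.2 Å fall outside the donor-acceptor bands; whether the reported values are hydrogen-acceptor or donor-acceptor distances determines how they should be compared with this scale.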
Focusing on the calculated scores of binding affinity and binding energy obtained from the PRODIGY and PIMA servers (Table 6), csAVP6 showed the strongest binding to the active site area of the SARS-CoV-2 3CLpro, with a molecular docking score of −572.17 kJ/mol and a binding affinity (∆G) of −15.6 kcal/mol. These docking scores were calculated based on the template-based docking results from the GalaxyPepDock server [66]. Interestingly, the largest peptide candidate (csAVP4, ranked second in docking score at −447.27 kJ/mol) showed the highest binding affinity of −18.2 kcal/mol, with the best dissociation constant of 4.30 × 10⁻¹⁴ among all csAVPs. These docking scores and binding affinities compare favorably with the docking scores on coronavirus 3CLpro of several antiviral drugs, i.e., noscapine, chloroquine, ribavirin, and favipiravir, with scores of −292.42, −269.71, −214.17, and −153.91 kJ/mol, respectively [62]. Focusing on the docking scores obtained from another widely used blind protein-peptide docking server (HPEPDOCK, a template-free docking tool) [67], csAVP4 was ranked among the top four, while csAVP6 showed the lowest docking ability among all csAVPs. At this point, csAVP4 seemed to be the best candidate for further analysis based on the overall docking parameters and its non-toxic and non-allergenic characteristics. The 3CLpro-csAVP4 interface interactions were characterized by a high van der Waals energy of −420.90 kcal/mol (hydrophobic interactions) and a hydrogen bond energy of −41.90 kcal/mol, which are considered the most significant terms for assessing binding stability. Based on the molecular docking results of this study, the selected csAVPs (especially csAVP4, with the best overall docking results) appear to be strong potential candidates for SARS-CoV-2 3CLpro inhibitors in controlling COVID-19.
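The link between a predicted binding free energy and a dissociation constant follows the standard thermodynamic relation ∆G = RT ln(Kd). The sketch below applies it to the ∆G values reported above; the temperature of 25 °C is an assumption (the text does not state the temperature used by the server), and the function name is hypothetical.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_from_delta_g(delta_g_kcal_per_mol, temp_celsius=25.0):
    """Dissociation constant (M) from a binding free energy via dG = RT ln(Kd).

    temp_celsius defaults to 25 °C, which is an assumption of this sketch.
    """
    temp_kelvin = temp_celsius + 273.15
    return math.exp(delta_g_kcal_per_mol / (R_KCAL_PER_MOL_K * temp_kelvin))


# Reported binding affinities for csAVP4 and csAVP6.
for name, dg in [("csAVP4", -18.2), ("csAVP6", -15.6)]:
    print(f"{name}: dG = {dg} kcal/mol -> Kd ~ {kd_from_delta_g(dg):.2e} M")
```

With the rounded value of −18.2 kcal/mol, the relation gives a Kd on the order of 10⁻¹⁴ M, the same order of magnitude as the reported constant; small differences are expected because the input ∆G is rounded to one decimal place.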

Molecular Dynamics Simulation of 3CLpro-csAVP4 Complex
Since molecular docking and molecular dynamics methods can provide valuable insights into the physicochemical properties of bioactive molecules, they are commonly used as a virtual screening strategy. These simulations can provide information about the interactions and reactivity of potential drug candidates with protein targets. As discussed in the previous section, the binding affinity, docking scores, non-toxicity, and non-allergenicity were taken into consideration, and csAVP4 was selected as the most promising candidate coronavirus 3CLpro inhibitor. Thus, a molecular dynamics simulation of csAVP4 bound to the 3CLpro backbone was performed for 150 nanoseconds to examine the conformational stability and fluctuation of the complex. The stability of the csAVP4 peptide and 3CLpro complex was estimated by RMSD, Rg, and RMSF trajectory analysis.
The analysis of the root-mean-square deviation (RMSD) profile is needed to assess the compactness of the protein after the ligand-induced fit into the protein complex [68,69]. In particular, the RMSD profile is calculated from the atomic coordinates of the backbone atoms in the protein and ligand trajectories. A low fluctuation pattern in the RMSD profile represents a higher stability of the protein-peptide complex of interest [70].
The RMSD values of the csAVP4 ligand, the 3CLpro protein backbone, and their complex remained stable in the range of 3-6 Å (Figure 6). As shown in Figure 6A, the 3CLpro complex with csAVP4 was quite rigid, with less than 4 Å RMSD (blue line), and it followed a similar trend along the 150 ns simulation time as the apo form of the protein backbone (red line) and the csAVP4 ligand (black line) for the last period of the dynamics. The result indicated that the enzyme-inhibitor complex remained stable after a certain period of time. To characterize the structural behavior of the enzyme-inhibitor complex, the radius of gyration (Rg) of the involved trajectories was also calculated (Figure 6B). The Rg value fluctuated according to the folding state of the 3CLpro-csAVP4 complex. Low fluctuations were observed in the range of 25.5-26.5 Å, indicating the stability of the 3CLpro protein backbone while binding csAVP4. The RMSF profiles of the 3CLpro-csAVP4 complex were also generated to determine the conformational stability of the protein-peptide complex (Figure 6C). The low fluctuation of coordinates in the range of 5-25 Å indicates the high stability of the protein-peptide complex. Lastly, hydrogen bonding was analyzed to estimate the dynamic equilibration of the 3CLpro-csAVP4 complex. The hydrogen bonding profile, with a high number of hydrogen bonds during the simulation period, indicated stable binding of csAVP4 to the target 3CLpro enzyme (Figure 6D). In conclusion, these molecular dynamics profiles indicate prolonged and robust binding of csAVP4 to the target coronavirus 3CLpro and link the predicted binding energies to the stability of the 3CLpro-csAVP4 complex.
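For reference, the two per-frame quantities discussed above can be computed as sketched below. This is a plain-Python illustration of the standard definitions, not the CPPTRAJ implementation used in this study; it assumes each frame has already been superimposed on the reference structure, and the function names and coordinates are hypothetical.

```python
import math


def rmsd(coords, reference):
    """Root-mean-square deviation between two equally sized lists of (x, y, z) points.

    Assumes the frame is already superimposed on the reference (no fitting is done here).
    """
    if len(coords) != len(reference) or not coords:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum((a - b) ** 2
                  for p, q in zip(coords, reference)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(coords))


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration; equal masses are assumed if none are given."""
    if not coords:
        raise ValueError("coordinate set must be non-empty")
    masses = masses if masses is not None else [1.0] * len(coords)
    total = sum(masses)
    center = [sum(m * p[k] for m, p in zip(masses, coords)) / total for k in range(3)]
    squared = sum(m * sum((p[k] - center[k]) ** 2 for k in range(3))
                  for m, p in zip(masses, coords))
    return math.sqrt(squared / total)


# Toy coordinates in Å, for illustration only.
frame = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.5, 0.5, 0.0), (3.0, 0.0, 0.0)]
print(rmsd(frame, reference), radius_of_gyration(frame))
```

Evaluating these functions frame by frame along a trajectory gives the RMSD and Rg time series; a flat series corresponds to the low-fluctuation pattern interpreted as stability.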

Similarity of the csAVP4 against the Known Anti-Coronavirus Peptides Database
To ensure that the sequence of the selected bioactive peptide has not previously been reported as an anti-coronavirus peptide (ACovP), the csAVP4 sequence was used as a query peptide to search against the anti-coronavirus peptides database (ACovPepDB) [71]. Using the ACovPBLAST option, the matching results of csAVP4 to the eight ACovPs stored in ACovPepDB are listed in Table S2. Even though the %identity of the short aligned segment was high (26-100%), the percentage of matching residues (%Matching), calculated from the number of residues matched on the csAVP, was low (only 10-30%). The matching residues are shown in red, indicating the very short overlap between the query peptide and the known ACovPs. The results indicated that our selected AVP is a potential novel ACovP that has not been reported before. Notably, additional comparative structural analysis and optimization could be used to improve antiviral activity against coronaviruses, specifically via the inhibition of the 3CLpro enzyme.

Materials and Methods
Following the bioinformatic pipeline in Figure 1, a computer-aided virtual screening workflow with in silico validation by molecular docking and molecular dynamics simulations was developed. The workflow begins with the predicted digested peptidomes of the five most abundant proteins in hemp seed (Cannabis sativa): edestin 1, edestin 2, edestin 3, vicilin, and albumin. The putative trypsinized peptidome was then established and used as the input dataset of 127 peptides, from which the unique Cannabis sativa antiviral peptides (csAVPs) with the best predicted scores and without probable toxicity or allergenicity side effects to the host cells were selected.
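As an illustration of the first step, an in silico tryptic digestion can be sketched as follows. The text does not state which tool or cleavage settings produced the 127 peptides, so this sketch assumes the commonly used trypsin rule (cleavage after Lys or Arg, except when followed by Pro) with no missed cleavages; the function name, the `min_length` parameter, and the example sequence are hypothetical.

```python
def trypsin_digest(sequence, min_length=1):
    """Cleave after K or R unless the next residue is P; no missed cleavages.

    The cleavage rule and the length filter are assumptions of this sketch,
    not settings reported in the study.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if residue in "KR" and not is_last and sequence[i + 1] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])  # C-terminal fragment
    return [p for p in peptides if len(p) >= min_length]


# Hypothetical sequence, not a hemp seed protein.
print(trypsin_digest("AAKGGRPLLKMM"))
```

Raising `min_length` would discard very short fragments, which is one way a digest could be restricted to peptides long enough for antiviral prediction.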

Predictions of Allergenicity, Toxicity, and Physicochemical Properties
The theoretical allergenicity of the peptides was predicted using the AllergenFP v.1.0 (https://ddg-pharmfac.net/AllergenFP/, accessed on 23 January 2022) and AllerTOP v.2.0 (https://www.ddg-pharmfac.net/AllerTOP/, accessed on 23 January 2022) webservers to evaluate whether the selected csAVPs are probable allergens [58,59]. ToxinPred (https://webs.iiitd.edu.in/raghava/toxinpred/protein.php, accessed on 23 January 2022) was used to predict whether the peptides are cytotoxic to the host cells. All 4 SVM-based methods with a default threshold value of 0.0 were chosen for predicting the toxicity, and values greater than 0.0 were considered toxic. All calculated physicochemical property scores of the selected csAVP candidates were also obtained from the ToxinPred (https://webs.iiitd.edu.in/raghava/toxinpred/protein.php, accessed on 23 January 2022) webserver using the batch submission option, with all possible physicochemical characteristics selected.
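The decision rule described here can be summarized in code. In this minimal sketch, a peptide is flagged as probably toxic if any SVM score exceeds the 0.0 threshold and as a probable allergen if either allergenicity server flags it; combining the models with "any" is an interpretation of the text, and the function names and example scores are hypothetical.

```python
def is_probably_toxic(svm_scores, threshold=0.0):
    """Flag a peptide if any ToxinPred SVM score is above the threshold (default 0.0)."""
    return any(score > threshold for score in svm_scores)


def is_probable_allergen(allertop_flag, allergenfp_flag):
    """Flag a peptide if either AllerTOP or AllergenFP classifies it as an allergen."""
    return allertop_flag or allergenfp_flag


def passes_safety_screen(svm_scores, allertop_flag, allergenfp_flag):
    """A candidate passes only if it is neither probably toxic nor a probable allergen."""
    return (not is_probably_toxic(svm_scores)
            and not is_probable_allergen(allertop_flag, allergenfp_flag))


# Hypothetical scores for two peptides (not values from Table 4).
print(passes_safety_screen([-0.8, -1.1, -0.5, -0.9], False, False))
print(passes_safety_screen([0.2, 0.1, -0.4, -0.6], False, True))
```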

IC50 Prediction
The half-maximal inhibitory concentration (IC50) of the peptide's antiviral activity was predicted using the AVP-IC50Pred server (http://crdd.osdd.net/servers/ic50avp/, accessed on 10 February 2022) [36] by selecting RSV/INFV/HSV for the virus-specific prediction mode. The AVP-IC50Pred server's Hybrid Models I and II were also used as prediction models for antiviral activity determination with all default parameter settings.
The amino acid sequences of all selected csAVPs (csAVP1 to csAVP10) were docked to the coronavirus 3CLpro enzyme using both template-based and template-free docking webservers. The template-based molecular docking simulation was performed by GalaxyPepDock (http://galaxy.seoklab.org/pepdock, accessed on 25 January 2022) [75], while HPEPDOCK (http://huanglab.phys.hust.edu.cn/hpepdock/, accessed on 26 January 2022) was used to estimate the template-free molecular docking scores. The docking results of the best model and the hydrogen bonds found were visualized with the UCSF Chimera 1.16 program [76]. After that, the PIMA webserver [77], available at http://caps.ncbs.res.in/pima (accessed on 1 February 2022), and the PRODIGY server [78] (https://wenmr.science.uu.nl/prodigy, accessed on 1 February 2022) were employed to investigate the protein-peptide interface interactions involved in the affinity and binding energy.

Molecular Dynamics Simulations Study
The Amber ff14SB force field and the Amber16 software package [79] were used to perform MD simulations of the 3CLpro-csAVP4 complex. The TIP3P water model was used to solvate the system at a distance of 10 Å from the protein, and sodium ions were added to neutralize the simulated systems. The initial conformations were heated to 300 K in the canonical ensemble (NVT) for 100 ps before being equilibrated for another 1200 ps. Then, until 300 ns of the production run, all-atom MD simulations were performed under the isothermal-isobaric ensemble (NPT) at 1 atm and 300 K with a simulation time step of 2 fs. The Berendsen barostat [80] with a pressure-relaxation time of 1 ps and the Langevin thermostat [81] with a collision frequency of 2 ps⁻¹ were used to maintain pressure and temperature during the MD simulation, respectively. The SHAKE algorithm [80] was used to constrain all chemical bonds involving hydrogen atoms, while the particle mesh Ewald (PME) summation method [82] was used to treat the long-range electrostatic interactions. The cut-off for non-bonded interactions was set at 10 Å. The CPPTRAJ module of AMBER16 was used for structural analyses in terms of root-mean-square deviation (RMSD), root-mean-square fluctuation (RMSF), radius of gyration (Rg), and the hydrogen bond profile over the 150 ns trajectory.

Similarity Searching of the csAVP4 against the Anti-Coronavirus Peptides Database
The ACovPBlast tool of the anti-coronavirus peptides database, ACovPepDB (http://i.uestc.edu.cn/ACovPepDB/index.html, accessed on 30 November 2022), was employed to search for similar peptides stored in the database. The amino acid sequence of csAVP4 was used as the query peptide. The default parameters (an expect value of 10 and a maximum of 300 results) and the optimized parameters for short peptides (<15 residues) were selected for the BLAST search.

Conclusions
In this study, the csAVPs selected by our proposed computer-aided screening workflow varied considerably in length (6-39 amino acid residues) and physicochemical properties. These amphipathic csAVP candidates contain hydrophilic amino acid residues together with cationic residues. The molecular docking results suggested that all csAVP candidates form strong hydrogen bonds with the coronavirus 3CLpro binding groove at the Glu134, Gln177, Thr178, and Gln180 residues in domain 2, with strong bonding energy. Based on its high efficacy (very low predicted IC50), high affinity and binding energy, and low possible side effects to the host cell, csAVP4 was selected as the best candidate for coronavirus 3CLpro inhibition among all selected csAVPs. Even though the conformational stability of the 3CLpro-csAVP4 complex was supported by its molecular dynamics profile, further in vitro and in vivo studies are required to validate the anti-COVID-19 potential of these csAVP candidates.

Conflicts of Interest:
The authors declare no conflict of interest.