Traditional and Computational Screening of Non-Toxic Peptides and Approaches to Improving Selectivity

Peptides have positively impacted the pharmaceutical industry as drugs, biomarkers, or diagnostic tools of high therapeutic value. However, only a handful have progressed to the market. Toxicity is one of the main obstacles to translating peptides into the clinic. Hemolysis, or hemotoxicity, the principal source of toxicity, is a natural or disease-induced event leading to the death of vital red blood cells. Initial toxicity screenings have widely relied on erythrocytes as the gold standard. More recently, many online databases filled with peptide sequences and their biological metadata have paved the way toward hemolysis prediction using user-friendly, fast-access machine learning-driven programs. This review details the growing contributions of in silico approaches developed in the last decade for the large-scale prediction of erythrocyte lysis induced by peptides. After an overview of the pharmaceutical landscape of peptide therapeutics, we highlight the relevance of early hemolysis studies in drug development. We emphasize the computational models and algorithms used to this end in light of historical and recent findings in this promising field. We benchmarked seven predictors using peptides from different data sets, 7–35 amino acids in length. In our predictions, the models scored accuracies above 50.42% and Matthews correlation coefficients (MCC) above 0.11, with maximum values of 100.0% and 1.00, respectively. Finally, strategies for optimizing peptide selectivity are described, along with prospects for future investigations. The development of in silico predictive approaches to peptide toxicity has only just started, but their important contributions clearly demonstrate their potential for peptide science and computer-aided drug design. Methodology refinement and increasing use will motivate the timely and accurate in silico identification of selective, non-toxic peptide therapeutics.


Peptide Drugs Market and Discovery: A Bird's Eye View
Peptides are gaining traction on the new drug development agenda, and their number in the clinic grows annually [1]. The hypoglycemic hormone insulin stands out as a pioneering peptide in the medical industry that opened space for the search for and applications of these small molecules in the pharmaceutical, diagnostic, cosmeceutical, clinical, and […]
Main approaches employed to evaluate the toxicity of peptides. Traditional screening of non-toxic peptides is performed using different in vitro techniques, including MTT, LDH, erythrocyte lysis, and ATP-based assays. These methods are based either on the measurement of intracellular markers released during cell death or lysis, such as hemoglobin (hemolysis assay) or enzymes (LDH assay), or on the analysis of cell viability, determined by enzymatic activity (e.g., the MTT assay) or by cellular ATP content (ATP-based assay). Recently, computational models have been reported to assist in peptide toxicity prediction.
Computational tools have rapidly revolutionized the chemical, biological, and pharmaceutical fields, including peptide science [30,31]. In the last decade, such predictive tools have enabled the discovery of novel toxic and non-toxic peptides as well as the design of analogs with reduced toxicity [32]. This review describes recent machine learning (ML)-driven methods [33][34][35][36][37][38][39] that predict the hemolytic action of peptides. Such methods are considered cost-effective and time-saving strategies to support the development of peptide-based drugs. Current predictors are limited, and possibly biased, by the number and diversity of available peptide sequences and by their associated biological data [40]. In silico technologies for predicting the cytotoxic action of peptides are still in their infancy but already offer ample opportunities to reduce the number of expensive failures. This work first discusses the primary cell model used to study the toxicity of peptide candidates. It then examines how new data-driven computational methods have been crucial to understanding structure-activity relationships (SAR) and to selecting potentially safe peptide templates for synthesis and evaluation.
Many reviews have covered the therapeutic effects of peptides in the most diverse areas [41][42][43][44], including market trends, current challenges, and prospects [1,5,45]. In addition, several recent reviews have highlighted the development of in silico methods to support the discovery of antimicrobial peptides [46][47][48]. However, to our knowledge, computational advances in the field of peptide toxicity, although extremely important, have not yet been thoroughly addressed. Here, we therefore focus on the in silico frameworks that speed up the discovery and design of non-toxic peptides. We start by documenting the applicability of the standard hemolysis assay in initial hit screening. We then describe reported predictive models for hemolytic activity. This review integrates the recent computational advances that support the identification, design, and synthesis of non-toxic peptides with greater probabilities of clinical translation. Lastly, we discuss possible directions and perspectives on how computational advances should shape the future of peptide-based drugs and the multidisciplinary drug development process. The most frequent approaches to increasing the selectivity and safety of therapeutic peptides are also reviewed.

Red Blood Cells as a Standard Model for Toxicity Assessment of Peptides
For a new peptide to be considered a promising therapeutic candidate, it must have minimal cytotoxic effects on healthy host cells [49]. In vitro studies are highly valuable for toxicity prediction because they provide baseline information on harmful effects, allowing resources to be better managed and directed to other study stages [50,51]. In other words, the in vitro toxicity assessment of a natural [52] or synthetic compound [53] is the first step to be carried out before new pharmaceutical formulations are considered for in vivo studies and clinical trials [54].
As previously mentioned, there are several methodologies for determining the in vitro toxicity of a compound. Although the activities or characteristics measured vary between methods, they are essentially indicators of membrane rupture and cell death [45,55–57]. Temporal analysis, data interpretation, and sensitivity are other important parameters that vary between techniques, including in studies of bioactive peptides [55]. Given this heterogeneity, there is no universal answer as to which methodology is the most effective. Some parameters are more advantageous than others, and the choice depends on the available resources, time, and required reproducibility. The aforementioned in vitro assays also have shortcomings that must be considered before selecting the most suitable one(s) for the initial objectives. Ideally, a technique offers high sensitivity, simple reproducibility, and rapid results at low cost [58]. For this reason, the hemolysis assay has become the reference protocol for the early toxicity screening phase [59].
Erythrocytes are the most abundant and crucial cells in the circulatory system, given their vital oxygen-carrying function. They are very particular cells, as they lack a nucleus and membrane-bound organelles [60]. Their validity as a standard in vitro model for cytotoxicity assessment could be questioned [61]. However, their use is justified by their abundance in organisms, ease of collection, and availability in significant amounts [54]. The isolation and cultivation of other cell types are often more complex, e.g., rat intestinal epithelial cells and human umbilical vein endothelial cells [62] or mouse-derived macrophages [9]. In addition, the lack of internal membrane structures in red blood cells (RBCs) can facilitate the standardization and interpretation of results. In fact, although RBCs lack the structural complexity of normal nucleated cells, they can undergo morphological and quantitative alterations that reflect significant damage and act as predictive markers of the toxic impact of test compounds [59,63,64]. These features make RBCs a cheap, fast, and effective template for evaluating toxicity.
Hemolysis is a natural or disease-induced event that can also be caused by novel small-molecule drug candidates or cationic peptides [65][66][67]. Briefly, hemolytic assays measure the disruption and destruction of RBCs. They have been promoted not only for their simplicity but also because of the structural and biochemical similarities between RBC membranes and those of other human cells [68]. In percentage terms, the erythrocyte membrane is made up mostly of proteins (39.5%), followed by lipids (35.1%), water (19.5%), and carbohydrates (5.8%) [59]. The interplay between their components, lipid composition, and high oxygen tension makes RBC membranes ideal models for studying disturbances caused by oxidative stress or induced by external molecules [63].
Human-derived RBCs are the first choice; however, some studies have used erythrocytes from other animals, such as cows [55], sheep [69], rats [70], pigs [71], dogs [29], and rabbits [72]. Dennison and Phoenix [73] demonstrated that hemolysis induced by the Modelin-5-CONH2 peptide (300 µM) was 12% for sheep RBCs but only 2% for human and porcine RBCs. This observation was attributed to differences in the phosphatidylcholine and sphingomyelin contents of the erythrocyte membranes. In general, sheep RBCs have a lower percentage of phosphatidylcholine and a higher percentage of sphingomyelin than human and porcine RBCs, indicating that these components can be key mediators of the peptide's hemolytic action. In agreement with this investigation, Greco et al. [29] measured the activity of 24 synthetic peptides on dog, human, rat, and cow RBCs. They noted heterogeneous responses among these species' cells, with dog RBCs being the most susceptible. In addition to interspecies variations in erythrocyte membrane components and their organization, the abundance of ion channels and aquaporins also differs, resulting in different tolerances to sudden changes in permeability and osmotic balance [74,75].
Pharmaceuticals 2022, 15, 323 5 of 27
The general hemolytic assay procedure consists of exposing erythrocytes to a specific agent (in this case, peptides) at a selected range of concentrations and then quantifying the released hemoglobin spectrophotometrically at a given wavelength [76]. Measurements are usually performed at 405 [29], 414 [28], 450 [77], 540 [54], or 576 nm [73], among other wavelengths, because the amount of released hemoglobin is directly proportional to the number of lysed RBCs. The absorbance values obtained for the negative (0% hemolysis) and positive (100% hemolysis) controls serve as references to calculate the percentages of hemolysis. These percentages are then analyzed and compared using standard statistical tests or curve-fitting models, e.g., one-way ANOVA and Tukey's test [9], the t-test [78], the Boltzmann sigmoidal equation [79], the Mann-Whitney U-test [80], and others. Generally, the percentage of hemolysis at a given concentration is obtained by the following equation [28,77]:
$$\text{Hemolysis}\ (\%) = \frac{Abs_{\text{sample}} - Abs_{\text{negative control}}}{Abs_{\text{positive control}} - Abs_{\text{negative control}}} \times 100$$
where Abs represents absorbance. Some studies use minor variations of this formula [29,53,81,82]. Triton X-100 and melittin are the most recognized positive controls [28,53,83]. Melittin is a cationic, 26-residue, membrane-binding peptide that is highly abundant in the venom of Apis mellifera. This amphipathic molecule, which has a basic C-terminal region, exhibits high lytic activity against a large number of cell types at very low concentrations [84]. However, its chemical synthesis and purification may add cost or time if the laboratory does not have the required equipment. Triton X-100 is often used as an alternative positive control for hemolysis. This nonionic, membrane-damaging detergent is cheap, commercially available in high purity, and does not interfere with the spectrophotometric measurements [28].
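To make the normalization concrete, the short Python sketch below applies the percent-hemolysis equation to averaged replicate readings. The absorbance values are invented for illustration. The function name `percent_hemolysis` is a hypothetical helper and is not part of any cited protocol.

```python
from statistics import mean


def percent_hemolysis(abs_sample: float, abs_negative: float, abs_positive: float) -> float:
    """Percent hemolysis relative to the negative (0%) and positive (100%) controls."""
    span = abs_positive - abs_negative
    if span <= 0:
        raise ValueError("positive control must absorb more than the negative control")
    return (abs_sample - abs_negative) / span * 100.0


# Hypothetical triplicate absorbance readings at one wavelength (illustrative values only)
negative = mean([0.052, 0.048, 0.050])  # e.g., RBCs in buffer alone
positive = mean([1.210, 1.190, 1.200])  # e.g., RBCs with Triton X-100
sample = mean([0.340, 0.330, 0.335])    # RBCs with the peptide at one concentration

print(f"{percent_hemolysis(sample, negative, positive):.1f}% hemolysis")
```

With these made-up readings, the sketch reports about 24.8% hemolysis. Laboratories differ on whether slightly negative values (samples absorbing less than the buffer control) are clipped to 0%, so the sketch leaves them unhandled.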
Some studies consider the selectivity index (SI) the most suitable indicator of drug safety because it broadens the perspective by integrating both therapeutic and toxic components. The SI is the ratio between two concentrations. The first is the concentration that is toxic to 50% of the reference healthy host cells in a cytotoxicity assay (HC50 in the case of hemolysis assays). The second is the concentration that produces the desired therapeutic effect on the target cells (e.g., the concentration causing 50% growth inhibition of pathogen cells, or IC50). Hence, the SI reflects the therapeutic window between toxicity and biological effect [93]. High SI values justify advancing the test drug, e.g., a bioactive peptide, to the next evaluation steps. Some examples of antimicrobial peptides with high SI are presented in Table 1.
The hemolysis assay is a standard technique widely used in the toxicity screening of drug candidates, especially peptides. The abundance and ready availability of RBCs, together with the simplicity of the experiment, contribute to its prioritization in toxicity studies. The RBC lysis protocol involves a colorimetric assay, which determines the amount of hemoglobin released after peptide-induced cell damage. Serial dilutions of the peptides are first prepared in parallel with the RBC suspension, which is obtained by centrifugation and dilution. Then, the peptides and the positive, negative, and, eventually, other controls are incubated with the RBC suspension to generate the raw data, which are then analyzed and converted into an HC50 value.
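As a hedged illustration of how raw dose-response data can be turned into HC50 and SI values, the sketch below interpolates linearly on a log10 concentration axis between the two tested concentrations that bracket 50% hemolysis. This is a simplifying assumption; many studies instead fit a sigmoidal model, such as the Boltzmann equation. The dilution series, hemolysis values, and IC50 are hypothetical, as are the helper names.

```python
import math


def hc50_log_interpolation(concentrations, hemolysis_pct):
    """Estimate HC50 by linear interpolation of % hemolysis against log10(concentration).

    Uses the first pair of adjacent concentrations whose hemolysis brackets 50%.
    Returns None when 50% hemolysis is not reached within the tested range.
    """
    points = sorted(zip(concentrations, hemolysis_pct))
    for (c_low, h_low), (c_high, h_high) in zip(points, points[1:]):
        if h_low <= 50.0 <= h_high and h_high > h_low:
            fraction = (50.0 - h_low) / (h_high - h_low)
            log_c = math.log10(c_low) + fraction * (math.log10(c_high) - math.log10(c_low))
            return 10 ** log_c
    return None


def selectivity_index(hc50, ic50):
    """SI = HC50 / IC50 (same units); a higher SI means a wider therapeutic window."""
    return hc50 / ic50


# Hypothetical two-fold dilution series (micromolar) and % hemolysis, for illustration only
conc = [1, 2, 4, 8, 16, 32, 64]
hemo = [0.5, 1.0, 3.0, 10.0, 30.0, 70.0, 95.0]
hc50 = hc50_log_interpolation(conc, hemo)
print(f"HC50 ~ {hc50:.1f} uM, SI ~ {selectivity_index(hc50, ic50=4.0):.1f}")
```

If 50% lysis is never reached, the function returns None. Such peptides are typically reported as having an HC50 above the highest tested concentration.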
No general rule or consensus has yet been established regarding the structural determinants of peptides' hemolytic activity [28]. This understanding is complex and requires multidisciplinary efforts. The evidence collected to date points to a preponderant role of peptide charge, amphipathicity, and hydrophobicity [91,101], which contribute to the stabilization of an amphipathic secondary structure [102].
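Hydrophobicity and amphipathicity are often quantified with a hydrophobicity scale and with the Eisenberg hydrophobic moment. The moment treats each residue's hydrophobicity as a vector oriented at 100° per residue for an ideal α-helix: μH = (1/N)·sqrt((Σ h_i sin(iδ))² + (Σ h_i cos(iδ))²). Below is a minimal Python sketch under that ideal-helix assumption. The scale values are the Eisenberg normalized consensus values as commonly tabulated and should be checked against the original source before use.

```python
import math

# Eisenberg normalized consensus hydrophobicity scale, as commonly tabulated
# (assumption: verify values against the original publication before relying on them)
EISENBERG = {
    "A": 0.62, "R": -2.53, "N": -0.78, "D": -0.90, "C": 0.29,
    "Q": -0.85, "E": -0.74, "G": 0.48, "H": -0.40, "I": 1.38,
    "L": 1.06, "K": -1.50, "M": 0.64, "F": 1.19, "P": 0.12,
    "S": -0.18, "T": -0.05, "W": 0.81, "Y": 0.26, "V": 1.08,
}


def mean_hydrophobicity(seq, scale=EISENBERG):
    """Average hydrophobicity per residue."""
    return sum(scale[aa] for aa in seq) / len(seq)


def hydrophobic_moment(seq, angle_deg=100.0, scale=EISENBERG):
    """Mean hydrophobic moment, assuming an ideal helix with angle_deg per residue."""
    delta = math.radians(angle_deg)
    sin_sum = sum(scale[aa] * math.sin(i * delta) for i, aa in enumerate(seq))
    cos_sum = sum(scale[aa] * math.cos(i * delta) for i, aa in enumerate(seq))
    return math.hypot(sin_sum, cos_sum) / len(seq)


peptide = "GIGKFLHSAKKFGKAFVGEIMNS"  # magainin 2, used only as an example input
print(f"<H> = {mean_hydrophobicity(peptide):.2f}, muH = {hydrophobic_moment(peptide):.2f}")
```

A uniform sequence such as 18 leucines has a moment of essentially zero, because its residue vectors cancel around the helix. A high moment means that hydrophobic and polar residues segregate onto opposite helix faces.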
Of the 20 natural amino acids, tryptophan has been touted as a critical residue because its presence in a peptide mediates molecular interactions with the cholesterol of mammalian membranes, leading to their disturbance or even disruption [103][104][105]. Positively charged residues such as arginine and lysine have also been shown to influence RBC membrane damage. Dathe et al. [106] evaluated the activity of magainin-II analogs with variable charge, demonstrating that increasing this parameter to +5 enhances antimicrobial activity. However, such changes must be carefully assessed when designing peptide analogs, so that modifications that strengthen interactions with pathogenic targets do not also increase lytic effects on host cells.
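A crude way to track how substitutions change net charge is to count ionizable side chains and termini. The sketch below assumes full ionization of Lys/Arg (+1) and Asp/Glu (−1) near pH 7 and treats His as neutral. This is a simplification; pKa-based (Henderson-Hasselbalch) calculations give fractional values. The E19K variant is a hypothetical example and not necessarily one of the analogs studied by Dathe et al.

```python
def approximate_net_charge(seq: str, free_n_terminus: bool = True,
                           amidated_c_terminus: bool = False) -> int:
    """Crude net charge near pH 7: +1 per Lys/Arg, -1 per Asp/Glu (His treated as neutral),
    +1 for a free N-terminal amine, -1 for a free C-terminal carboxylate."""
    charge = sum(seq.count(aa) for aa in "KR") - sum(seq.count(aa) for aa in "DE")
    if free_n_terminus:
        charge += 1
    if not amidated_c_terminus:
        charge -= 1
    return charge


magainin_2 = "GIGKFLHSAKKFGKAFVGEIMNS"
hypothetical_e19k = "GIGKFLHSAKKFGKAFVGKIMNS"  # Glu19 -> Lys, for illustration only

print(approximate_net_charge(magainin_2))                            # free C-terminal acid
print(approximate_net_charge(magainin_2, amidated_c_terminus=True))  # C-terminal amide
print(approximate_net_charge(hypothetical_e19k))
```

Each Glu-to-Lys swap raises the count by 2, and C-terminal amidation adds 1. These are two common levers for tuning cationicity in analog design.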
The design of the hemolytic assay allows several peptide concentrations to be evaluated simultaneously, which reduces time and costs. It also enables easy reproducibility and a significant reduction in the use of in vivo models, in line with bioethical concerns about animal experimentation [58]. Notably, human RBCs are a valuable and efficient cellular model for obtaining a rapid in vitro approximation of a peptide's toxic effects in the body. They also help uncover patterns and (cellular/molecular) mechanisms that can directly influence peptide action [107,108]. An example is the study by Ahmad et al. [109], who analyzed the antibacterial and hemolytic activities of LZP, a peptide derived from the leucine zipper structural motif, and six of its analogs. The native LZP peptide induced the highest percentage of hemolysis in the 0–30 µM range. The analogs, in which leucine residues at specific positions were replaced by alanine, had significantly reduced hemolytic activity. Namely, the LZP (L8A/L11A), LZP (L4A/L8A), and LZP (L4A/L11A) analogs induced close to 0% hemolysis in the same concentration range. The antibacterial activity of the native peptide and all its analogs did not change significantly, remaining in the 5.6–7.8 µM range against different bacteria.
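Substitution notation such as L8A/L11A means that the leucines at positions 8 and 11 are replaced by alanine. The hypothetical helper below applies such notation to a sequence and checks that the stated wild-type residues match. The LZP sequence is not given here, so the example uses a made-up template that simply has leucines at positions 4, 8, and 11.

```python
import re


def apply_substitutions(seq: str, mutations: str) -> str:
    """Apply substitutions written as e.g. 'L8A/L11A' (1-based positions),
    verifying that each wild-type residue matches the sequence."""
    residues = list(seq)
    for token in mutations.split("/"):
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", token.strip())
        if match is None:
            raise ValueError(f"cannot parse substitution {token!r}")
        wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues) or residues[position - 1] != wild_type:
            raise ValueError(f"{token}: wild-type residue does not match the sequence")
        residues[position - 1] = new
    return "".join(residues)


template = "GKKLKKALKKLKK"  # hypothetical template with Leu at positions 4, 8, and 11
for analog in ["L8A/L11A", "L4A/L8A", "L4A/L11A"]:
    print(analog, apply_substitutions(template, analog))
```

Checking the wild-type residue guards against off-by-one errors in position numbering, a common source of mistakes when analog series are generated by hand.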

Computational Tools and Databases for Hemolytic Activity Prediction
The growing amount of available information on different peptide structures and their effects has made it possible to develop in silico models that predict the hemolytic activity of a peptide and highlight the most influential amino acids. For instance, Langham et al. [110] investigated the quantitative structure-activity relationships (QSAR) underlying the selectivity of five protegrin-like antimicrobial peptides (AMPs) based on their main physicochemical properties. In brief, the authors demonstrated a strong correlation between the length, mean number of acceptors, and energy term of these β-hairpin peptides and their toxicities. Experimental toxicity analysis revealed that the model accurately identified the most and least toxic peptides. The advent of these modern computational techniques adds a further dimension and alternative to toxicity testing panels. These emerging tools can effectively analyze multidimensional data, recognize patterns, and formulate reasonable hypotheses [111], generating a new paradigm for the early stages of peptide drug development.
Two seminal approaches preceded the current computational tools for unraveling the hemolytic activity of peptides. The first came in 2009, when Naamati et al. developed the first classification model to determine whether animal proteins could be toxic [112]. The second milestone came in 2013, when Gupta et al. created ToxinPred, the first web server to estimate the toxicity of peptides [40]. Thereafter, the development of toxicity prediction models focused on the hemolytic capacity of peptides. In 2016, Chaudhary et al. developed the first hemolytic peptide classifier, HemoPI (https://webs.iiitd.edu.in/raghava/hemopi/ (accessed on 25 January 2022)) [26]. The following year, Win et al. developed HemoPred (http://codes.bio/hemopred/ (accessed on 25 January 2022)) [33].
However, machine learning-guided prediction of peptide toxicity to RBCs only took off in 2020, when three novel methods became available on the web for this purpose: HemoPImod (https://webs.iiitd.edu.in/raghava/hemopimod/ (accessed on 25 January 2022)) [34], HLPpred-Fuse (http://thegleelab.org/HLPpred-Fuse/ (accessed on 25 January 2022)) [35], and HAPPENN (https://research.timmons.eu/happenn (accessed on 25 January 2022)) [36]. Also in 2020, Plisson and co-workers developed an open-access program in Python and compared different machine learning algorithms, of which three models stood out as optimal for predicting hemolytic activity [37]. The novelty of this study lies in establishing the applicability domain of hemolytic models using multivariate outlier detectors. In 2021, Capecchi et al. created another in silico model to elucidate the lytic capacity of peptides on blood cells [38]. In the same year, Yaseen et al. developed the most recent hemolysis model, HemoNet, which considers N/C-terminal modifications and L- or D-amino acids in the primary sequence [113]. Also in 2021, a web server was established based on a new peptide toxicity predictor named ATSE. Figure 3 summarizes this chronology, comprising 10 different algorithms to predict peptides' hemolytic activity.
Figure 3. Historical overview of the development of freely available tools and models for the prediction of peptide toxicity. Big biomedical peptide data have been explored to design new predictive methods that help unlock the full potential of peptides. Despite the many years of peptide science, our literature review demonstrates that these in silico approaches are relatively new. Since the pioneering ClanTox [112] and ToxinPred [40], launched in 2009 and 2013, respectively, ten high-throughput computational toxicity prediction tools have been developed, mainly to predict peptides' hemolytic effects. Most of them were released in the last 5 years. Capecchi et al. [38] and HemoNet [113] are the latest hemolytic classifiers. Some predictors, such as HAPPENN [36] and HemoPI [26], have more than one version: HemoPI has 5 SVM methods, while HAPPENN comprises 3 methods. Because of the differences in performance reported by the authors, only the best-performing methods were considered for this chronology. All three Plisson models [37] were included because of their highly similar performance metrics. Peptide toxicity predictors are highlighted in blue, and classifiers for predicting peptides' hemolytic activity are colored in red.
The emergence of peptide toxicity prediction programs is probably due to: (i) easy access to high-performance computers to process biological information; (ii) a broader scope and better understanding of peptide structural and functional characteristics; and (iii) the growing number of databases of both AMPs and hemolytic peptides.
Concerning the second point (ii), the number of parameters used in predictive algorithms increased dramatically from 2013 to 2020. For example, ToxinPred only uses the composition of amino acids, dipeptides, and structural motifs as […]
As for the last point (iii), over 10 databases (DBs) of AMPs have been made publicly available on the web. These DBs are compiled in Table 2, along with four examples of DBs of hemolytic/toxic peptides (HLPs) developed to compile toxicity data. The first of these four DBs (https://webs.iiitd.edu.in/raghava/toxinpred/dataset.php (accessed on 25 January 2022)) was created during the development of the ToxinPred server and describes toxicity in general [40]. It was built from toxic proteins/peptides obtained from different databases. The second DB, Hemolytik (http://crdd.osdd.net/raghava/hemolytik/ (accessed on 25 January 2022)), focuses on peptides' hemolytic activity [138]. The HemoPiMOD DB (https://webs.iiitd.edu.in/raghava/hemopimod/download.php (accessed on 25 January 2022)) also details hemolytic activity but focuses on modified peptides [34]. The DBAASP-Hemo DB is a subset of peptides from the Database of Antimicrobial Activity and Structure of Peptides (DBAASP; https://dbaasp.org/ (accessed on 25 January 2022)) for which hemolytic activity is also reported [38]. Five of the eight algorithms developed so far to predict peptides' hemolytic action (HemoPI, HemoPred, HLPpred-Fuse, HAPPENN, and the Plisson models) took their data from the Hemolytik DB and applied the criteria set by Chaudhary et al. [26].

Scheme and Scope of Hemolytic Classifiers
To generate an in silico model of the hemolytic activity of peptides, researchers generally proceed through the following stages: (1) pre-processing of the data sets chosen for the classification models; (2) sampling and preparation of training and test data sets; (3) development, selection, and validation of predictive models; and (4) application of the predictive models. Although general, these stages are shared, with some modifications, by most modern computational tools.
In the first stage, developers select from online DBs the peptide sequences they will use to build their supervised models. To date, the eight aforementioned models have used one of the following peptide libraries: Hemolytik [26,33,35,37,113], HemoPiMOD [34], or DBAASP-Hemo [36,38,113]. Each sequence is then labeled with the property or activity of interest (e.g., hemolytic or non-hemolytic). For example, a binary classifier has two classes, where a value of 1 indicates hemolytic peptides and a value of 0 their non-hemolytic counterparts, or vice versa. Of note, most hemolytic predictors were developed using (binary) classification algorithms rather than regression. This choice reflects discrepancies in the biological data (e.g., HC50) reported by different research laboratories. In general, QSAR models such as these hemolytic predictors rest on the hypothesis that there is a mathematical relationship between a biological activity or property (e.g., hemolytic activity) and the diversity of bioactive peptide sequences. Developers then identify and compute a series of features (variables) that capture these differences. Examples include local or global physicochemical descriptors, amino acid compositions (i.e., single residues, k-mers, etc.), atomic composition, or other structural motifs [26,33–38,113]. These features can be easily obtained once the peptide sequences are encoded, using different Python programs [34–38,113], Motif-EmeRging and with Classes-Identification (MERCI) [26], and, indirectly, the Simplified Molecular-Input Line-Entry System (SMILES) with the RDKit or modlAMP packages [34]. The last step of data pre-processing involves removing duplicates and missing values and normalizing the data. For example, in HAPPENN, redundant sequences sharing more than 90% similarity were removed with the CD-HIT software [36].
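As a minimal sketch of the encoding step, the Python code below computes two of the simplest feature families mentioned above: amino acid composition (20 features) and overlapping dipeptide composition (400 features). It also drops exact duplicate sequences before encoding. The helper names and toy sequences and labels are hypothetical. Published tools use their own, often much richer, descriptor sets.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Fraction of each of the 20 standard residues (20 features)."""
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each of the 400 possible overlapping residue pairs (400 features)."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    for pair in pairs:
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
    total = len(pairs) or 1  # avoid division by zero for single-residue input
    return [counts[d] / total for d in DIPEPTIDES]


def build_dataset(labeled_sequences):
    """Drop exact duplicate sequences (keeping the first label seen) and encode the rest."""
    unique = {}
    for seq, label in labeled_sequences:
        unique.setdefault(seq, label)
    X = [amino_acid_composition(s) + dipeptide_composition(s) for s in unique]
    y = list(unique.values())
    return X, y


# Toy data: labels are arbitrary (1 = hemolytic, 0 = non-hemolytic)
X, y = build_dataset([("KKLLKKLLKK", 1), ("GSGSGSGSGS", 0), ("KKLLKKLLKK", 1)])
print(len(X), len(X[0]), y)
```

Composition features are already fractions in [0, 1]. Descriptors on other scales (e.g., charge or molecular weight) would usually be standardized before model training.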
In the development of the Plisson models, the data sets associated with HemoPI-1, HemoPI-2, and HemoPI-3 were cleaned of missing data and duplicates and then normalized [37]. Finally, many classification models must balance the distribution of their classes, using sampling methods to avoid sampling bias. For instance, the Capecchi model used 2907 inactive peptide sequences to balance the classes: 1453 were designed following the length distribution of the same data subset, and 1454 were randomly generated sequences [38].
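The cleaning and balancing steps can be sketched as follows. This is a minimal illustration, not the procedure of Plisson et al. or Capecchi et al.: it tops up an under-represented negative class with random sequences whose lengths are drawn from the positive set, and it min-max scales feature columns. The function names are hypothetical, and treating random sequences as inactive is an assumption, not an established fact.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_decoys(reference_seqs, n, seed=0):
    """Generate n random sequences whose lengths are drawn from `reference_seqs`.
    Assumption: random sequences are treated as inactive (negative) examples."""
    rng = random.Random(seed)
    lengths = [len(s) for s in reference_seqs]
    return [
        "".join(rng.choice(AMINO_ACIDS) for _ in range(rng.choice(lengths)))
        for _ in range(n)
    ]


def balance(positives, negatives, seed=0):
    """Top up the negative class with random decoys until both classes are equal.
    If negatives already match or outnumber positives, inputs are returned unchanged."""
    deficit = len(positives) - len(negatives)
    if deficit <= 0:
        return positives, negatives
    return positives, negatives + random_decoys(positives, deficit, seed)


def min_max_normalize(rows):
    """Scale every feature column of a row-major matrix to [0, 1].
    Constant columns are mapped to 0.0 to avoid division by zero."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]
```

One design caveat: to avoid information leaking from the validation data, the minimum and maximum of each column would normally be taken from the training set only and then reused for the other sets.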
During the second stage (data preparation), the developers divide the labeled peptide data set into two subsets: the main model/training set (75-90% of the whole data set), used for model building, and a smaller set (10-25%), used for external validation [152]. To minimize the risk of overfitting, the model set is often subjected to cross-validation. For instance, in ten-fold cross-validation, sequences are randomly divided into 10 subsets (folds): 9 folds train the model and the remaining fold serves as the internal test set, and the procedure is repeated so that each fold is used once for testing.
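The split and ten-fold cross-validation described above can be sketched in plain Python. This is an illustrative version with hypothetical function names, not any developer's pipeline; the 20% hold-out fraction is simply one value within the 10-25% range mentioned in the text.

```python
import random


def holdout_split(items, test_fraction=0.2, seed=0):
    """Shuffle, then split into a model (training) set and an external validation set."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def k_fold(items, k=10, seed=0):
    """Yield (train, test) pairs for k-fold cross-validation.
    Every item appears in exactly one test fold; fold sizes differ by at most one."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]
```

This version does not preserve class proportions within each fold; when stratification is wanted, scikit-learn's `StratifiedKFold` is a standard alternative.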
In the last stage, the best models are applied to an unlabeled external library of natural peptides or randomly generated sequences (testing set) to predict their hemolytic activity. Peptides are usually assigned to one of two classes (0: non-hemolytic; 1: hemolytic). Additionally, a class probability p (ranging from 0.00 to 1.00) is assigned, which reflects the likelihood of belonging to a given class. These scores are converted into binary labels using a threshold, such as 0.5. For example, the three models developed by Plisson (GBoost, LDA, and XGBoost) were later applied to the APD3 database, and the predictions indicated that ≈70% of the 3081 natural peptides evaluated are able to induce hemolysis [37]. Capecchi et al., on the other hand, used their adjusted generative models to sample 50,000 amino acid sequences, which were then filtered for non-hemolytic peptides with antibacterial activity. In summary, 3046 peptides were predicted to be antimicrobial agents against Gram-negative bacteria, while 2717 sequences were predicted to be active against Gram-positive bacteria [38].

Table 3. Accuracy, Matthews correlation coefficient (MCC), data set summary, and evaluation strategies of the models used by each of the ten programs that predict the hemolytic activity of peptides. Accuracy and MCC are reference values only and are not comparable between models unless the same database was used. HemoPiMod is the only hemolytic activity prediction server that takes chemical modifications of peptides into account. HemoNet is the only server that considers C/N-terminal modifications and L or D amino acids in its classification. All other tools predict hemolytic activity from natural, unmodified sequences.

A total of 8 of the 10 machine learning models predict hemolysis for natural peptide sequences, whereas HemoPiMod classifies the hemolytic activity of chemically modified peptide sequences [34], and HemoNet predicts the hemolytic action of peptides that include C/N-terminal modifications or L and D amino acids [113]. The classification models have a minimum accuracy of 76% and a minimum MCC of 0.52, and the highest Acc and MCC values reach 98.4% and 0.97, respectively. These values indicate a high predictive power that is useful in the drug discovery process. Notably, although ToxinPred and ATSE address general toxicity rather than toxicity explicitly targeted at RBCs, they were included in Table 3 because of their historical value to other models of peptide hemolytic activity [39,40].
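The conversion of class probabilities into binary labels, as described for the application stage, can be sketched as below. This is a minimal illustration assuming a model that outputs one probability p per peptide; the function names are hypothetical, and counting p exactly equal to the threshold as hemolytic is an arbitrary convention chosen here.

```python
def to_labels(probabilities, threshold=0.5):
    """Convert class probabilities p in [0, 1] into binary labels
    (1 = hemolytic, 0 = non-hemolytic). p == threshold counts as hemolytic."""
    labels = []
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability outside [0, 1]: {p}")
        labels.append(1 if p >= threshold else 0)
    return labels


def fraction_positive(labels):
    """Share of peptides predicted hemolytic, e.g. to summarize a library screen."""
    return sum(labels) / len(labels) if labels else 0.0
```

For instance, `to_labels([0.1, 0.5, 0.92])` gives `[0, 1, 1]`; raising the threshold trades sensitivity for specificity.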
Thanks to the development of web servers, all these predictive models are easily accessible and user-friendly, even for non-experts in artificial intelligence. Users do not need programming skills to run several peptide sequences simultaneously, and results for a FASTA query can be obtained within seconds, although some of these servers accept only one query (one peptide sequence) at a time. More recent predictors, such as HemoPred [33], HLPpred-Fuse [35], and the programs created by Plisson et al. [37], Yaseen et al. [113], and Capecchi et al. [38], perform predictions against sizable libraries. The latter examples require Python skills to predict multiple sequences.
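For the tools that are run from Python, a typical first step is reading a multi-sequence FASTA file. The sketch below is a minimal, hypothetical helper (not part of any cited program) that parses FASTA text into (header, sequence) pairs and splits records into chunks, which can also help when a server limits the number of sequences per query; the actual prediction call is tool-specific and is not shown.

```python
def parse_fasta(text):
    """Parse multi-record FASTA text into (header, sequence) pairs.
    Sequence lines are upper-cased and concatenated; blank lines are ignored."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())
        else:
            raise ValueError("sequence data found before the first FASTA header")
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def batch(records, size):
    """Split records into consecutive chunks of at most `size` records."""
    return [records[i:i + size] for i in range(0, len(records), size)]
```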
For the most part, the developers of these computational strategies validated their models only in silico, using statistical parameters calculated on reliable data sets that include experimental data for previously characterized hemolytic and non-hemolytic peptides. The study by Capecchi and collaborators [38] is an exception, as it also includes experimental verification. In that study, a combination of supervised and unsupervised learning guided the selection of a library of peptide sequences of at most 15 residues for chemical synthesis. The evaluation of the hemolytic and antimicrobial effects of these short peptides reaffirmed the potential of virtual strategies to guide the discovery of non-toxic antibiotic candidates. In line with this emphasis on experimental validation, researchers have also incorporated modern in silico strategies into their experimental workflows [38]. Some examples are reviewed next.
Mnif and collaborators used in silico and in vitro approaches to evaluate the hemolytic properties of a 19-mer cell-penetrating peptide with antibacterial activity against Staphylococcus epidermidis. The hemolysis assay confirmed the non-toxicity predicted by the HemoPred web server [153]. Similarly, the hemolytic results obtained for cecropins closely agree with earlier predictions from this web server [154]. In silico screening of peptides from the genome of Lactobacillus casei HZ1, including bioinformatics analysis with HemoPI, allowed the identification of a highly active AMP against S. aureus [155] and an anticancer sequence [156], both with low hemolytic effects. The Hemolytic Peptide Identification server (HemoPI) has also been used to design non-toxic agents for aquaculture: RY12WY, a highly thermostable peptide, was synthesized, and its low hemolytic potency was consistent with the in silico analysis [157]. Despite these successes, some incorrect predictions have been reported. HemoPred classified enterocins K1 and EJ97 as hemolytic and non-hemolytic, respectively; however, using a traditional hemolysis assay, Reinseth and colleagues demonstrated that both Enterococcus spp. bacteriocins are non-hemolytic, with negligible erythrocyte lysis up to a concentration of 1 mg/mL [158]. Taken together, these findings underline the usefulness of computational technologies in peptide drug discovery. New peptide sequences, experimental validations, and updated databases should further improve the reliability of the results.

Case Study
The high performance of hemolytic activity prediction methods enables accurate screening of non-toxic peptides. The use, evaluation, and refinement of models are key to new advances in computational peptidology. To further explore model performance, we predicted the hemolytic activity of peptides using seven of the computational methods presented in Table 3. Seven data sets derived from HemoPI-1 (main/validation), HemoPI-2 (main/validation), HemoPI-3 (main/validation), and HAPPENN were employed to benchmark the main models. These data sets were chosen to represent the diversity of criteria used to build the virtual screening tools. HemoPI-1 consists of peptides with or without hemolytic action. HemoPI-2 comprises highly hemolytic and non-hemolytic peptide sequences. Similarly, HemoPI-3 discriminates between highly and poorly hemolytic peptides. The last data set, HAPPENN, is composed of hemolytic and non-hemolytic peptides, including sequences with N-terminal acetylation and C-terminal amidation. HAPPENN is recognized as a high-quality, well-established data set [113].
Our initial assessment offers a closer look at state-of-the-art accuracy for the hemolytic classification task. However, future analyses are required to elucidate possible biases and to standardize the criteria used to label peptides as non-hemolytic or hemolytic when building data sets. Some data sets label peptides with low hemolytic activity as non-hemolytic, whereas two-layer prediction frameworks such as HLPpred-Fuse classify these peptides as low-intensity hemolytic [35]. Moreover, the concentration limits separating non-hemolytic from low hemolytic activity differ significantly between studies [36]. In our analysis, non-hemolytic and poorly hemolytic peptides were both considered negative examples. Accordingly, HLPpred-Fuse was evaluated using its second-layer (low/high) predictions for all data sets except HemoPI-1, for which the first-layer predictions were used.
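The labeling rule just described can be written as a small mapping function. This is a sketch under assumptions: the label strings (`"hemolytic"`, `"non-hemolytic"`, `"high"`, `"low"`) are hypothetical placeholders and not the literal output format of HLPpred-Fuse, and the function names are illustrative only.

```python
def to_binary(layer1, layer2=None, use_second_layer=True):
    """Collapse a two-layer prediction into the benchmark's binary labels.

    layer1: "hemolytic" or "non-hemolytic" (first-layer call; hypothetical strings)
    layer2: "high" or "low" for peptides called hemolytic (second-layer call)
    Non-hemolytic and low hemolytic peptides both become negatives (0).
    """
    if layer1 == "non-hemolytic":
        return 0
    if layer1 != "hemolytic":
        raise ValueError(f"unexpected first-layer label: {layer1!r}")
    if not use_second_layer:
        return 1
    if layer2 not in ("high", "low"):
        raise ValueError(f"unexpected second-layer label: {layer2!r}")
    return 1 if layer2 == "high" else 0


def uses_second_layer(dataset_name):
    """Second-layer results for every benchmark set except the HemoPI-1 ones."""
    return not dataset_name.startswith("HemoPI-1")
```

For example, a peptide called hemolytic with low intensity maps to 0 on the HemoPI-2 sets but to 1 on the HemoPI-1 sets.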
Only peptides 7-35 residues long and composed of natural amino acids were selected for our bioinformatics screening. This approach produced seven data sets: HemoPI-1 7-35main, HemoPI-1 7-35val, HemoPI-2 7-35main, HemoPI-2 7-35val, HemoPI-3 7-35main, HemoPI-3 7-35val, and HAPPENN 7-35, composed of 846, 207, 765, 190, 1175, 294, and 1547 sequences, respectively. The composition (positive and negative examples) of each data set is detailed in Supplementary Table S1. All data are freely available on GitHub at https://github.com/albert-robles1101/hemolytic-prediction-of--peptides (accessed on 25 January 2022). The screening was repeated five times, and each peptide was assigned the class returned by the majority of runs (0: non-hemolytic; 1: hemolytic). HemoPred returned divergent results across runs for some peptide sequences.

Four widely used metrics, accuracy (Acc), sensitivity (SN), specificity (SP), and MCC, were adopted to evaluate the performance of the selected models. These statistical parameters were calculated as follows:

$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\mathrm{SN} = \frac{TP}{TP + FN}$$

$$\mathrm{SP} = \frac{TN}{TN + FP}$$

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$

where TP, TN, FP, and FN stand for the number of true positives, true negatives, false positives, and false negatives, respectively; Acc, SN, and SP are expressed as percentages.

Table 4 summarizes the prediction results of all seven high-throughput computational methods on the test data sets. In general, the models were able to distinguish hemolytic from non-hemolytic sequences, and in some cases best-in-class performance was achieved: each program showed an Acc higher than 90% for at least one of the data sets. Most Acc and MCC values fall within the ranges previously reported by the developers (Acc: 76.0%-98.4%; MCC: 0.55-0.97). Some models, such as HemoPI-1 and the Plisson models, showed low Acc and MCC values when challenged with the HemoPI-2 (main/val), HemoPI-3 (main/val), and HAPPENN data sets.
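The majority-vote labeling over repeated runs and the four metrics defined above can be sketched as follows. This is an illustrative stdlib implementation with hypothetical function names, not the scripts used to produce Table 4; metrics are returned as fractions (multiply by 100 for percentages), and undefined ratios are set to 0.0 by convention.

```python
import math
from collections import Counter


def majority_vote(runs):
    """runs: one list of 0/1 labels per repetition (same peptide order).
    Returns the most frequent label per peptide; an odd number of runs avoids ties."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*runs)]


def confusion(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = hemolytic)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def metrics(y_true, y_pred):
    """Acc, SN, SP (as fractions) and MCC, following the formulas in the text."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    acc = (tp + tn) / (tp + tn + fp + fn)
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Acc": acc, "SN": sn, "SP": sp, "MCC": mcc}
```

For example, with true labels `[1, 1, 0, 0]` and predictions `[1, 0, 0, 0]`, Acc is 0.75, SN 0.5, SP 1.0, and MCC = 2/√12 ≈ 0.577.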
Our findings agree with previous analyses reporting that the HAPPENN [36] and HLPpred-Fuse [35] models discriminate hemolytic from non-hemolytic peptides better than the HemoPI-1 server [26], particularly when applied to peptide sequences derived from the HemoPI-2 and HemoPI-3 data sets.
Because the models were not trained under the same conditions or on the same data sets, a direct comparison is not entirely appropriate. This simple case study is an initial assessment intended to encourage high-throughput screening of peptide libraries. The performance metrics of some predictors may be overestimated because some benchmark data sets are similar to the peptide sequences previously used to optimize the hyperparameters of these computational methods. Therefore, future studies should consider different and larger data sets. Additionally, a fair comparison requires that the specifications of each model and its training data be thoroughly explored and harmonized. Nevertheless, collectively, our large-scale analysis confirms the applicability and robustness of virtual tools and their contribution to a sustainable and cost-effective design and discovery of non-toxic peptides.

Table 4. Test results for state-of-the-art predictors based on seven data sets. The HemoPI-1 7-35main, HemoPI-1 7-35val, HemoPI-2 7-35main, HemoPI-2 7-35val, HemoPI-3 7-35main, HemoPI-3 7-35val, and HAPPENN 7-35 data sets were employed to challenge the main predictors.

Current Strategies to Improve the Selectivity Index of Therapeutic Peptides
Tuning selectivity is a daunting task on the path toward peptide-based drugs, and determining peptide selectivity is a mandatory requirement during preclinical research [159,160]. As discussed above, selectivity, quantified by the SI, reflects a fine-tuned balance between toxicity and the desired biological effect. This delicate balance remains a challenge explored in many SAR studies [161,162]. Here, we discuss and illustrate the leading design and synthesis strategies used to reduce toxicity and increase the potency of peptides. Figure 4 shows approaches that have been useful in this regard.
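One common way to quantify the SI, assumed here because the exact definition used earlier in this review is not restated in this section, is the HC50 divided by the geometric mean (GM) of the MIC values obtained across the tested strains. A minimal sketch with hypothetical function names:

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive values (e.g., MICs across strains)."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("need at least one strictly positive value")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def selectivity_index(hc50, mics):
    """SI = HC50 / GM(MIC), with HC50 and all MICs in the same unit (e.g., µM).
    If HC50 was not reached (reported as '> X'), passing X yields only a lower bound."""
    return hc50 / geometric_mean(mics)
```

For instance, an HC50 of 320 and MICs of 4 and 16 (same units) give a GM of 8 and an SI of 40; a higher SI indicates a wider window between the therapeutic and hemolytic concentrations.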

Optimization and Complementation of the Physicochemical Properties
Physicochemical properties such as positive net charge and amphipathicity significantly influence the bioactivity of most AMPs, which are mainly reported to adopt α-helical structures [163,164]. The net positive charge mediates the initial electrostatic interaction with the anionic phospholipids and lipopolysaccharides that make up the membranes of certain pathogens [165]. In contrast, mammalian cells such as RBCs have zwitterionic phospholipids in the outer leaflet of their membranes, which are less affected by the positive charge of a peptide than by its hydrophobic properties [159]. Highly hemolytic peptides interact with phosphatidylcholine, an abundant component of zwitterionic membranes [166], whereas the cholesterol in mammalian membranes inhibits peptide binding [167].
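The two properties discussed above are often approximated by a residue-count net charge and the Eisenberg hydrophobic moment, a measure of amphipathicity. The sketch below makes several simplifying assumptions of our own (not those of the cited studies): it uses the Eisenberg consensus hydrophobicity values as commonly tabulated (worth checking against the original scale before use), assumes an ideal α-helix with 100° per residue, and ignores histidine and the termini in the charge. Packages such as modlAMP implement more complete versions of these descriptors.

```python
import math

# Eisenberg consensus hydrophobicity scale (normalized values, as commonly tabulated).
EISENBERG = {
    "A": 0.62, "R": -2.53, "N": -0.78, "D": -0.90, "C": 0.29,
    "Q": -0.85, "E": -0.74, "G": 0.48, "H": -0.40, "I": 1.38,
    "L": 1.06, "K": -1.50, "M": 0.64, "F": 1.19, "P": 0.12,
    "S": -0.18, "T": -0.05, "W": 0.81, "Y": 0.26, "V": 1.08,
}


def net_charge(seq):
    """Crude net charge at neutral pH: +1 per K/R, -1 per D/E (His and termini ignored)."""
    return sum(seq.count(aa) for aa in "KR") - sum(seq.count(aa) for aa in "DE")


def mean_hydrophobicity(seq):
    """Average Eisenberg hydrophobicity per residue."""
    return sum(EISENBERG[aa] for aa in seq) / len(seq)


def hydrophobic_moment(seq, angle_deg=100.0):
    """Mean hydrophobic moment: length of the vector sum of residue hydrophobicities
    projected at angle_deg per residue (100 degrees for an ideal alpha-helix), divided by length."""
    angle = math.radians(angle_deg)
    sin_sum = sum(EISENBERG[aa] * math.sin(i * angle) for i, aa in enumerate(seq))
    cos_sum = sum(EISENBERG[aa] * math.cos(i * angle) for i, aa in enumerate(seq))
    return math.hypot(sin_sum, cos_sum) / len(seq)
```

A high moment combined with moderate mean hydrophobicity corresponds to a well-separated amphipathic helix; comparing these values between a parent peptide and its analogs is one quick way to see how a substitution shifts the balance discussed above.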
Figure 4. Peptide selectivity optimization strategies. Selectivity is a favorable characteristic of considerable significance for the success rate of drug candidates. However, selectivity optimization is a complex task that must balance the properties that govern toxic and therapeutic effects. Different design and synthesis strategies have contributed to this objective. In the design step, the evaluation of physicochemical properties and structure-activity relationships (SARs), integrated with computational techniques, plays a decisive role in peptide selection. In the same context, cyclization, the use of D amino acids, and peptidomimetics have been key to the development of stable and selective peptides.
Chemical (solid-phase) peptide synthesis offers the versatility and accessibility needed to generate several analogs of the same parent molecule, which is very useful for determining the residues and regions responsible for the biological activity of interest. Increased selectivity, resulting from decreased hemolytic activity after modification of the physicochemical properties of analogs, has been observed for peptides derived from leucine zippers [109], magainins [106], melittin hybrids [168], PMAP-36-derived peptides [169], and clavaspirin [170].
Not all structural parameters of a peptide are independent, which makes it challenging to determine which characteristics significantly influence selectivity [159]. In many studies, reducing peptide hydrophobicity can lead to an unintended decrease in antimicrobial activity [171]. In another case, replacing a single hydrophobic residue with a positively charged residue reduced hemolytic activity without compromising antimicrobial activity [172].
There is also evidence that exchanging amino acids with similar chemical properties can affect peptide activity. Replacing four phenylalanine residues of the cathelicidin-BF15-a1 peptide with tryptophan residues increased its activity against E. coli (MIC reduced from 9.6 to 2.1 µM) and Bacillus subtilis (MIC reduced from 38.7 to 4.3 µM), while its low hemolytic activity was unaffected (HC50 > 320 µg/mL in both cases) [173]. To maintain the native primary structure, one can instead carry out terminal modifications such as C-terminal amidation or N-terminal acetylation, which can increase the activity of the molecule and its stability within an organism [25]. Conjugation with nanoparticles is another interesting strategy that has diminished toxicity and increased biological activity [174].

Cyclization
Cyclization has been a useful tool in peptide science, mainly to increase peptide stability and to constrain the three-dimensional structure so as to enhance the desired biological effect [159,175]. For instance, cyclization preserved the hydrophobic region of a melittin analog, which retained its antimicrobial capacity while showing reduced hemolytic activity [176]. However, the outcome is sequence dependent, and a case-by-case analysis is required. For example, the cyclization of a magainin analog significantly reduced both its antimicrobial and hemolytic activities [176], whereas the cyclization of the RRWWRF peptide increased both [106].

Incorporation of D Amino Acids
Peptide selectivity can also be improved by increasing proteolytic resistance: a longer lifetime in the body allows the same therapeutic effect to be reached at lower peptide doses, thereby reducing toxic effects [159]. Incorporating amino acids in the D configuration confers resistance to proteolytic enzymes, which cannot act on them because of steric hindrance [177,178]. This has allowed some peptides to be evaluated in vivo, with highly relevant results [179,180]. Examples of this modification with favorable effects on selectivity include melittin and other peptides composed mainly of leucine and lysine [181,182]. Additionally, incorporating D amino acids into an AMP sequence may alter its amphipathic structure in such a way that hydrophobic interactions with zwitterionic host cell membranes, and consequently hemolysis, are substantially reduced [183].

Use of Peptoids
Another way to disturb the amphipathic α-helices of certain AMPs is the use of peptoids, i.e., peptide analogs comprising N-substituted glycines, in which side chains are linked to the backbone nitrogen rather than to the α-carbon; consequently, peptoid amide groups cannot act as hydrogen bond donors [166]. An example of this approach concerns the Leu/Lys-rich KLW peptide, in which the introduction of N-substituted glycines at positions 9 and 13 decreased hemolytic activity (0% hemolysis up to a peptide concentration of 100 µM) while enhancing antimicrobial action, lowering the minimal inhibitory concentrations (MICs) against several bacteria from 4-8 µM (native sequence) to 1-4 µM (peptoid analog) [184]. The same strategy has been applied to other peptides, such as melittin [185] and cathelicidins [186], to obtain better selectivity.

Bioinformatics Tools
Although chemical synthesis is essential to the development of new peptide candidates, each synthetic process requires considerable investment in personnel, techniques, and resources [187,188]. The use of alternative approaches, including machine learning tools, has risen in recent years [189]. Such tools can greatly reduce costs by directing laboratory resources and efforts more efficiently, through estimations, predictions, and comparisons of relevant properties based on available information and adequate prediction models [72]. Many computational tools that complement toxicity predictors promise to accelerate the development of bioactive peptides, as the following examples show.
The study by Kamech et al. [92] is an emblematic example of the relevance of bioinformatics to the development of peptide therapeutics. These authors developed software named Mutator, which proposes specific sequence substitutions that enhance the selectivity and effectiveness of peptides. Mutator generated analogs of the native peptides XT-7 and ascaphin-8, which were then synthesized and evaluated in vitro. According to Mutator, the suggested substitutions should raise the SI from <37 to >80; experimentally, SI values >130 were found for S. aureus and E. coli. Moreover, the mutant peptide derived from XT-7 showed an SI increase from 5 to >270 against Pseudomonas aeruginosa. Mutator has produced data on 26 peptides with potent antimicrobial activity and SI values >20. Note that the training data consisted predominantly of linear peptides that adopt an amphipathic α-helical structure; hence, Mutator has so far been limited to molecules with these structural characteristics.
Machine learning (ML) is another important and very useful tool for generating novel peptide sequences [190]. For example, Capecchi et al. [38] developed a generative model using a recurrent neural network (RNN) and the DBAASP database, which yielded 28 new peptides of up to 15 residues that differed from training-set sequences by point mutations. Of these 28 sequences, 8 peptides were recognized as non-hemolytic and likely to be active against different bacterial strains. Another ML-based tool focused on predicting AMP sequences is AMAP [191]. This tool uses a multilevel classification that predicts 14 different types of biological activity for a given peptide, and the score it generates correlates with the likely antibacterial effectiveness. AMAP was evaluated with the proof-of-concept peptide P276, which has potent antimicrobial activity: the tool correctly cataloged it as a promising AMP candidate (AMAP score = 1.70), whereas a similar server (MLAMP) classified it as a non-antimicrobial peptide.
A few predictive models can assess the selectivity of specific peptides toward their target cells, as outlined by Li et al. [192]. These authors aimed to identify the characteristics and factors with a broad influence on peptide selectivity by means of a random forest (RF) algorithm that correlates the properties of peptide sequences with their biological activity. The resulting models yielded high-precision predictions, and their interpretation indicated that selectivity is mediated by a strong relationship between properties related to solubility and charge.
Antimicrobial activity is not the only biological function of typical AMPs: their structural and functional diversity enables them to act on other lipid membranes, such as those of cancer cells. In this context, Gabernet et al. [193] developed an ML model that discriminates peptides with and without anticancer activity. The model was experimentally validated through the synthesis and biological evaluation of 12 peptides, 83% of which were correctly predicted. A design algorithm known as simulated molecular evolution was also used, increasing peptide selectivity 5-fold relative to human endothelial cells and 10-fold relative to RBCs, which illustrates the benefits of ML-guided design and optimization for peptide-based drugs.
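To illustrate the general idea behind evolutionary, model-guided design, the sketch below implements a generic (1, λ) mutate-and-select loop. It is a simplified illustration under stated assumptions, not the published simulated molecular evolution algorithm: mutations here are uniform random substitutions (real design algorithms typically use more informed mutation schemes), and `fitness` stands for a hypothetical user-supplied score, for example a trained activity model combined with a penalty on predicted hemolysis.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate(seq, rate, rng):
    """Replace each residue with a uniformly drawn amino acid with probability `rate`."""
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq)


def evolve(parent, fitness, generations=20, offspring=50, rate=0.1, seed=0):
    """Generic (1, lambda) evolution strategy: each generation, `offspring` mutants of
    the current parent are scored with the (hypothetical) `fitness` function, and the
    best mutant becomes the new parent, even if it scores lower than its parent."""
    if offspring < 1:
        raise ValueError("offspring must be >= 1")
    rng = random.Random(seed)
    for _ in range(generations):
        children = [mutate(parent, rate, rng) for _ in range(offspring)]
        parent = max(children, key=fitness)
    return parent
```

Discarding the parent each generation (the "comma" strategy) helps the search escape local optima; keeping the parent when no child improves on it (a "plus" strategy) is a common alternative when the score is noisy or expensive to compute.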

Future Perspectives and Concluding Remarks
Accurate computational predictions are extremely attractive in the early stages of drug discovery and are revolutionizing the development of peptide-based therapeutics. Bioinformatics tools constitute a safe gateway that provides valuable hints on which therapeutic peptides are worth advancing while preventing the progression of likely toxic molecules. Toxicity remains a major hurdle to the clinical translation of peptides, and standard approaches to avoid it are time- and resource-consuming, often running counter to the 3Rs guidelines for animal experimentation. In silico tools have triggered a boom in the ability to classify peptides as hemolytic or non-hemolytic, owing to the accessibility and abundance of DBs, which enable the understanding and systematization of the structural determinants and key properties underlying the hemolytic effects of peptides. These DBs are essential for building models capable of classifying vast numbers of peptide sequences and guiding de novo peptide design. Currently, 12 tools are freely available, 10 of which specifically address the hemolytic action of peptides. Based on the metrics evaluated, our findings support the usefulness of these virtual tools for the scientific community. Their relatively high reliability and resolving power open new avenues for the design and development of prospective clinical peptide drugs with minimal cost, time, and resources. Experimental validation of these predictions is nevertheless necessary and should contribute to refining the models. Comparing the predictions of current methods on large peptide databases is a promising source of valuable clues for improving the accuracy of high-quality computational technologies. Future studies should assess the sampling bias within available models, which may be influenced by structures, sequence diversity, and amino acid composition.
The transition to quantitative analysis, such as the development of regression models to predict HC50 values, is also of great relevance and usefulness for in silico peptide design. Finally, innovative programs should aim to predict the balance between toxicity and therapeutic effect, i.e., selectivity. Thus, the future design of peptide pharmaceuticals should greatly benefit from the interplay between computational, in vitro, and in vivo approaches.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ph15030323/s1, Table S1: The size and composition of data sets.

Data Availability Statement: Publicly available datasets were analyzed in this study. These data can be found here: https://github.com/albert-robles1101/hemolytic-prediction-of--peptides (accessed on 24 January 2022).