Screening of Novel Bioactive Peptides from Goat Casein: In Silico to In Vitro Validation

Food-derived bioactive peptides are of great interest to science and industry due to evolving drivers of food product innovation, including health and wellness. This study offers a critical look at how bioinformatics analysis is employed in the identification of bioactive peptides in the laboratory. An in silico analysis (PeptideRanker, BIOPEP, AHTpin, and mAHTPred) of a list of peptides from goat casein hydrolysate was performed to predict which sequences could potentially be bioactive. To validate the predictions, the in vitro antihypertensive potential of the five peptides with the highest predicted potential was first measured. Then, for three of these, gastrointestinal digestion was simulated in vitro, followed by analysis of the resulting ACE inhibitory activity and antioxidant capacity. We observed that computational biology tools for predicting bioactive peptide sequences are an important research aid, but they should not be used alone; complementing them with various in vitro and in vivo assays is essential.


Introduction
Peptide discovery, especially from different sources of food proteins, is of considerable interest for potential advancements in biology, chemistry, pharmacology, medicine, biotechnology, and the food industry [1,2]. Peptides (with lengths ranging from 2 to 50 amino acids) have crucial roles as antimicrobials, growth factors, hormones, biological messengers, and neurotransmitters [2]. Peptides can exert several bioactivities, such as antiangiogenic, antibacterial, anticancer, antifungal, and others [2,3]. Therefore, there is an interest in researching their chemical and biological properties. Many proteins or protein fragments (peptides) perform their biological functions in their native form; however, some peptides require changes to become bioactive. There are three different approaches for peptide generation: the action of proteolytic microorganisms, the digestive enzymes in the gastrointestinal tract, and external hydrolysis with proteolytic enzymes [3,4]. External enzymatic hydrolysis is the most common method for the generation of bioactive peptides; however, fermentation has also been highly relevant for products such as milk [1].
Research on the identification, characterization, and use of food-derived bioactive peptides, especially those from cow's milk and milk products, is a hot topic [5,6]. With much lower coverage, goat milk has also been studied and characterized for its bioactive peptides, with beneficial properties such as antioxidant, ACE inhibitory, antidiabetic, and antimicrobial activities. However, more studies are needed to validate these health claims [6].
Considering the potential of bioactive peptides as novel therapeutics, nutraceuticals, and functional food ingredients, the discovery and prediction of novel bioactive peptides is an exciting area of research. Metabolomic, proteomic, and genomic screening of toxins and other natural product sources can identify bioactive peptides [7]. Advances in peptide screening and computational biology have come to support this area. Bioactive peptide databases covering a range of activities are at a promising early stage. A good example is BIOPEP, a database of biologically active peptide sequences that also serves as a tool for evaluating proteins as bioactive peptide precursors [5]. Tools such as this, and computational biology in general, are important for developing general predictors of bioactive peptides and identifying the candidate peptides most likely to be bioactive. Isolating, identifying, and characterizing these peptides is critical; therefore, the development of new technologies that improve these steps is important. The use of machine learning to identify functional peptides from protein sequence data is a major advance and can improve the rapid and accurate detection of new biopeptides. Sequence-based in silico approaches can be used to select the best peptides before their synthesis and laboratory testing, thus optimizing the design of therapeutic peptides. Improvements in peptide screening and computational biology will continue to support peptide drug discovery. However, this new technology must be complemented with in vitro studies. This article presents a holistic perspective on recent advancements in in silico peptide prediction and their relationship with in vitro assays, focused on selected peptides from a goat casein hydrolysate. Figure 1 outlines our approach.

In Silico Prediction
An in silico analysis of a list of peptides from goat casein hydrolysate (from a study in progress) was performed to predict which sequences could potentially be bioactive. Table 1 shows the top 10 peptides analyzed with PeptideRanker [8]. PeptideRanker is a server for the prediction of bioactive peptides based on a novel N-to-1 neural network. The server returns the probability that a peptide will be bioactive: the score ranges from 0 to 1, and the closer to 1, the higher the probability of biological activity. PeptideRanker was trained at a threshold of 0.5, i.e., any peptide predicted over a 0.5 threshold is labeled as bioactive. Table 1 shows that only 5 of the 10 sequences are potentially bioactive, the most likely being SWMHQPP.
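The thresholding step described above can be sketched in a few lines of Python. This is only an illustration of the 0.5 cut-off and the descending ranking: the function name `rank_and_label` and the peptide labels and scores in `example_scores` are hypothetical placeholders, not the values reported in Table 1 and not part of the PeptideRanker server.

```python
from typing import Dict, List, Tuple

# Threshold stated in the text: scores over 0.5 are labeled as bioactive.
BIOACTIVE_THRESHOLD = 0.5


def rank_and_label(
    scores: Dict[str, float], threshold: float = BIOACTIVE_THRESHOLD
) -> List[Tuple[str, float, bool]]:
    """Sort peptides by score (highest first) and flag those over the threshold."""
    for sequence, score in scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {sequence} must lie in [0, 1], got {score}")
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    # Strictly greater than the threshold, following "predicted over a 0.5 threshold".
    return [(sequence, score, score > threshold) for sequence, score in ranked]


# Hypothetical placeholder scores, NOT the values from Table 1.
example_scores = {"PEPTIDE_A": 0.82, "PEPTIDE_B": 0.47, "PEPTIDE_C": 0.50}

for sequence, score, bioactive in rank_and_label(example_scores):
    print(f"{sequence}\t{score:.2f}\t{'bioactive' if bioactive else 'not bioactive'}")
```

Treating a score of exactly 0.5 as not bioactive is a reading of "over a 0.5 threshold"; the server's own handling of the boundary is not stated in the text.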
After this first selection, the five potentially bioactive sequences were analyzed with BIOPEP [9], another widely used tool for the in silico prediction of bioactive peptides. Table 2 shows all the potential bioactivities that each peptide encodes in its sequence, together with two of the parameters used to classify potential bioactivities in the BIOPEP database: parameter A (frequency of occurrence of bioactive fragments in a protein sequence) and parameter B (potential biological activity). Across the five peptides, only two potential bioactivities were repeated: angiotensin-I converting enzyme (ACE) inhibition and dipeptidyl peptidase IV (DPP-IV) inhibition, corresponding to antihypertensive and antidiabetic activities, respectively. For these two activities, the ranking was not the same as with PeptideRanker. In terms of B, the peptide with the highest potential to inhibit ACE was QSLVYPFTGPIPNSL (#4 for PeptideRanker), while SWMHQPP (#1 for PeptideRanker) was in second place with almost the same potential as YPYQGPIVL (#5 for PeptideRanker). Likewise, YPYQGPIVL showed the highest potential to inhibit DPP-IV, with SWMHQPP again in second place. Following the BIOPEP analysis, we focused on ACE inhibitory activity and turned to two newer, increasingly used tools for predicting potentially antihypertensive peptides, AHTpin [10] and mAHTPred [11]. These online servers use machine learning models trained on the characteristics of reported peptides of all lengths to predict antihypertensive peptides. Once again, the ranking differed from the previous ones (Tables 3 and 4). Although the change was not dramatic overall, it was for SWMHQPP, for which AHTpin estimated the lowest antihypertensive capacity, with a very low value for a peptide considered to have that potential.
Furthermore, we have previously reported peptide hydrolysates of soy protein, i.e., a mixture of peptides, with IC₅₀ values lower than 60 µg mL⁻¹ [15]. For this reason, when analyzing the inhibitory capacity of a pure peptide, we consider that an IC₅₀ value higher than 50 µg mL⁻¹ is not significant compared to the benchmarks. In this sense, a priori, only the peptide QSLVYPFTGPIPNSL would be antihypertensive according to the IC₅₀ estimated by BIOPEP.

Antihypertensive Activity
To validate the in silico predictions, we measured the ability of the five peptides to inhibit ACE in vitro (iACE). Table 5 (raw peptides) shows the measured bioactivities, which revealed a low inhibitory capacity for the peptides, SWMHQPP being the only one with a considerable capacity (IC₅₀ 223 µg mL⁻¹). Even so, by the criterion already specified for a pure peptide to be considered antihypertensive, this level of inhibitory activity is low. For SWMHQPP, the observed IC₅₀ was 3.1 times higher than the value we calculated from BIOPEP. In the case of QSLVYPFTGPIPNSL, which was predicted to have the highest antihypertensive potential by BIOPEP and AHTpin, and the second highest by mAHTPred, the inhibitory activity was one of the lowest, practically nil.
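As a rough consistency check, and assuming that "3.1 times higher" refers to the ratio of the observed IC₅₀ to the BIOPEP-derived IC₅₀ for SWMHQPP (the predicted value itself is not restated here, so the figure below is only implied by the two numbers given):

```latex
\mathrm{IC}_{50}^{\mathrm{pred}}
  \approx \frac{\mathrm{IC}_{50}^{\mathrm{obs}}}{3.1}
  = \frac{223\ \mu\mathrm{g\,mL^{-1}}}{3.1}
  \approx 72\ \mu\mathrm{g\,mL^{-1}}
```

Under that assumption, the implied prediction already lies above the 50 µg mL⁻¹ cut-off adopted for pure peptides, which agrees with the statement that only QSLVYPFTGPIPNSL was antihypertensive a priori according to BIOPEP.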

Simulated Gastrointestinal Digestion
BIOPEP estimates potential bioactivity from the bioactive sequences encoded in the peptides. Thus, if these sequences were released through a lytic process, such as that which occurs in gastrointestinal digestion, the bioactivity could increase and approach the predicted value. For this reason, the three peptides of greatest interest from the in silico predictions (QSLVYPFTGPIPNSL, SWMHQPP, and YPYQGPIVL) were selected for in vitro simulated gastrointestinal digestion. We designed the digestions to reach a concentration of 50 µg mL⁻¹ in the simulated intestinal compartment. For the three peptides, in addition to the antihypertensive potential, we analyzed the antioxidant potential by the oxygen radical absorbance capacity (ORAC) assay. Bioactivity analyses were performed on the peptides both before and after digestion.
Before digestion, we found a moderate antioxidant capacity (Table 5), particularly for QSLVYPFTGPIPNSL and YPYQGPIVL [16,17]. In the case of SWMHQPP, the antioxidant capacity was considerable, being the highest of the three. After digestion, we observed a marked increase in the antioxidant capacity of all three peptides. The digest of QSLVYPFTGPIPNSL showed the highest antioxidant activity, followed by YPYQGPIVL, with SWMHQPP below these two. Although the values differed between the digested peptides, they all demonstrated high bioactivity, which is consistent with the release of antioxidant sequences encoded in each of the peptides. What was striking is that BIOPEP predicted antioxidant potential only for QSLVYPFTGPIPNSL. Although this peptide ended up showing the highest antioxidant activity after digestion, the other two proved to be antioxidants both before and after digestion.
Digestion had no positive effect on iACE activity. None of the three digested peptides showed any inhibitory activity, so the IC₅₀ was not reached under the conditions tested. This means that if there is any inhibitory activity, the IC₅₀ is well above 50 µg mL⁻¹.

Discussion
The current interest in the study of peptides has triggered a technological evolution, and predicting bioactive peptide sequences from parent proteins has become common practice. These sequences can be released by lysis of the protein structure, for example through enzymatic hydrolysis [3,15]. In turn, peptides produced by proteolysis can release further bioactive sequences during the enzymatic hydrolysis of the gastrointestinal digestive process [15]. The literature reports different outcomes for the residual antioxidant capacity of peptides and casein hydrolysates after in vitro simulated gastrointestinal digestion: an increase, a reduction, or no change in this activity [18-20]. Contreras et al. (2013) observed a small increase in the antioxidant capacity of the AYFYPEL peptide after digestion, from 3.216 ± 0.114 to 4.160 ± 0.623 µmol TE µmol⁻¹ peptide [21]. However, for the other peptides analyzed (RYLGY and YQKFPQY), the antioxidant activity was reduced.
The most widely used bioinformatics tools for predicting putative biologically active sequences are based on databases built from known structures and on structural and physicochemical analyses that establish the residue interactions responsible for certain bioactivities [2,22,23]. However, the information behind these databases is still quite limited, both in the spectrum of bioactive properties that each sequence can exert and in the mechanisms that endow these structures with those properties. In this work, we saw that different tools predict differently, leading to different results. Furthermore, it is still unclear which tool may be the most reliable for the peptides discussed here. PeptideRanker predicted these peptides to have bioactive potential without specifying the properties; for the antioxidant property, where the peptides proved interesting, the observed order did not correspond to the predicted one. BIOPEP, on the other hand, did not predict most of the analyzed sequences as antioxidants; the only exception was QSLVYPFTGPIPNSL. BIOPEP did predict these peptides as antihypertensive, and we obtained the same result with AHTpin and mAHTPred, considered the most reliable tools for predicting antihypertensive activity [2]. However, this bioactivity was not observed in the in vitro validation tests, which exposes the limitations that these tools still have.
This result does not mean that these bioinformatics tools are obsolete or unreliable, but that they still need to develop and mature. Indeed, these same tools have proven very useful in other works, so their overall utility in the search for therapeutic peptides cannot be denied [2,15]. To this end, it is very important to continue developing in vitro and in vivo assays that relate peptide primary and secondary structures to different biological properties, and to broaden the spectrum of potential properties that already reported peptides can exert.
As an initial exploratory study, this work needs to be expanded further, which would require in vitro analysis of the DPP-IV inhibition property, as well as establishing how the peptide sequences are fragmented along the gastrointestinal tract.

Bioactivity Prediction
To predict in silico the bioactivity of each peptide, a ranking was first performed with PeptideRanker [8], a server for the prediction of bioactive peptides based on a novel N-to-1 neural network [5]. The first 5 peptides in the ranking were then analyzed using the online BIOPEP database [9] to estimate their different potential bioactivities [22,24]. One of the important parameters estimated by BIOPEP is the potential biological activity of a protein (B) [µM⁻¹]:

$$B = \frac{\sum_{i=1}^{k} \dfrac{a_i}{EC_{50i}}}{N}$$

where $a_i$ is the number of repetitions of the i-th bioactive fragment in the protein sequence; $EC_{50i}$ is the concentration of the i-th bioactive sequence corresponding to its half-maximal activity [µM], or its half-maximal inhibition (IC₅₀) in the case of peptides with inhibitory activity; $k$ is the number of different fragments with a given activity; and $N$ is the number of amino acid residues. In parallel, these same hundred peptides were analyzed for their antihypertensive potential using the online tools AHTpin [10,25] and mAHTPred [11,23]. For AHTpin, peptides with support vector machine (SVM) scores > 0.0 were considered as predicted to have the antihypertensive property.
The analysis with all servers was performed on 14 December 2021. The data obtained from BIOPEP were updated by accessing the platform again on 15 February 2022.
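The BIOPEP parameters A and B can be illustrated with a small calculation. The sketch below is not the BIOPEP implementation: it assumes A is computed as the number of bioactive fragments divided by the number of residues (A = a/N), and B as the sum of a_i/EC50_i over the k fragments divided by N, following the definitions given above. The `Fragment` class, the function names, and the fragment labels and values in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Fragment:
    """One bioactive fragment with a given activity found in the peptide."""
    label: str          # fragment identifier (hypothetical here)
    occurrences: int    # a_i: repetitions of the fragment in the sequence
    ec50_um: float      # EC50 (or IC50 for inhibitors) of the fragment, in µM


def frequency_a(fragments: Iterable[Fragment], n_residues: int) -> float:
    """A = a / N, with a the total number of fragments with the given activity."""
    return sum(f.occurrences for f in fragments) / n_residues


def potential_activity_b(fragments: Iterable[Fragment], n_residues: int) -> float:
    """B = (sum over i of a_i / EC50_i) / N, in µM^-1."""
    return sum(f.occurrences / f.ec50_um for f in fragments) / n_residues


# Hypothetical example: a 10-residue peptide containing two kinds of fragment.
example_fragments: List[Fragment] = [
    Fragment("fragment_1", occurrences=1, ec50_um=50.0),
    Fragment("fragment_2", occurrences=2, ec50_um=100.0),
]

print("A =", frequency_a(example_fragments, n_residues=10))
print("B =", potential_activity_b(example_fragments, n_residues=10), "µM^-1")
```

Because B weights each fragment by the inverse of its EC50, a single very potent fragment can dominate the score, which is one reason a high B need not translate into activity of the intact peptide.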

ACE Inhibitory Activity
The ACE inhibitory activity was determined using the fluorometric assay described by Coscueta, Brassesco, and Pintado (2021) [26]. The iACE of each sample (raw peptide and digested) was evaluated in duplicate. The IC₅₀ values were calculated by non-linear modeling and expressed as the concentration (µg mL⁻¹) capable of inhibiting 50% of the enzymatic activity.
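The text does not state which non-linear model was used, so the following is only a minimal sketch of one common choice: a two-parameter Hill (log-logistic) curve fitted by least squares. To stay dependency-free it uses a coarse-to-fine grid search rather than a dedicated optimizer; the function names, search bounds, and the synthetic data in the usage lines are all assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple


def hill_inhibition(conc: float, ic50: float, slope: float) -> float:
    """Percent inhibition from a two-parameter Hill curve spanning 0 to 100%."""
    return 100.0 / (1.0 + (ic50 / conc) ** slope)


def sum_squared_error(concs: Sequence[float], inhibition: Sequence[float],
                      ic50: float, slope: float) -> float:
    """Sum of squared residuals between observed and modeled inhibition."""
    return sum((y - hill_inhibition(c, ic50, slope)) ** 2
               for c, y in zip(concs, inhibition))


def fit_ic50(concs: Sequence[float], inhibition: Sequence[float],
             rounds: int = 6, steps: int = 40) -> Tuple[float, float]:
    """Estimate (IC50, slope) by a coarse-to-fine grid search on log10(IC50) and slope."""
    # Assumed search window: one decade beyond the tested concentrations.
    lo_c, hi_c = math.log10(min(concs)) - 1.0, math.log10(max(concs)) + 1.0
    lo_h, hi_h = 0.2, 5.0  # assumed plausible range for the Hill slope
    best_c, best_h = lo_c, lo_h
    for _ in range(rounds):
        candidates = []
        for i in range(steps + 1):
            log_ic50 = lo_c + (hi_c - lo_c) * i / steps
            for j in range(steps + 1):
                slope = lo_h + (hi_h - lo_h) * j / steps
                error = sum_squared_error(concs, inhibition, 10 ** log_ic50, slope)
                candidates.append((error, log_ic50, slope))
        _, best_c, best_h = min(candidates)
        # Shrink the window around the current best point and refine.
        span_c, span_h = (hi_c - lo_c) / 10.0, (hi_h - lo_h) / 10.0
        lo_c, hi_c = best_c - span_c, best_c + span_c
        lo_h, hi_h = max(0.05, best_h - span_h), best_h + span_h
    return 10 ** best_c, best_h


# Synthetic, noise-free data generated from the model itself (not measured values).
test_concs = [10.0, 30.0, 100.0, 300.0, 1000.0]          # µg/mL
test_inhibition = [hill_inhibition(c, 100.0, 1.0) for c in test_concs]
estimated_ic50, estimated_slope = fit_ic50(test_concs, test_inhibition)
print(f"IC50 = {estimated_ic50:.1f} µg/mL, slope = {estimated_slope:.2f}")
```

An IC₅₀ can only be estimated reliably when the tested concentrations bracket 50% inhibition; when inhibition never approaches 50%, as reported for the digested peptides, the fit is not meaningful and the IC₅₀ can only be stated as above the highest concentration tested.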

Antioxidant Activity
The ORAC assay was performed according to Coscueta, Brassesco, and Pintado (2021) [26]. Each sample (raw peptide and digested) was analyzed in duplicate, and the results were expressed in µmol of Trolox equivalents (TE) per mg of peptide (µmol TE mg⁻¹), after calculating the Trolox concentration from the regression curve equation.
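The conversion from a Trolox regression curve to µmol TE mg⁻¹ can be sketched as follows. The details are assumptions, since the text gives only the outline: the sketch takes a linear standard curve of net signal (for example, net area under the fluorescence decay curve) against Trolox concentration in µM, and a known peptide concentration in the assay well. The function names and all numbers in the example are hypothetical.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def trolox_equivalents(net_signal: float, slope: float, intercept: float,
                       peptide_mg_per_l: float) -> float:
    """Return µmol TE per mg of peptide for one sample.

    The standard curve gives the Trolox concentration in µM (µmol per L);
    dividing by the peptide concentration (mg per L) yields µmol TE per mg.
    """
    trolox_um = (net_signal - intercept) / slope
    return trolox_um / peptide_mg_per_l


# Hypothetical standard curve: Trolox (µM) versus net signal (arbitrary units).
standard_um = [10.0, 20.0, 40.0, 80.0]
standard_signal = [21.0, 41.0, 81.0, 161.0]
curve_slope, curve_intercept = fit_line(standard_um, standard_signal)

# Hypothetical sample: net signal 61 at 20 mg/L peptide in the well.
print(trolox_equivalents(61.0, curve_slope, curve_intercept, peptide_mg_per_l=20.0))
```

Samples whose signal falls outside the range of the standards should be diluted and re-measured rather than extrapolated.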

Simulated Gastrointestinal Digestion
Simulated gastrointestinal digestion was performed according to the method described by Madureira et al. (2011) [27]. Briefly, mouth digestion was mimicked by introducing the peptide (2.5 mg mL⁻¹) into 15 mL of a 1 mM CaCl₂ solution containing 100 U mL⁻¹ α-amylase, under constant stirring (200 rpm) for 2 min at 37 °C, simulating masticatory movements [28]. A 1 M NaHCO₃ solution was used to adjust the pH of the artificial saliva to 6.9. Subsequently, for the gastric phase, a 25 mg mL⁻¹ pepsin solution was added at a ratio of 0.05 mL per mL of sample at pH 2.0 (simulated gastric solution, stomach), followed by incubation for 2 h at 37 °C with orbital agitation at 130 rpm. Finally, intestinal digestion and absorption were simulated by adjusting the pH to 5 using a 1 M NaHCO₃ solution and adding a solution of bile salts and pancreatin to the digest [27].
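For bookkeeping, the phases above can be written down as a small data structure, together with the pepsin addition implied by the stated ratio. This is a sketch only: the `DigestionPhase` class and `pepsin_addition` function are hypothetical names, fields are left as `None` where the text gives no value, and applying the 0.05 mL per mL ratio to the 15 mL oral volume is an assumption about how the protocol was scaled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DigestionPhase:
    name: str
    ph: float
    minutes: Optional[float]        # None where the text gives no duration
    temperature_c: Optional[float]  # None where the text gives no temperature
    rpm: Optional[int]              # None where the text gives no agitation


# Values transcribed from the protocol description above.
PHASES = (
    DigestionPhase("oral", ph=6.9, minutes=2, temperature_c=37.0, rpm=200),
    DigestionPhase("gastric", ph=2.0, minutes=120, temperature_c=37.0, rpm=130),
    DigestionPhase("intestinal", ph=5.0, minutes=None, temperature_c=None, rpm=None),
)

PEPSIN_STOCK_MG_PER_ML = 25.0    # pepsin solution concentration
PEPSIN_ML_PER_ML_SAMPLE = 0.05   # volume of pepsin solution per mL of sample


def pepsin_addition(sample_ml: float) -> Tuple[float, float]:
    """Return (mL of pepsin solution, mg of pepsin) for a given sample volume."""
    volume_ml = PEPSIN_ML_PER_ML_SAMPLE * sample_ml
    return volume_ml, volume_ml * PEPSIN_STOCK_MG_PER_ML


# Assumption: the ratio is applied to the 15 mL oral-phase volume.
volume, mass = pepsin_addition(15.0)
print(f"pepsin solution: {volume:.2f} mL ({mass:.2f} mg pepsin)")
```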

Statistical Analysis
Statistical analysis was carried out with the aid of RStudio V 1.2.1335. The mean values from two replicates were compared by analysis of variance followed by Tukey's post hoc test. Means were separated using the least significant difference at the 5% level of probability.
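The analysis of variance step can be illustrated with a dependency-free sketch of the one-way F statistic for duplicate measurements. It is not the authors' R script, and it covers only the F statistic: the p-value and Tukey's post hoc comparison need the F and studentized range distributions, which are available in statistical packages (for example `aov` and `TukeyHSD` in R, or `scipy.stats.f_oneway` and `scipy.stats.tukey_hsd` in SciPy). The function name and the numbers in the example are hypothetical.

```python
from typing import Sequence, Tuple


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F statistic, df between groups, df within groups)."""
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(group) * (mean - grand_mean) ** 2
                     for group, mean in zip(groups, group_means))
    # Within-group sum of squares: spread of replicates around their group mean.
    ss_within = sum((value - mean) ** 2
                    for group, mean in zip(groups, group_means)
                    for value in group)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return f_statistic, df_between, df_within


# Hypothetical duplicate measurements for three samples (not data from this study).
example_groups = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(one_way_anova_f(example_groups))
```

With only two replicates per sample, the within-group degrees of freedom are small, so the power of both the ANOVA and the post hoc comparison is limited.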