A growing body of evidence suggests that milk and dairy products have unique metabolic, signalling and antimicrobial effects besides their high nutritional value. This bioactivity is mainly mediated by peptides released naturally during their digestion by proteases along the gastrointestinal (GI) tract [1]. As shown over the last decade, such peptides mediate a broad spectrum of activities, including modulation of inflammatory and immune responses, signalling and metabolic processes, and antihypertensive and antioxidative effects, besides acting as antimicrobial agents [3].
However, while evidence supporting the bioactive potential of milk and other food-derived peptides is accumulating, it remains unclear whether the peptides of interest (a) can withstand the high proteolytic activity in the gastrointestinal tract long enough to exert an effect before being fully degraded and (b) can permeate the intestinal epithelium well enough to reach the target tissue or organ at physiologically relevant concentrations. Indeed, a critical evaluation has suggested that di- and tripeptides can permeate the intestinal epithelium and exert a biological function, but there is not yet convincing evidence that the same holds for longer oligopeptides [4]. On the other hand, antimicrobial peptides (AMPs) that are sufficiently stable against proteolysis do not need to be absorbed through the epithelium and are likely to have an immediate effect on the gut microbiome. This could be important for maintaining a healthy GI tract and controlling dysbiosis, as AMPs were recently shown to suppress the growth of opportunistic pathogens such as Helicobacter pylori, Escherichia coli and Staphylococcus aureus.
AMPs are typically positively charged oligopeptides, 12–50 amino acids (a.a.) long, that either form secondary structures, including α-helices, β-sheets and loops, or remain extended [7]. They kill microorganisms directly by penetrating and disrupting the membrane bilayer, or by acting on membrane proteins or intracellular targets [8]. More recently, it has been suggested that AMPs can also exert their antimicrobial effect via immunomodulation [9], although the exact mechanisms are not entirely clear. To identify biofunctional peptides in foods, one has to consider each peptide's amino acid composition, charge, solubility, length, amphiphilic features and secondary-structure similarity to known, characterised endogenous peptides in the organism of interest, as well as to peptides produced by the gut flora [8].
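Of the features listed above, length and net charge are the simplest to compute. The sketch below is a minimal illustration under stated assumptions, not part of the workflow described in this work: charges are counted from side chains only at neutral pH (Lys and Arg as +1, Asp and Glu as −1, with His and the termini ignored), and the 12–50 a.a. length range quoted above is combined with a hypothetical requirement for a positive net charge. The function names are introduced for illustration.

```python
# Simplified side-chain charges at neutral pH (assumption: His and the
# N-/C-termini are ignored, so the result is only an approximation).
SIDE_CHAIN_CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def net_charge(seq: str) -> int:
    """Approximate net charge of a peptide from its basic and acidic residues."""
    return sum(SIDE_CHAIN_CHARGE.get(aa, 0) for aa in seq.upper())


def is_amp_like(seq: str, min_len: int = 12, max_len: int = 50, min_charge: int = 1) -> bool:
    """Hypothetical pre-filter: typical AMP length range and a positive net charge."""
    return min_len <= len(seq) <= max_len and net_charge(seq) >= min_charge


for peptide in ["FHKFICKMMKIYL", "DEDEDEDEDEDEK"]:
    print(peptide, len(peptide), net_charge(peptide), is_amp_like(peptide))
```

FHKFICKMMKIYL is a peptide reported in this work; the second, acidic sequence is invented to show a rejection.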
The composition of biofunctional peptides in milk and dairy products is unique to each animal breed, offering a broad range of sequences to screen for peptides with functional traits of scientific, medical and commercial importance [7]. Milk and dairy products (e.g., yogurt) have already been characterised and shown to be effective against specific pathogens [2]. Sheep and goat milk was found to be rich in biofunctional peptides derived mainly from α-, β- and κ-caseins [13]. Relatively less explored proteome spaces in which to probe for AMPs are milk whey [14] (the non-casein-rich phase) and fermented milk products such as cheese [17].
In this work, we focused on the potential antimicrobial properties of milk whey from two goat and three sheep pure breeds indigenous to Greece, and of feta cheese, probing for AMPs and assessing their stability in an intestine-like environment. Following the computational workflow shown in Figure 1, which combines existing and newly developed approaches, we characterised the antimicrobial “load” of the proteomes of interest. As shown in Figure 1, protein sequences from each breed's milk whey and from feta cheese were screened with the publicly available tool AMPA [10] to find sequence stretches with high predicted antimicrobial potential (i.e., low AMPA propensity). The same protein sequences were digested in silico to identify which of the peptides that can actually occur in the GI tract matched the predicted AMPA stretches. The matching peptides were further assessed for stability, using the number of cleavage sites for human endogenous proteases as a proxy. This assessment was complemented by half-life estimates from the peptide Half-Life Predictor (HLP) [19], using its Support Vector Machine (SVM) model trained on datasets obtained from crude intestinal extracts. The final ranking of the selected peptides was based on a combined antimicrobial score (CAS), calculated as a function of (a) peptide antimicrobial propensity, i.e., the potential to penetrate bacterial membranes, and (b) peptide stability, i.e., the peptide's survival within an intestine-like environment for long enough to have an effect. Our work resulted in a set of the top 100 AMPs predicted to combine the highest stability and antimicrobial effect.
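To our understanding, AMPA identifies antimicrobial stretches by averaging per-residue propensity values over a sliding window and keeping regions whose average falls below a threshold, lower values meaning higher antimicrobial potential. The sketch below illustrates only that general idea. The propensity values are placeholders invented for illustration, not the published AMPA scale, and the window size (7) and threshold (0.225) are illustrative parameters that should be checked against the AMPA documentation before any real use.

```python
from statistics import mean

# PLACEHOLDER per-residue propensities (NOT the published AMPA scale).
# Convention taken from the text: lower value = higher antimicrobial potential.
HYPOTHETICAL_SCALE = {aa: 0.5 for aa in "ACDEFGHIKLMNPQRSTVWY"}
HYPOTHETICAL_SCALE.update({"K": 0.1, "R": 0.1, "W": 0.15, "I": 0.2, "L": 0.2, "F": 0.2})


def window_profile(seq, scale, window=7):
    """Mean propensity of every run of `window` consecutive residues."""
    return [mean(scale[aa] for aa in seq[i:i + window])
            for i in range(len(seq) - window + 1)]


def antimicrobial_stretches(seq, scale=HYPOTHETICAL_SCALE, window=7, threshold=0.225):
    """Merge consecutive below-threshold windows into (start, end, mean) stretches.

    Coordinates are 0-based and half-open, i.e. the stretch is seq[start:end].
    """
    profile = window_profile(seq, scale, window)
    stretches, run_start = [], None
    for i, value in enumerate(profile + [float("inf")]):  # sentinel closes a final run
        if value < threshold and run_start is None:
            run_start = i
        elif value >= threshold and run_start is not None:
            start, end = run_start, (i - 1) + window  # last window of the run ends here
            stretches.append((start, end, mean(scale[aa] for aa in seq[start:end])))
            run_start = None
    return stretches


print(antimicrobial_stretches("GGGGGKKKKKKKRRRGGGGG"))
```

Merging overlapping windows into one stretch mirrors the idea of reporting contiguous "sequence stretches" rather than individual windows.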
Running all five proteomes shown in Figure 1A (totalling 1665 unique protein sequences) through AMPA returned a total of 3285 stretches with predicted antimicrobial properties, 2506 of which were unique across all proteomes (Figure 1B). As expected, the milk whey proteomes from Chios and C. prisca returned the highest number (~1300) of predicted AMPs, since these proteomes have the highest numbers of identified proteins. By contrast, the feta proteome returned the smallest set, comprising 861 AMPs.
The same protein sequences were digested in silico as described in Methods (Section 2). During digestion along the GI tract, proteins are exposed to different proteases: pepsin acts in acidic stomach conditions (pH < 1.8, Pa) before the proteases present in the duodenum and intestinal tract, such as Pb (pH > 2.0), CT, T and E, as well as proteases of microbial origin or located in the blood, such as Th. To approximate this spatiotemporal separation between protease activities, we first extracted all possible peptides assuming complete (i.e., Pa hydrolyses all Pa-specific cleavage positions) and partial (i.e., only some Pa-specific cleavage positions are hydrolysed) pepsin digestion. For cleavage following pepsin exposure, we considered only endogenous proteases (CT, T, Pb, E and Th) and, for simplicity, omitted microbial proteases such as Arg-C proteinase and Asp-N endopeptidase.
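The distinction between complete and partial pepsin digestion can be illustrated with a generic in silico digest in which partial digestion is modelled by allowing up to a given number of missed cleavages. This is a minimal sketch under an explicit simplification: the cleavage rule (cut C-terminal to F, L, W or Y) is a hypothetical stand-in, not the pepsin rule of any specific tool, and `cleavage_sites` and `digest` are names introduced for illustration.

```python
from typing import List, Tuple

# HYPOTHETICAL simplified rule: cut C-terminal to these residues.
# Real pepsin specificity is broader and pH-dependent; this is a placeholder.
PEPSIN_LIKE_P1 = frozenset("FLWY")


def cleavage_sites(seq: str, p1_residues=PEPSIN_LIKE_P1) -> List[int]:
    """Positions i at which the bond between seq[i-1] and seq[i] is cut."""
    return [i for i in range(1, len(seq)) if seq[i - 1] in p1_residues]


def digest(seq: str, max_missed: int = 0, p1_residues=PEPSIN_LIKE_P1) -> List[Tuple[int, int, str]]:
    """Return (start, end, peptide) tuples, 0-based and half-open.

    max_missed = 0 models complete digestion (every site hydrolysed);
    max_missed > 0 also keeps peptides spanning up to that many uncut sites,
    a common way to approximate partial digestion.
    """
    bounds = [0] + cleavage_sites(seq, p1_residues) + [len(seq)]
    peptides = []
    for i in range(len(bounds) - 1):
        for j in range(i + 1, min(i + 2 + max_missed, len(bounds))):
            start, end = bounds[i], bounds[j]
            peptides.append((start, end, seq[start:end]))
    return peptides


print([p for _, _, p in digest("AAFGGLKK")])                # -> ['AAF', 'GGL', 'KK']
print([p for _, _, p in digest("AAFGGLKK", max_missed=1)])  # -> ['AAF', 'AAFGGL', 'GGL', 'GGLKK', 'KK']
```

Keeping start and end positions on the parent protein makes it straightforward to compare each digested peptide with predicted stretches on the same protein.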
As shown in Figure 2A and Table 1, screening the pepsin-digested set against the AMPA-identified set returned 1327 unique matching sequences out of a total of 1532 sequences. This set excluded matching digested peptides with lengths outside the selection range (12–47 a.a. residues) and peptides extending more than 2 a.a. beyond the N- or C-terminus of the predicted stretch. Although this threshold was set arbitrarily, we empirically found that it offered a reasonable balance between under- and over-representation of AMPs in the datasets. Furthermore, testing the set with CellPPD [21] showed that approximately 95% of it was predicted to be able to penetrate membranes (Supplemental Information Figure S1). The physicochemical characteristics of the predicted antimicrobial peptides (given per peptide in Supplemental Information, with summary statistics in Table S1 and distributions in Figure S2) showed an average length of 18 a.a., an amphipathicity index of 0.77, a net charge of +3.6, a hydrophobicity index of −0.17, an isoelectric point (pI) of 10.34 and a molecular weight of 2.1 kDa.
Overall, approximately 80% of the AMPA-predicted AMPs were rejected because pepsin cleavage sites were found within the target sequence or more than 2 a.a. upstream and/or downstream of it. The selected set of 1327 AMPs derived from pepsin digestion comprised 83 exact matches, i.e., peptides whose pepsin cleavage positions coincided with the starting and ending a.a. residues of the AMPA prediction, while the remaining peptides were cleaved one or two residues beyond the starting or ending positions.
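The acceptance rule described above (the digested peptide must contain the whole AMPA stretch, may extend at most 2 a.a. beyond either end and must be 12–47 a.a. long) can be written as a short filter. The sketch assumes 0-based, half-open coordinates on the parent protein; `match_digest` and `Match` are hypothetical names, and the `exact` flag corresponds to the exact matches counted above.

```python
from typing import List, NamedTuple, Tuple


class Match(NamedTuple):
    start: int   # peptide start on the parent protein (0-based, inclusive)
    end: int     # peptide end (exclusive)
    exact: bool  # True when the cleavage positions coincide with the AMPA stretch


def match_digest(stretch: Tuple[int, int], peptides: List[Tuple[int, int]],
                 max_overhang: int = 2, min_len: int = 12, max_len: int = 47) -> List[Match]:
    """Keep digested peptides that fully contain an AMPA stretch with a small overhang."""
    s_start, s_end = stretch
    matches = []
    for p_start, p_end in peptides:
        if not min_len <= p_end - p_start <= max_len:
            continue  # outside the selected length range
        left, right = s_start - p_start, p_end - s_end
        # A negative overhang means a cleavage site falls inside the stretch: reject.
        if 0 <= left <= max_overhang and 0 <= right <= max_overhang:
            matches.append(Match(p_start, p_end, left == 0 and right == 0))
    return matches


print(match_digest((10, 25), [(10, 25), (8, 27), (7, 25), (12, 30)]))
# -> [Match(start=10, end=25, exact=True), Match(start=8, end=27, exact=False)]
```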
The selected set of 1327 AMPs was back-traced to the original proteome sets, as shown in Figure 2A. Interestingly, the number of selected AMPs did not follow proteome size for all breeds, as shown in Table 1. For example, the CP milk proteome produced approximately 602 AMPs from 595 protein sequences (ratio = 1.01), while the Ch proteome, which was the largest set (685 sequences), ranked lower with approximately 407 selected AMPs and a ratio of 0.65. The feta cheese proteome, on the other hand, produced the lowest number of matching AMPs, consistent with it being the smallest proteome. As shown in Figure 2B, the number of (non-pepsin-specific) cleavage sites, CSS, AMPA propensity score and HLP relative stability score of the selected AMP set followed skewed normal distributions, while peptide length and half-life followed log-normal distributions.
Comparing the sheep- and goat-milk-derived proteomes shown in Figure 2C, we identified 84 AMPs common to all animal breeds, while the CP milk proteome presented the highest number of unique AMPs (~320). The feta cheese proteome was predicted to have 64 and 63 AMPs in common with the proteomes of the three sheep breeds and the two goat breeds, respectively, while unique AMPs were overrepresented in feta given its small proteome size relative to the other sets.
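Counts of shared and proteome-specific AMPs, such as those summarised in Figure 2C, reduce to set intersections and differences. A minimal sketch; the proteome labels and peptide names below are invented placeholders, not data from this work.

```python
from typing import Dict, Set, Tuple


def common_and_unique(amp_sets: Dict[str, Set[str]]) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Return AMPs shared by every proteome and, per proteome, AMPs found only there."""
    common = set.intersection(*amp_sets.values())
    unique = {}
    for name, amps in amp_sets.items():
        # Union of all other proteomes' AMPs (empty if there is only one proteome).
        others = set().union(*(s for n, s in amp_sets.items() if n != name))
        unique[name] = amps - others
    return common, unique


toy = {"breed_A": {"pep1", "pep2", "pep3"}, "breed_B": {"pep1", "pep2"}, "feta": {"pep1", "pep4"}}
common, unique = common_and_unique(toy)
print(sorted(common), {k: sorted(v) for k, v in unique.items()})
```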
Ranking the selected AMP set by CAS and selecting the top 100 AMPs revealed an interesting imbalance in their representation across proteomes. Their population metrics are given in Table 2. Figure 3A shows that 36 of the top AMPs were traced in CP, 34 in Ch and fewer than 33 in each of the remaining breeds. Notably, the highest number of top AMPs (44) was traced in feta cheese, 21 of which were not found in any other proteome, despite feta cheese having the smallest proteome. The CAS boxplots in Figure 3B show that F, followed by Ch, has the highest share of the top 100 AMP set and the highest antimicrobial potential relative to the other proteomes analysed in this work. The top 100 AMP set is given in Supplemental Table S2 (top 100 entries) and summarised in the network shown in Figure 3.
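Because one peptide can be traced to several proteomes, the per-proteome counts for the top 100 set (44 in feta, 36 in CP and 34 in Ch already sum to 114) exceed 100. The sketch below shows that tallying step under two assumptions: a higher CAS means a better rank, and each peptide maps to the set of proteomes in which it was traced. The function name and toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def top_k_tally(cas: Dict[str, float], origins: Dict[str, Set[str]],
                k: int = 100) -> Tuple[List[str], Counter, Counter]:
    """Rank peptides by CAS (assumed: higher is better) and count them per proteome."""
    top = sorted(cas, key=cas.get, reverse=True)[:k]
    # A peptide traced in several proteomes is counted once in each of them.
    per_proteome = Counter(p for pep in top for p in origins[pep])
    # Peptides traced in exactly one proteome, counted for that proteome.
    exclusive = Counter(next(iter(origins[pep])) for pep in top if len(origins[pep]) == 1)
    return top, per_proteome, exclusive


cas = {"pepA": 0.9, "pepB": 0.8, "pepC": 0.1}
origins = {"pepA": {"F"}, "pepB": {"F", "CP"}, "pepC": {"Ch"}}
print(top_k_tally(cas, origins, k=2))
```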
The milk whey from sheep and goat breeds [15] and a fermented dairy product, feta cheese [18], were found in our analysis to be a rich source of proteins with antimicrobial traits [2]. More importantly, several peptides released by protein digestion early in the GI tract matched the sequences predicted by AMPA to have these antimicrobial traits. This suggests that peptides resulting from milk digestion could modulate the human gut microbiome profile [5]. The physicochemical properties of the selected AMP set (given in Supplemental Information) showed distributions similar to those reported in publicly available databases of experimentally validated AMPs, such as dbAMP [32], DBAASP [33], APD [34], CAMP [35] and LAMP [36]. These properties were further evaluated against the AMPs in DBAASP, which contains the highest number of entries, and Supplemental Table S1 shows that the DBAASP values approximate the corresponding values of the selected AMP set in this work. Finally, nearly 95% of the AMP set was predicted by CellPPD [21] to have cell-penetrating ability.
In this work, we considered that the magnitude of the antimicrobial effect of a given peptide can be approximated as a function of two factors: (a) its antimicrobial propensity, arising from the physicochemical characteristics of its amino acids, i.e., the ability to penetrate membrane bilayers and/or modulate host immune responses, and (b) its bioavailability, which is proportional to its resistance to proteolysis within the compartment of interest. The former was derived from the AMPA antimicrobial peptide predictor [10], while the latter was quantified with respect to the peptides' affinity for endogenous proteases. However, the number of cleavage recognition patterns in a peptide sequence is only one of several factors that determine its actual decay rate, which reflects the differential stability of peptides with different amino acid compositions and biological behaviours [37]. To obtain a more accurate estimate, we also incorporated into our ranking HLP [19], a peptide half-life prediction model trained on peptide decay data from crude intestinal extracts. These metrics allowed a relative assessment of the proteomes under study with respect to the AMP set of interest, rather than a physical quantification of AMP properties, which was beyond the scope of this work.
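The exact CAS formula is defined in Methods and is not reproduced here. Purely to illustrate how the two factors described above could be combined, the sketch below uses a hypothetical score: min–max-normalised AMPA propensity is inverted (lower propensity meaning higher antimicrobial potential), predicted half-life is log-transformed and normalised, and the two are averaged with an assumed weight `w`. Neither the normalisation nor the weighting should be read as the scheme used in this work.

```python
import math
from typing import List, Sequence


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale values to [0, 1]; constant inputs map to 0.5."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def hypothetical_cas(propensities: Sequence[float], half_lives: Sequence[float],
                     w: float = 0.5) -> List[float]:
    """Illustrative combined score: higher = more antimicrobial and more stable."""
    activity = [1.0 - x for x in min_max(propensities)]     # low propensity -> high activity
    stability = min_max([math.log(h) for h in half_lives])  # half-lives must be > 0
    return [w * a + (1.0 - w) * s for a, s in zip(activity, stability)]


print(hypothetical_cas([0.0, 0.2, 0.1], [1.0, 1.0, 1.0]))
print(hypothetical_cas([0.0, 0.2], [1.0, 100.0]))
```

In the second call, the peptide with the best propensity but the shortest half-life scores no higher than one with the worst propensity and the longest half-life, which is the kind of trade-off a combined score is meant to capture.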
Our results suggest that the diversity of a proteome does not necessarily correlate with the diversity of AMPs that can actually be released by protease digestion. Moreover, some AMPs that scored low in antimicrobial propensity (i.e., high predicted antimicrobial potential) did not necessarily rank high with respect to CAS, since they were predicted to be more susceptible to rapid proteolysis. Specifically, the AMP predicted to have the highest antimicrobial potential, i.e., the lowest propensity (FHKFICKMMKIYL), ranked only 965th by CAS due to a high predicted decay rate (d529_21 = 1.868 s⁻¹) and a CSS score (6.67) slightly below the mean.
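If the reported decay rate is treated as a first-order rate constant k (an assumption on our part, consistent with its units of s⁻¹), the corresponding half-life is t½ = ln 2 / k, which for k = 1.868 s⁻¹ is about 0.37 s. A minimal sketch of that conversion; the function names are introduced for illustration.

```python
import math


def half_life(k: float) -> float:
    """Half-life (s) of first-order decay with rate constant k (1/s)."""
    return math.log(2) / k


def fraction_remaining(k: float, t: float) -> float:
    """Fraction of the peptide left after t seconds of first-order decay."""
    return math.exp(-k * t)


k = 1.868  # decay rate quoted above for FHKFICKMMKIYL (assumed first order)
print(round(half_life(k), 3), round(fraction_remaining(k, 1.0), 3))
```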
Comparing the milk whey from the animal breeds of interest, we observed that the two goat breeds (Skopelos and C. prisca) showed higher AMP-to-proteome size ratios than the sheep breeds, although these differences were not statistically significant in Kruskal–Wallis non-parametric tests. Feta cheese returned a relatively low number of selected AMPs but, surprisingly, it was the most represented proteome in the top 100 AMP set, which comprises the AMPs with the highest combined antimicrobial effect and resistance to proteolysis. Since feta cheese is produced from the milk of the goat and sheep breeds discussed above, an interesting avenue for future research will be to determine whether this enrichment in more stable AMPs is introduced during fermentation and which mechanisms are responsible for it. Recent work has suggested that lactic acid bacteria play a central role in releasing encrypted bioactive peptides during this process [40].
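The Kruskal–Wallis test compares groups through the ranks of the pooled observations rather than their raw values. For reference, the H statistic with the usual correction for ties can be computed as below; in practice, a library routine such as `scipy.stats.kruskal` also returns the p-value from the χ² distribution with (number of groups − 1) degrees of freedom. The example groups are invented numbers, not data from this work.

```python
from collections import Counter
from itertools import chain
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            ranks[idx] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups: Sequence[float]) -> float:
    """Tie-corrected Kruskal-Wallis H (undefined if every value is identical)."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n))


print(kruskal_wallis_h([1, 2, 3], [4, 5, 6]))
```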
This work aimed at profiling the diverse range of AMPs that can occur and be active within the GI tract. We followed the rationale that exposure of whole proteins to gastric pepsin precedes proteolysis by other proteases; therefore, peptides produced by pepsin digestion are predominant and more likely to occur. Yet, under conditions of incomplete pepsin digestion, a broader diversity of active AMPs can be produced by the other endogenous or bacterial proteases. Future research can focus on the top predicted AMPs to determine experimentally their antimicrobial activity and degradation rate under intestinal or intestine-like conditions. In parallel, an intriguing prospect would be to employ more sophisticated protease cleavage models, as well as quantitative proteomics data, to predict a range of AMP concentrations with respect to the relative abundance of their parent proteins. Under ideal conditions and given sufficient time, all proteins can be fully degraded through hydrolysis by endogenous proteases and proteases from commensal microbes. Nevertheless, during this dynamic process, some peptides are expected to be stable enough to exert their effects temporarily. Incorporating enzyme kinetics to model the cleavage activity of each type of protease dynamically could help shed light on these dynamics under intestine-relevant conditions. Such approaches have already been demonstrated with promising results [41].
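As a toy illustration of why a peptide can be present transiently even though it is eventually degraded, consider a peptide released from its parent protein with first-order rate k_release and itself degraded with first-order rate k_degrade. This two-step first-order model is a simplification introduced here for illustration, not the multi-protease kinetics proposed above: the peptide level rises to a maximum at t = ln(k_degrade / k_release) / (k_degrade − k_release) and then decays. The sketch compares the analytical solution with a simple forward-Euler integration; all rate values are arbitrary.

```python
import math


def peptide_analytic(t, k_release, k_degrade, p0=1.0):
    """Peptide level for protein -> peptide -> fragments (first order, k_release != k_degrade)."""
    return p0 * k_release / (k_degrade - k_release) * (
        math.exp(-k_release * t) - math.exp(-k_degrade * t))


def time_of_peak(k_release, k_degrade):
    """Time at which the intermediate peptide reaches its maximum level."""
    return math.log(k_degrade / k_release) / (k_degrade - k_release)


def peptide_euler(t_end, k_release, k_degrade, p0=1.0, dt=1e-4):
    """Forward-Euler integration of the same two-step model."""
    protein, peptide = p0, 0.0
    for _ in range(int(round(t_end / dt))):
        released = k_release * protein * dt
        degraded = k_degrade * peptide * dt
        protein -= released
        peptide += released - degraded
    return peptide


t_peak = time_of_peak(1.0, 2.0)
print(t_peak, peptide_analytic(t_peak, 1.0, 2.0), peptide_euler(t_peak, 1.0, 2.0))
```

Replacing the constant rates with per-protease, concentration-dependent rate laws would be the natural next step towards the enzyme-kinetic models mentioned above.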
We anticipate that adapting this workflow to profile AMPs in other functional foods, and extending it to other types of bioactive peptides, can improve our understanding of the complex interaction landscape between the host, its microbiome and its dietary habits. Finally, the workflow we employed, which allows fast screening of entire proteomes for antimicrobial peptides that can occur during digestion, can assist the ongoing effort to design peptide-based medicinal products that can be efficiently delivered orally [39].