In Silico Analysis of Individual Fractions of Bovine Casein as Precursors of Bioactive Peptides—Inﬂuence of Post-Translational Modiﬁcations

: Bovine casein is one of the most known precursors of bioactive peptides among food proteins. Thus far, in silico investigations addressing casein have taken no account of the impact of modiﬁcations of amino acid residues on the feasibility of bioactive peptide release. The present study aimed to determine the effect of such modiﬁcation on the possibility of release of bioactive peptides from casein during simulated digestion. The α s1 -, α s2 -, β -, and κ -casein sequences were deposited in the BIOPEP-UWM protein database considering phosphorylated amino acids, cysteine residues forming disulﬁde bridges, and pyroglutamic acid residues. The frequency of occurrence of bioactive fragments and the frequency of their release by digestive enzymes were determined for the analyzed modiﬁed and unmodiﬁed proteins. Peptides found exclusively in the sequences of unmodiﬁed proteins were deemed as false-positive results. From 1.74% ( β -casein A2) to 4.41% ( α s2 -casein B and D) of the false-positive results were obtained for the total frequency of occurrence of bioactive fragments (sums of frequencies computed for all activities). In turn, from 1.78% ( κ -casein B) to 9.18% ( β -casein A2 and A3) of false-positive results were obtained for the predicted total frequency of release of bioactive peptides by the system of digestive enzymes (pepsin, trypsin, and chymotrypsin).

The factors determining biological and functional properties of milk proteins (which affect their behavior during technological processes) include post-translational modifications, such as phosphorylation, and also substitutions of amino acid residues and deletions leading to the occurrence of genetic variants [27][28][29][30]. Milk proteins have been extensively studied as precursors of bioactive peptides representing activities of various types [31][32][33][34][35][36].
Though computer-assisted methods have been widely applied in research addressing bioactive peptides derived from food proteins, including milk [36][37][38][39][40], the impact of modifications of amino acid residues on the profiles of their bioactive peptides has not been explored so far.
In view of the above, the present study aimed to quantify the effect of considering modifications of amino acid residues, including mainly phosphorylation, on the feasibility of bioactive peptide release from individual fractions of bovine milk casein.

Materials and Methods
Amino acid sequences of individual casein fractions derived from the UniProt database [41,42] were modified via the removal of signal peptides and the addition of the localization of phosphate residues and L-pyroglutamic acid residues based on information provided in the UniProt database. In the case of phosphorylation, only those residues were considered whose localization had been confirmed experimentally, while those included in the database as theoretically predicted based on the similarity of sequences with other proteins (annotation: "By similarity") were omitted. Account was also taken of the fact that cysteine residues form disulfide bridges in individual casein fractions [43,44]. In the case of κ-casein, analyses included only the sequences without saccharide residues. The list of post-translational modifications considered in the present work is provided in Table 1.  1 Repository of amino acids and modifications in the BIOPEP-UWM database [45][46][47]. 2 PubChem database [48,49].
Amino acid sequences were deposited in the BIOPEP-UWM database of proteins [33]. Substitutions of amino acids or deletions of fragments in individual genetic variants were provided based on information derived from the UniProt database and a review by Caroli et al. [50]. Protein sequences used in this work are presented in Table S1.
The analyzed sequences were also quantified for the frequency of occurrence of bioactive fragments in protein sequences (parameter A) [52] (Equation (1)).
a-number of fragments with a given activity in a protein sequence. N-number of amino acid residues in a protein sequence. The second analyzed quantitative parameter was the frequency of release of fragments with a given bioactivity as a result of the coupled action of the aforementioned proteolytic enzymes (parameter A E ) [53], computed based on Equation (2): d-number of peptides with a given activity potentially released by the abovementioned enzyme combination from a given protein.
N-number of amino acid residues in a protein sequence. Computer simulations of proteolysis and computations of quantitative parameters were performed using applications available in the BIOPEP-UWM database [46,47].
The casein fractions with and without post-translational modifications were compared in terms of the frequency of occurrence of bioactive fragments in sequences of individual proteins.
The percentage of false-positive (FP) results calculated from Equations (3) and (4) was used as the parameter describing the effect of considering the modifications on the computed frequency of occurrence of bioactive fragments in a protein sequence (A) and the frequency of release of fragments with a given activity by digestive enzymes (A E ): A 0 -value of parameter A determined for non-modified protein.
A-value of parameter A determined for protein containing post-translational modifications.
A E0 -value of parameter A E determined for non-modified protein.
A E -value of parameter A E determined for protein containing post-translational modifications.
Identity between a protein sequence with post-translation modifications and a sequence of the same protein without modifications was computed based on the following equation: N-number of amino acid residues in a protein chain. M-number of modified amino acid residues in a protein chain. Visualization of the results in the figures was made by means of HeatMapper software [54,55].
Biological activity (interaction with human body proteins) of di-and tri-peptides containing post-translation modifications was predicted using SwissTargetPrediction software [56,57].

Results
The values of A and A E parameters computed for the sequences with and without post-translational modifications are presented in Figures 1 and 2 and in Tables S2-S9 in the Supplementary Materials. The figures present only those activities for which the predicted values of quantitative parameters changed as a result of considering post-translational modifications in calculations. The values of quantitative parameters determined for all peptides occurring in sequences of the analyzed proteins or being predicted products of their digestion are collated in Tables S2-S9. The percentage of false positive results obtained due omission of post-translational modifications is presented in Figure 3 and Tables S10 and S11.
Consideration of the post-translational modifications during analyses led to changes in the numeric values of parameters characterizing proteins as precursors of bioactive peptides only in the case of certain activities. The percentage of these activities determined for individual fractions and genetic variants of casein is presented in Table 2.
One of the aims of protein amino acid sequence analysis is to predict the likelihood of bioactive peptide release during protein proteolysis followed by their detection deploying a targeted peptidomics strategy. An example of such an experiment was provided in a work by Darewicz et al. [58], describing the detection of bioactive peptides among products of enzymatic hydrolysis of oat proteins. Table 3 lists peptides with a known activity, being putative products of digestion of individual casein fractions.
The putative products of casein digestion include 46 peptides with a known bioactivity, described in the BIOPEP-UWM database. Among these, 38 peptides exhibit more than one activity. The highest number of identified peptides include dipeptidyl peptidase IV inhibitors-36 compounds and angiotensin-converting enzyme inhibitors-24 compounds. Peptides exhibiting these activities have been most extensively investigated in recent years and represented in the highest numbers in the BIOPEP-UWM database [45]. Four of the peptides listed in Table 3 inhibit dipeptidyl peptidase III. Three inhibit α-glucosidase, and another three exhibit antioxidative activity. Nine of the listed peptides exhibit other types of activity. Peptides identified as inhibitors of the angiotensin-converting enzyme, dipeptidylpeptidase IV, and α-glucosidase as well as those exhibiting antioxidative activity may be valuable in terms of preventing the so-called metabolic syndrome [59].  Simulated in silico proteolysis enables predicting the release of peptides containing modified amino acid residues, including mainly those of phosphopeptides. The bioactivity of casein hydrolysates, composed of phosphopeptides, has already been investigated [35]; however, the activities of individual compounds have not been registered in the BIOPEP-UWM database so far. For this reason, theoretical predications of bioactivity were performed by means of the SwissTargetPrediction software. Authors of this software [57] recommend it as a tool for the preliminary analysis of interactions of low-molecular-weight compounds with proteins that enables predicting the bioactivity of di-and tri-peptides.     3) or (4) (dividing by A 0 = 0 or A E0 = 0). Activities with FP = 0 for all analyzed proteins were omitted. Numeric values of FP for A and A E parameters are provided in Tables S10 and S11, respectively, in the Supplementary Materials.      Table 4 lists proteins showing the highest likelihood of interactions with modified peptides. Table 4. Di-and tri-peptides containing post-translational modifications, potentially released via simulated in silico digestion, and their potential interactions with proteins predicted using the SwissTargetPrediction software. Proteins being receptors of ligands containing phosphate residues or enzymes catalyzing reactions with compounds containing such residues are denoted in italics.  Table 1; 2 Table also presents IDs of proteins available in the UniProt [41,42] and ChEMBL [60,61] databases; 3 p-probability that a given peptide will be a ligand of a given protein; 4 Table lists proteins for which the probability of interactions with the ligand is the highest among all proteins typed by the SwissTargetPrediction software.

Discussion
A comparison of the frequency of occurrence of bioactive fragments (A and A 0 ) and the frequency of their predicted release during proteolysis (A E and A E0 ) indicates that taking account of the post-translation modifications during computations leads to decreased values of these parameters found for many activities. Part of the bioactive fragments containing serine (S symbol) do not appear in the results obtained for the sequences containing post-translation modifications because these sequences contain phosphoserine denoted with the <S[3*]> symbol. A similar observation was made for the cysteine residues involved in the formation of disulfide bridges (denoted with <C>). Di-peptides and tripeptides account for major part of bioactive fragments found in both the analyzed protein sequences as well as in sequences of many other proteins [62]. The sites of their occurrence may, accidentally, coincide with the localization of modifications, if the latter are not taken into account during analysis. Prediction of the occurrence of peptides containing nonmodified amino acid residues at the localization sites of post-translation modifications or considering such peptides while quantifying parameters that characterize proteins as potential precursors of bioactive fragments should, therefore, be treated as a false-positive result. Such a result means that also other fragments do appear in real protein sequences that are different than those predicted based on sequence analysis taking no account of modifications.
The qualitative analysis results (i.e., "yes" or "no" responses) are divided into four categories: true-positive, false-positive, true-negative, and false-negative results. The percentages of results from these categories are used to assess, e.g., methods of qualitative analysis deployed in analytical chemistry [63]. The present analysis may allow defining a true-positive result (the presence or the probability of release of a bioactive fragment of both a protein containing post-translational modifications or a protein having an identical amino acid sequence but not containing such modifications) or a false-positive result (the presence of a bioactive fragment or the possibility of its release from only a protein devoid of modification, as well as a lack of such a fragment or no possibility of its release during hydrolysis of a protein having an identical amino acid sequence and containing modifications). The value of parameter A or A E (computed from Equations (3) and (4), respectively) may be interpreted as the number of true-positive results per protein chain length. The A 0 − A or A E0 − A E difference may be interpreted as the number of falsepositive results per protein chain length.
The false-negative results cannot be defined while analyzing protein sequences as precursors of bioactive peptides. Such a result would mean a lack of the fragment with a given activity in a protein sequence. In reality, the negative result achieved for a given activity means a lack of information about such fragments. There is no complete information on all possible types of biological activity of all possible fragments of a given protein. The false-negative result would mean that a given fragment would occur in a sequence of the modified protein, whereas the sequence analysis of the unmodified protein would not indicate its presence. In the present analysis, the following inequality has not been met for any activity: A > A 0 or A E > A E0 . Such an inequality could have been met if information was available about the activity of peptides containing phosphoryl amino acid residues or L-pyroglutamic acid residue. It cannot be excluded, however, that activities of such peptides will be investigated in the future. Antioxidative activity of peptides containing L-pyroglutamic acid has been recently described by Vitale et al. [64].
Identity between a protein sequence containing post-translational modifications and a sequence of the same protein devoid of modifications computed based on Equation (5) (parameter I) has a similar physical sense as the identity computed using specialist software for matching protein sequences such as, for example, BLAST [65], Clustal Omega [66], or HMMER [67]. Equation (5) is very simplified compared to the aforementioned computer programs because it takes no account of insertion and deletion of protein fragments and considers only the equivalent of amino acid substitutions.
Even though the analyzed proteins contain a relatively low number of modified amino acid residues, the changes noted in the predicted values of parameters describing proteins as precursors of bioactive peptides seem to be significant. The appearance of false-positive results for, e.g., four out of nine activities of fragments potentially released from β-casein, is likely ( Table 2).
The percentage of false-positive results obtained for particular types of activities varied within a broad range. FP = 0 means that the frequency of occurrence of bioactive fragments in protein sequences or the predicted frequency of release of bioactive fragments are identical for the proteins with post-translational modifications considered and for the same sequences analyzed without considering modifications (A = A 0 or A E = A E0 ). In turn, FP = 100% indicates that the value of the analyzed parameter is 0 for the proteins with posttranslational modifications considered and differs from 0 for the same proteins analyzed without considering these modifications (A = 0 and A 0 = 0 or A E = 0 and A E0 = 0). In the case of certain activities and certain genetic variants of individual casein fractions, the bioactive fragments do not occur either in modified or in unmodified sequences (A = 0 and A 0 = 0 or A E = 0 and A E0 = 0). In such a case, computing the percentage of false-positive results is impossible. Predictions of frequent activity types (relatively high values of A and AE), such as inhibition of angiotensin-converting enzyme (EC 3.4.15.1) or inhibition of dipeptidylpeptidase IV (EC 3.4.14.5), and also a sum of all activities yield a relatively low percentage of false-positive results. In the case of the less known activities (relatively low values of A and A E ), the percentage of false-negative results ranged from 0 to 100%.
Among proteins summarized in Table 4, HLA-A is a protein of the immune system responsible for presenting antigens, e.g., those derived from viruses or neoplasms [68][69][70]. DPPIV (EC 4.4.14.5) is a commonly known proteolytic enzyme whose inhibitors are applied as anti-diabetic drugs [59,71]. Recent studies have focused on DPPIV role in the course of also other diseases, such as Alzheimer's disease [72] or neoplasms [73]. The XIAP protein is ubiquitin ligase (EC 2.3.2.27), involved in apoptotic processes, modulation of inflammatory signals and immune response, and cell spreading, including tumor metastases [74][75][76][77]. The LPAR1 and LPAR3 proteins are receptors of lysophosphatidic acid (PubChem CID: 5497152; ChEMBL ID: CHEMBL117021), which together with its receptors are involved in signal transfer in cells of, i.e., nervous, respiratory, and gastrointestinal systems [78]. In turn, the PLK1 (EC 2.7.11.21) and PRKCE (EC 2.7.11.13) proteins are kinases catalyzing the phosphorylation of amino acid residues (serine in particular) in proteins. Phosphorylation is the most common type of post-translational modifications of proteins in live organisms [79,80]. BRCA1 (Breast cancer type 1 susceptibility protein) (EC 2.3.2.27) is a ligase catalyzing protein ubiquitination. Its malfunction due to, among other things, mutations has been implicated in the course of breast cancer [81,82]. The PIN1 protein is an enzyme (EC 5.2.1.8) catalyzing the reaction of cis-trans isomerization of peptide bonds formed by amine groups of proline and carboxyl groups of phosphorylated serine or threonine. Its malfunction is observed, e.g., in the course of the Alzheimer's disease [83], whereas its over-expression and hyper-activity play a significant role in carcinogenesis [84]. The FPR2 acronym (N-formyl peptide receptor 2) denotes a protein being a receptor of chemotactic peptides possessing a formic acid residue at the N-terminus. These receptors play a meaningful role during inflammatory states and angiogenesis [85]. The latter process is essential to wound healing and also to cancer development. A group of proteins referred to as S1PR (Sphingosine 1-phosphate receptor) serves the function of sphingophospholipid receptors. Their hyperactivity is observed in the course of non-specific inflammatory bowel disease. Compounds that modulate the activity of S1PR receptors have been investigated as potential drugs to be used in this disease condition [86]. The use of S1PR receptor modulators is also considered in the treatment of multiple sclerosis [87]. The TNFRSF10A (Tumor necrosis factor receptor superfamily member 10A) protein (UniProt: P50591) is a receptor of a cytotoxic ligand TRAIL that stimulates apoptosis [88]. The protein abbreviated as TK1 (EC 2.7.1.21) is a kinase accelerating the phosphorylation of thymidine nucleoside, being one of the key enzymes to the metabolism of nucleosides and nucleotides and, thereby, to DNA synthesis [89]. The DLG4 (Disks large homolog 4) compound is a homolog of proteins that play a key role in the formation of synapses [90]. The FNT acronym (FNTA; FNTB) denotes farnesyltransferase (EC 2.5.1.58), i.e., an enzyme involved in the modification of cysteine residues by attaching oligo-isoprene residues. Its malfunction contributes to the improper pattern of post-translational modifications of proteins in certain neoplasms [91]. Also, peptides built of the non-modified protein amino acids may-theoretically-serve as farnesyltransferase inhibitors [92]. The FDPS (Farnesyl diphosphate synthase) or FPPS (Farnesyl pyrophosphate synthase) (EC 2.5.1.1) enzyme is a synthetase of farnesyl diphosphate (pyrophosphate) (PubChem: No 445713), being a substrate of farnesyltransferase.
Summing up data collated in Table 4, it may be concluded that most of the proteins listed therein are related to either reactions (as enzymes) or bioactivities (as receptors) of compounds containing phosphate residues (e.g., of phospholipids). The malfunctions of these proteins are observed in the course of various diseases such as, for example, neoplasms. In the case of peptides built from unmodified protein amino acids, proteolytic enzymes serve as the main group of potentially interacting proteins [93]. The highest probability of peptide-protein interactions, presented in Table 4, is lower than the analogous probability predicted for the peptides described in previous works [92][93][94]. The difference in the probability of interactions with proteins noted between the peptides built of the nonmodified amino acids and phosphopeptides may stem from two reasons. The first entails differences in the structure and physicochemical properties triggered by modifications, whereas the other one may be due to knowledge gaps. The SwissTargetPrediction software compares the structure of analyzed compounds with data from the ChEMBL database [57]. This database provides a load of experimental data related to the interactions of short, unmodified peptides with proteins [95], while it lacks information about phosphopeptides and, therefore, has insufficient data for comparative analyses. Further studies are, hence, needed to resolve the doubts regarding the putative bioactivity of phosphopeptides.
Considering that modifications of amino acid residues would enable more precise description of the bioactivity of protein fragments, the sequence analysis made tookg no account of such modifications results in the appearance of false-positive results. On the other hand, the analysis of sequences built exclusively from protein amino acids is simpler than taking into account the changes induced by chemical or enzymatic modifications of amino acid residues. This fact may prove essential while working with large sets of sequences. Bovine milk proteins (including casein) have been thoroughly investigated as precursors of bioactive peptides. These investigations addressed even the impact of single substitutions of amino acids, leading to the formation of genetic variants of individual casein fractions of various species [96][97][98][99][100]. Certain cases require a different strategy, wherein large sets of sequences can be reduced. After selecting the first sequence, proteins with an identity higher than a pre-determined threshold, e.g., 90%, are eliminated. Such a strategy has been deployed in our previous research [58,101,102]. The proteins with 90% identity may be expected to belong to the same family defined based on the presence of appropriate domains, described in the InterPro database [103]. Proteins belonging to the same family share similar profiles of potential biological activity [104]. Substitutions of amino acid residues cannot be deemed the drivers of false-positive results but can be the likely reasons of changes in the contents of bioactive fragments. Considering or neglecting differences in the contents of bioactive fragments in analyses is up to the decision of individual authors. Taking account of modifications may be recommended in the analysis of peptide sequences with a relatively high content of untypical amino acids such as, for example, peptides containing hydroxyproline and methionine sulfoxide described by Bougatef et al. [105].
Peptides containing the modified residues may exhibit bioactivity. Even when the methodology of their isolation or synthesis is unavailable, they may be investigated by means of in silico methods [106,107]. Due to the established repository of amino acids and modifications, the BIOPEP-UWM database may prove well in such analyses.
Several databases of bioactive peptides, containing components other than proteinogenic amino acids, have been recently launched [9][10][11][12][13][14][15][16][17][18]. In comparison with them, BIOPEP-UWM possesses both strong points and limitations. Three main advantages of the BIOPEP-UWM database may be pointed out: -Possibility of matching peptide and protein sequences, containing modified amino acids annotated using code available in the database; -Possibility of proteolysis simulation using sequences with modified residues; -Possibility of easy translation of sequences into SMILES code, used in chemical databases (e.g., PubChem) and programs from the area of cheminformatics (e.g., SwissTargetPrediction). It is possible due to direct input of modified residues into sequences. For instance, proteins in UniProt database are annotated as sequences containing unmodified proteinogenic amino acid residues, with information about modifications added as a text. Only unmodified sequences in UniProt are available for processing (e.g., sequence alignments).
Possibilities of annotation in the BIOPEP-UWM database are restricted to linear peptides or proteins. For instance, the current version of BIOPEP-UWM code allows for the provision of information that the given cysteine residue is involved in the formation of disulfide bond, but there is no detailed location of this bond in peptide or protein sequence (Table S1 in Supplementary Materials). On the other hand, macrocyclic or branched peptides may be annotated using, for example, BILN code [25], but there is no simple algorithm for translation of such peptide structures into SMILES code. Another restriction of BIOPEP-UWM is that it provides only possibility of exact match, i.e., finding in protein sequences fragments with 100% identity and the same length as queries. However, all existing algorithms and programs for sequence alignments (e.g., BLAST, Clustal Omega, and HMMER) accept sequences containing only unmodified, proteinogenic amino acids. Development of algorithms and programs for automated sequence alignments incorporating post-translational modifications may require radically new ideas. The third restriction of BIOPEP-UWM is that this tool is able to annotate and process sequences containing only residues present in repository of amino acids and modifications. Enrichment of space of peptides and proteins annotated in BIOPEP-UWM may require prior submission of new amino acid residues or other monomers. The addition of individual monomers to the repository may depend on the possibility of adapting its SMILES representation to the CHUCKLES algorithm [108] and further validation of the result of rearrangement as recommended previously [109]. Taking the above into account, we would like to encourage BIOPEP-UWM users to submit new compounds to the repository of amino acids and modifications.

Conclusions
Today, the BIOPEP-UWM database serves as a tool enabling the analysis of peptide and protein sequences with amino acid residues that underwent chemical and enzymatic modifications (e.g., of phosphopeptides and phosphoproteins). The likely benefit of implementing the new options is the increased precision of predictions of bioactivity of protein fragments. On the one hand, taking into account modifications of amino acid residues makes it possible to avoid false-positive results, while on the other hand, it allows for research on modified peptides. The analysis of individual casein fractions presented in this paper is the first example of a comprehensive sequence analysis of proteins containing modified amino acid residues as precursors of biologically active peptides.
Supplementary Materials: The following supporting information can be downloaded at: https:// www.mdpi.com/article/10.3390/app13148091/s1, Table S1: Protein sequences used for analyses; Table S2: Frequency of occurrence of bioactive fragments A and A 0 in individual genetic variants of α s1 -casein; Table S3: Frequency of occurrence of bioactive fragments A and A 0 in individual genetic variants of α s2 -casein; Table S4: Frequency of occurrence of bioactive fragments A and A 0 in individual genetic variants of β-casein; Table S5: Frequency of occurrence of bioactive fragments A and A 0 in individual genetic variants of κ-casein; Table S6: Frequency of predicted release of bioactive fragments A E and A E0 in individual genetic variants of α s1 -casein; Table S7: Frequency of predicted release of bioactive fragments A E and A E0 in individual genetic variants of α s2 -casein; Table S8: Frequency of predicted release of bioactive fragments A E and A E0 in individual genetic variants of β-casein; Table S9: Frequency of predicted release of bioactive fragments A E and A E0 in individual genetic variants of κ-casein; Table S10