High-Resolution Mass Spectrometry and Chemometrics for the Detailed Characterization of Short Endogenous Peptides in Milk By-Products

Cheese-making has long been part of human food culture, and dairy nowadays represents a large sector of the food industry. Because whey is the main byproduct of cheese-making, its revalorization is one of the primary goals in alignment with the principles of the circular economy. In the present paper, a detailed investigation of short endogenous peptides in milk and its byproducts (whole whey, skimmed whey, and whey permeate) was carried out by high-resolution mass spectrometry, with a dedicated suspect screening data acquisition and data analysis approach. A total of 79 short peptides were tentatively identified, including several sequences already known for their biological activities. An unsupervised chemometric approach was then employed to highlight the differences in the short peptide content among the four sets of samples. Whole and skimmed whey showed not merely a higher content of short bioactive peptides compared to whole milk, but also a peculiar composition of peptides that are likely generated during the process of cheese-making. The results indicate that whey represents a valuable source of bioactive compounds and that setting up processes for the revalorization of milk byproducts is a promising path towards obtaining high revenue-generating products from dairy industrial waste.


Introduction
The earliest evidence for cheese-making can be traced back to the sixth millennium BC, representing a critical development for preserving milk in a transportable and long-lasting form [1]. Dairy farms and industries expanded significantly at the turn of the 20th century thanks to the technological advancements of the second industrial revolution [2]. In 2019, the value of the dairy market was estimated to be about 720 billion US dollars worldwide, and it is projected to grow to 1032 billion dollars by 2024 (2021 report by Statista), given the continuous growth of the world population and the improvement of living conditions in developing countries. The significant growth in the production of milk-derived products, such as milk powder, butter, and cheese, has also led to a massive release of industrial waste [3]. Every year, it is estimated that 4-11 million tons of dairy waste are released into the environment, endangering biodiversity due to the depletion of dissolved oxygen induced by the fat constituents [3,4]. The main byproduct of dairy farming is whey, which is produced following casein coagulation in the cheese-making process. Milk whey contains a broad range of bioactive compounds, such as proteins, lipids, carbohydrates, vitamins, and minerals [5], and has been the object of several studies

HRMS Data Acquisition and Short Peptide Identification
Despite being particularly interesting due to their enhanced biological properties compared to medium- and long-sized amino acid sequences, short peptides have been largely neglected [23]. The commonly employed analytical platforms that peptidomics has borrowed from bottom-up proteomics, which include nano-HPLC separation, multi-charged ion acquisition, and database search, are not suitable for short amino acid sequences. When it comes to untargeted identification, these compounds are more closely related to metabolites than to longer peptide sequences. Their small dimensions hinder the generation of multi-charged adducts, and singly charged ions are much more likely to be generated. Moreover, the physicochemical properties of short peptide sequences depend more strongly on the nature of the individual amino acids (a.a.), which results in a more significant variability of the fragmentation pathways and in the need for manual validation for proper identification [24]. Finally, database search, which associates amino acid sequences to existing peptides present in databases, cannot be employed with satisfying results for very short sequences.
For assessing the short peptide profile of milk byproducts derived from the cheese-making industry, whole milk (WM), whole whey (WW), skimmed whey (SW), and whey permeate (WP) were purified with graphitized carbon black (GCB) [25] and analyzed by HPLC coupled to HRMS using a suspect screening approach. A database of all combinations of the 20 natural a.a. in di-, tri-, and tetrapeptides was generated with MATLAB, filtered to remove repeating masses, and employed to generate a list of m/z values to be implemented as an inclusion list in the MS method. Inclusion lists compensate for the limits of the data-dependent acquisition (DDA) mode, in which the top n most intense ions in each full scan are sequentially isolated and fragmented. When complex matrices are analyzed (as in the case of milk and its derivatives), several coeluting compounds could mask the compounds of interest (e.g., short peptide sequences). When comprehensive lists of m/z values are available, suspect screening approaches bypass these masking effects by selecting only the m/z values present in the inclusion list.
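The construction of such an inclusion list can be sketched as follows. This is a minimal illustration in Python rather than the MATLAB script used in the study; the function names, the rounding to four decimals used to collapse repeating masses, and the restriction to singly protonated ions are assumptions made for the example. The number of unique values it yields depends on that rounding choice and need not reproduce the size of the published list.

```python
from itertools import combinations_with_replacement

# Monoisotopic residue masses (Da) of the 20 proteinogenic amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # mass of H2O added to the summed residues
PROTON = 1.007276   # mass of the charge-carrying proton


def protonated_mz(residues):
    """m/z of the singly protonated peptide [M+H]+ for a residue string."""
    return sum(RESIDUE_MASS[r] for r in residues) + WATER + PROTON


def inclusion_list(min_len=2, max_len=4, decimals=4):
    """Unique [M+H]+ values for all di- to tetrapeptide compositions.

    Sequence order does not affect the mass, so only compositions
    (combinations with replacement) are enumerated. Rounding to
    `decimals` places is a hypothetical way of removing repeats.
    """
    mz_values = set()
    for n in range(min_len, max_len + 1):
        for comp in combinations_with_replacement(sorted(RESIDUE_MASS), n):
            mz_values.add(round(protonated_mz(comp), decimals))
    return sorted(mz_values)


mzs = inclusion_list()
```

Because Leu and Ile share the same mass, and because several compositions are isobaric (for example, Gly-Gly and Asn residues have the same elemental formula), the set is much smaller than the number of possible sequences.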
For short peptide annotation, a dedicated data processing approach, which was previously implemented on Compound Discoverer by our research group [24], was employed. MS spectra were therefore extracted and aligned, adducts were evaluated and grouped, and MS/MS spectra were associated with the features. Thanks to the fill gaps tool on Compound Discoverer, compounds with low abundances in one of the sets of samples could still be annotated. Whenever the peaks were absent in one of the sets of samples, the noise level was chosen as the area.
Thanks to the described analytical platform, 79 short peptide sequences were tentatively identified after careful manual validation of the features. In Table 1, the annotated short peptides are reported alongside some details, i.e., retention time, proposed formula, experimental m/z, MS accuracy, and main diagnostic product ions. Further details are available in Table S1, including adducts, proposed molecular weight, calculated m/z, and GRAVY indices. It is important to highlight that MS/MS spectra obtained with higher-energy collisional dissociation (HCD) do not allow leucine and isoleucine to be distinguished, which would instead require MS³ experiments or other fragmentation techniques [26]. The nomenclature Xle was employed throughout the manuscript and Supplementary Materials to indicate that one of the two isomeric a.a. was present in the peptide sequence. Of the 79 annotated a.a. sequences, 13 were dipeptides, 31 were tripeptides, and 35 were tetrapeptides. The most abundant a.a. were leucine/isoleucine (25.8%), followed by tyrosine (14.3%), phenylalanine (10.8%), arginine (8.5%), and valine (7.7%). On the other hand, aspartic acid and methionine constituted less than 1% of the total a.a. content of the identified short peptides, and no cysteine was reported. The high abundance of branched-chain and aromatic a.a. is reflected in the grand average of hydropathicity (GRAVY) indexes of the identified peptides, a parameter that measures the hydrophilicity/hydrophobicity of peptide sequences. Positive GRAVY values are associated with hydrophobic peptides, whereas negative values are associated with hydrophilic ones; the more negative the GRAVY, the more hydrophilic the sequence. As shown in Figure 1, most annotated short peptides had positive GRAVY values (ca. 62%), with a median GRAVY of 0.45 and ten extremely hydrophobic annotated peptides (GRAVY > 2.00). The results were dramatically different from those previously reported for milk-derived peptides. In 2014, Dziuba and coworkers reported 59 antimicrobial peptides from milk proteins with a median GRAVY of −0.4 [27]. Similarly, Piovesana et al. reported that 80% of the total peptides identified from donkey milk protein hydrolysis had a negative GRAVY [28]. Keeping in mind that reversed-phase separation favours the separation (and eventually the identification) of hydrophobic peptides, the results could be linked to the origin of such peptides, considering that caseins have a median GRAVY of −0.3, whereas β-lactoglobulin, the main whey protein, is close to 0.
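For readers unfamiliar with the index, the GRAVY of a peptide is the arithmetic mean of the Kyte-Doolittle hydropathy values of its residues. The sketch below illustrates the calculation; the function names are hypothetical, and the treatment of Xle (written here with the one-letter code J and reported as a lower and an upper bound, using Leu and Ile respectively) is an assumption of this example, since the text does not state how the ambiguity was handled.

```python
# Kyte-Doolittle hydropathy scale for the 20 proteinogenic amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathicity of a one-letter sequence."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def gravy_bounds(sequence):
    """GRAVY range for a sequence in which 'J' stands for Leu or Ile.

    Assumption for illustration: every J is read as Leu for the lower
    bound and as Ile for the upper bound.
    """
    low = gravy(sequence.replace("J", "L"))
    high = gravy(sequence.replace("J", "I"))
    return low, high


tyr_pro = gravy("YP")           # Tyr-Pro, a hydrophilic dipeptide
tyr_xle = gravy_bounds("YJ")    # Tyr-Xle, hydrophobic for either isomer
```

With this scale, Tyr-Pro has a negative GRAVY, whereas Tyr-Xle is positive whichever isomer is present, which is consistent with the weight that Leu/Ile residues carry in the distribution described above.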
Short peptide sequences are promising bioactive compounds, since, unlike longer peptides, they can be absorbed by the gastrointestinal tract [21]. The a.a. composition of the peptide sequence has a great impact on determining the exerted biological activities. For example, branched-chain and aromatic a.a. (e.g., Ile, Pro, Val, Phe, and Trp), as well as positively charged residues at the N-terminus (i.e., His, Lys, and Arg), have been associated with antioxidant activity [29]. The annotated short peptides were searched for in the milk bioactive peptide database (MBPDB) [30], which is dedicated to bioactive peptides in milk and derivatives, and in the BIOPEP-UWM database [31] for their reported bioactivities. Moreover, the sequences were submitted to PeptideRanker, a tool for predicting whether a peptide sequence is bioactive based on a neural network [32]. Forty-nine sequences were either reported in MBPDB or BIOPEP or had a score > 0.50 on PeptideRanker. The results are reported in Table S2.
As shown in Figure 2a, nine peptides were common to the three subsets, i.e., Tyr-Xle, Xle-Trp, Tyr-Pro, Xle-Pro-Tyr, Arg-Tyr-Xle, Xle-Gly-Tyr, Tyr-Xle-Xle, Xle-Arg-Phe-Phe, and Trp-Xle-Gln-Pro. Not unexpectedly, the nine peptides presented several branched-chain and aromatic a.a. (mainly Xle, Tyr, and Trp). Eight more sequences were common to MBPDB and BIOPEP, whereas eleven peptides, mainly dipeptides, were only present in BIOPEP. The fifteen peptides that were predicted by PeptideRanker as possibly bioactive but were not reported in either MBPDB or BIOPEP suggest not merely that short peptides are very likely to exert biological functions, but also that there is still much to discover in the field of short bioactive peptide research.
The reported bioactivities are summarized in Figure 2b based on the results obtained from the BIOPEP-UWM database. Not unexpectedly, most sequences were reported to exert angiotensin-converting enzyme (ACE) inhibitory activity. Peptides can inhibit ACE in three different ways, i.e., the inhibitor way, the substrate way, and the prodrug way [33]. Since several short peptides cannot be hydrolyzed by ACE, they can exert a strong inhibitor-like activity, interacting with ACE but preventing the activation of its hydrolyzing activity. Twelve short peptides were reported to inhibit the enzyme dipeptidyl peptidase IV (DPP-IV), which is involved in the increase of blood glucose levels [34], and thus to exert an antidiabetic activity. In addition to the eleven peptides with reported antioxidant activity, seven short a.a. sequences were previously linked to inhibitory activity on DPP-III, a metalloprotease that is involved in the cleavage of the N-terminal extremity of various bioactive peptides, including angiotensins and endorphins [35]. Finally, minor reported bioactivities included immuno-stimulating action, α-glucosidase inhibition, anxiolytic activity, renin inhibition, neurological activity, phosphoinositol regulatory activity, and antibacterial action.

Principal Component Analysis of Datasets
For the global evaluation of the short peptide content, an unsupervised chemometric approach was employed, since a proper supervised approach would not have been possible with the available number of samples. The data matrix obtained from the HRMS data processing was submitted to MetaboAnalyst, a freeware for metabolomics and chemometrics analyses [36], for principal component analysis (PCA). The 79 annotated short peptide sequences were used as variables. PCA is the most widely employed tool for exploratory data analysis, and it is based on the least-squares approximation of the data projected on a reduced set of latent variables known as the principal components, which describe the largest possible variability of the experimental datasets [37]. Through the inspection of the PCA results, information on the relationships between the samples is obtained along the principal components on the scores plot, whereas the chemical species responsible for the observed score patterns can be investigated on the loadings plot.
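The relationship between scores, loadings, and explained variance can be illustrated with a short sketch based on the singular value decomposition. It is not the MetaboAnalyst implementation: the `pca` function, the autoscaling step, and the randomly generated 24 × 79 matrix are assumptions made only to show the shape of the computation.

```python
import numpy as np


def pca(X, n_components=2):
    """PCA of a samples-by-variables matrix via SVD.

    Columns are mean-centered and scaled to unit variance
    (autoscaling is an assumption of this sketch).
    Returns scores, loadings, and explained-variance fractions.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    std = centered.std(axis=0, ddof=1)
    std[std == 0] = 1.0                    # leave constant columns unscaled
    Z = centered / std
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # sample coordinates
    loadings = Vt[:n_components].T                    # variable weights
    explained = s**2 / np.sum(s**2)
    return scores, loadings, explained[:n_components]


# Hypothetical peak-area matrix: 24 data points x 79 peptides.
rng = np.random.default_rng(0)
areas = rng.lognormal(mean=10.0, sigma=1.0, size=(24, 79))
scores, loadings, explained = pca(areas)
```

Each row of `scores` locates one sample on the scores plot and each row of `loadings` locates one peptide on the loadings plot, so a peptide lying in the same direction as a group of samples contributes to separating that group.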
In Figure 3, the PCA modeling of milk and its byproducts is shown. The contribution of the first principal component (PC) was 46.0%, whereas the second PC contributed 34.6% of the total variance (the two PCs combined constituted more than 80% of the total variance). WW and SW samples were clearly discriminated from WM and WP along PC1, with negative values for WW and SW and positive values for WM and WP. On the other hand, WW and SW samples were evidently discriminated along PC2, with positive values for WW (as well as WM and WP) and negative values for SW samples (Figure 3a). The WP samples are hardly differentiated on the scores plot, since their content in most annotated short peptides was negligible compared to the other three sets of samples. The short peptidome profile can therefore be efficiently employed to discriminate WM and the byproducts derived from the cheese-making industrial process.
The loadings plot can be employed for studying the variation in the short peptide profiles of the four sets of samples. In fact, the scores and loadings plots are correlated, and the position of each peptide on the loadings plot corresponds to a higher concentration in the samples in the analogous position of the scores plot. Four groups of peptides can be highlighted, as shown in Figure 3b, which were labeled Short Peptides 1 (SP1), SP2, SP3, and SP4, with only five remaining peptides that did not fall into any of the four groups. It is worth pointing out that both WW and SW samples showed a generally higher content of short bioactive peptides compared to WM.
Since only 18 of the annotated peptides were reported in MBPDB, a manual search of the a.a. sequences was carried out by inspection of the main milk protein sequences. For this purpose, the four main caseins were considered (α1, α2, β, and κ). Amongst the whey proteins, the a.a. sequences of lactoglobulin, lactalbumin, bovine serum albumin, lactotransferrin, and lactophorin were inspected. The manual sequence search allowed the tentative identification of the origins of 64 of the annotated short peptides, a result that not merely supported the identification platform employed for short peptide annotation, but also demonstrated again the lack of knowledge on short peptides. The protein sources of the short peptides (caseins vs. whey proteins) are reported in Table S2 along with the group to which each compound belonged (SP1 vs. SP2 vs. SP3 vs. SP4 vs. none). In Figure 4, the concentrations of four exemplary peptides from each group in the four sets of samples are reported.
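A sequence search of this kind can also be scripted. The sketch below is only an illustration of the idea, not the procedure followed in the study, which was manual: the helper names are hypothetical, Xle is treated as a wildcard matching either Leu or Ile, and the protein string is a made-up toy sequence rather than a real casein or whey protein entry.

```python
import re

# Three-letter to one-letter codes; Xle matches either Leu or Ile.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Xle": "[LI]",
}


def peptide_to_pattern(peptide):
    """Convert e.g. 'Tyr-Pro-Glu-Xle' into the regex 'YPE[LI]'."""
    return "".join(THREE_TO_ONE[res] for res in peptide.split("-"))


def find_peptide(peptide, protein):
    """Return 1-based (start, end) positions of every occurrence.

    A lookahead is used so that overlapping occurrences are reported.
    """
    pattern = re.compile("(?=(" + peptide_to_pattern(peptide) + "))")
    return [(m.start() + 1, m.start() + len(m.group(1)))
            for m in pattern.finditer(protein)]


# Toy sequence for illustration only; not a real milk protein.
toy_protein = "MKAYPELFRQYPEIG"
hits = find_peptide("Tyr-Pro-Glu-Xle", toy_protein)
```

In practice the same function would be run against the reference sequence of each casein and whey protein, and, as noted for the dipeptides, very short queries will match many proteins and so cannot be attributed to a single source.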
The compounds belonging to the SP1 subset were characterized by higher concentrations in the WM and WW samples (Figure 4a). These 17 peptides were likely derived from endogenous proteases originally present in milk or from protein turnover biocycles in the animals. Compounds belonging to SP1 were relatively shorter compared to the other groups, with many dipeptides that, in light of their extremely short sequence, could not be easily attributed to specific proteins (and could indeed derive from the turnover of other proteins). The tri- and tetrapeptides were mostly matched to the sequences of casein proteins, which hinted at the role of endogenous proteases in their generation.
The second group (SP2) comprised 22 peptides with higher concentrations in WW samples (Figure 4b). Being more concentrated in WW than in WM, these short peptide sequences were likely produced during the coagulation of caseins for the separation of the milk solids from the liquid whey. It has been recently reported that short peptides are produced from longer sequences when subjected to heating, especially in acidic conditions [38]. Similar to the peptides belonging to SP1, most of the annotated peptides were matched to casein protein sequences, which supports the generation of these peptides during casein precipitation. The lower concentrations of the peptides of both SP1 and SP2 groups in SW could be explained by a partial co-precipitation alongside the fat components of milk whey during the skimming process.
The third group of peptides (SP3, 18 peptides) was characterized by roughly equally high concentrations in WW and SW (Figure 4c). Similar to SP2, these peptides were likely generated during casein precipitation. Moreover, they could have been derived from the activity of endogenous proteases on whey proteins. Accordingly, the peptides of SP3 were found in both casein and whey protein sequences.
Finally, the peptides belonging to SP4 (17 peptides) were mostly tetrapeptides and were characterized by higher concentrations in SW samples (Figure 4d). Being more abundant in SW than in WW, these sequences were probably produced by the activity of endogenous whey proteases on whey proteins or on endogenous medium- to long-sized peptides. Similar to the compounds of SP3, these peptides were matched to both casein and whey protein sequences. Evidence for the activity of whey proteases is furnished by some of the annotated short peptides. Tyr-Pro-Glu-Xle, whose sequence was found in that of α-S1 casein (a.a. 161-164), is one of the peptides of the SP3 group and had a somewhat balanced abundance in WW and SW samples, but with higher concentrations in WW (Figure 5a). Among the compounds of SP4, on the other hand, there were two peptides, i.e., Tyr-Pro-Glu and Tyr-Pro, that could be derived from the hydrolysis of Tyr-Pro-Glu-Xle and had, consequently, higher concentrations in the SW samples (Figure 5b,c).
The findings on the origins of the annotated peptides, which were often derived from casein sequences, disproved the earlier hypothesis on the source of the peptides based on their GRAVY values. The relatively hydrophobic character of the annotated peptides could have derived from the characteristics of the RP separation or from the cleavage preferences of the endogenous proteases.
Compared to WM, WW and SW showed a generally higher content of short endogenous peptides. Moreover, their peptide composition was peculiar, with several compounds present in either WW or SW that were almost absent in WM. On the other hand, WP, which is the final product of the cheese-making industry, showed extremely low abundances of most annotated short peptides, as Figures 4 and 5 suggest.

Chemicals and Materials
Organic solvents of the highest grade available were purchased from VWR International (Milan, Italy). Optima LC-MS grade water and acetonitrile (ACN), used for short peptide analysis, were purchased from Thermo Fisher Scientific (Waltham, MA, USA). Cartridges packed with 500 mg of Carbograph 4 were supplied by Lara S.r.l. (Formello, Italy).

Sample Collection
The samples were collected from "Capurso Azienda Casearia srl", Gioia del Colle (BA, Italy). For this work, the following samples were selected, representing four steps of the cheese-making industry: whole milk (WM), whole whey (WW), skimmed whey (SW), and whey permeate (WP). WW is the byproduct obtained after coagulation of casein proteins through rennet addition for stretched-curd cheese production (cheeses obtained through the method of pasta filata). SW is obtained after WW defatting for the production of whey butter. WP is obtained through reverse osmosis, which isolates whey proteins for the production of cattle feed. Three distinct samples for each product were furnished by the dairy farm (12 samples), and the samples were freshly aliquoted and stored at −80 °C.

Sample Preparation
For each of the 12 samples, two distinct experiments were carried out (24 data points, 6 per sample type). Concentrated TFA was added to ten milliliters of each sample to reach pH 2. Samples were then centrifuged at 8000 × g for 35 min at 4 °C to remove any debris. The supernatants were purified on Carbograph 4 cartridges using a previously developed protocol for short peptide purification and pre-concentration [25]. Briefly, the cartridge was first washed to remove impurities, then activated with 10 mL of 0.1 mol L⁻¹ HCl in H₂O and conditioned with 10 mL of 20 mmol L⁻¹ TFA in H₂O. The extracts were then loaded on the cartridge, which was then washed with 10 mL of 20 mmol L⁻¹ TFA in H₂O. The short peptides were eluted with DCM/MeOH 80:20 (v/v) containing 20 mmol L⁻¹ TFA in backflushing elution mode. The eluates were dried in a thermostated bath at 25 °C under nitrogen flow. The residue was reconstituted in 1 mL of water for subsequent RP separation.

Liquid Chromatography-Mass Spectrometry Analysis
Endogenous short peptide extracts were analyzed by reversed-phase chromatography, as described in a previous paper [22]. A Vanquish UHPLC binary pump was used, coupled to a hybrid quadrupole-Orbitrap Q Exactive mass spectrometer (Thermo Fisher Scientific, Bremen, Germany) through a heated electrospray ionization (HESI) source. The short peptides were separated on a Kinetex XB-C18 column (100 × 2.1 mm, particle size 2.6 µm, Phenomenex, Torrance, CA, USA) operated at 40 °C. Spectra were acquired in positive ion mode in the range m/z 150-750 with a resolution (full width at half maximum, FWHM, at m/z 200) of 70,000, using a suspect screening approach. An inclusion list containing the exact m/z of the protonated ions of all the unique short peptide masses was employed (4980 unique m/z). The inclusion lists were prepared using MATLAB R2018, as previously described [39]. Higher-energy collisional dissociation (HCD) MS/MS spectra were acquired in top-5 data-dependent acquisition (DDA) mode at 35% normalized collision energy and a resolution of 35,000 (FWHM, at m/z 200). For each of the 24 data points, three instrumental replicates were run. The average peak areas of the instrumental replicates were employed for statistical analysis.
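As an illustration of how an inclusion list of protonated short peptide masses can be assembled, a minimal Python sketch follows. It is not the MATLAB procedure of ref. [39]. It assumes that short peptides are unmodified sequences of two to four of the 20 proteinogenic amino acids, that sequences with the same composition share one m/z, and that values are merged after rounding to four decimals. The function names and these parameters are hypothetical, and the sketch is not expected to reproduce the 4980 entries of the list actually used.

```python
from itertools import combinations_with_replacement

# Monoisotopic residue masses (Da) of the 20 proteinogenic amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # monoisotopic mass of H2O (Da)
PROTON = 1.007276   # mass of a proton (Da)


def protonated_mz(sequence):
    """Monoisotopic m/z of the singly protonated, unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER + PROTON


def inclusion_list(min_len=2, max_len=4, mz_range=(150.0, 750.0), decimals=4):
    """Unique [M+H]+ values of all amino acid compositions in the m/z window.

    min_len, max_len, and decimals are assumptions of this sketch; the m/z
    window matches the acquisition range reported in the text.
    """
    unique_mz = set()
    for n in range(min_len, max_len + 1):
        # Residue order does not change the mass, so compositions suffice.
        for combo in combinations_with_replacement(sorted(RESIDUE_MASS), n):
            mz = round(protonated_mz(combo), decimals)
            if mz_range[0] <= mz <= mz_range[1]:
                unique_mz.add(mz)
    return sorted(unique_mz)


mz_values = inclusion_list()
```

Working with compositions rather than sequences keeps the enumeration small, since sequence isomers and Leu/Ile variants are isobaric and would occupy the same inclusion list entry.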

Short Peptide Identification
The identification of endogenous short peptides was carried out using a dedicated data processing workflow implemented in Compound Discoverer 3.1 (Thermo Fisher Scientific, Bremen, Germany) by our research group, as described in a previous paper [24]. The optimized workflow extracted the masses from the RAW data files according to customized parameters. It also predicted elemental compositions, aligned the spectra, removed signals that were present in the blank or lacked an MS/MS spectrum, and matched the extracted features against complete short peptide lists. The identification of short peptides was confirmed by interpreting the MS/MS spectra with the aid of mMass, which allows for the in-silico fragmentation of peptides [40].
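The mass-matching step that underlies this kind of suspect screening can be illustrated with a short Python sketch. It is not the Compound Discoverer workflow. The 5 ppm tolerance, the function names, and the measured value are hypothetical, and the candidate tripeptides were chosen only as familiar examples, with theoretical [M+H]+ values computed from monoisotopic masses.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Mass accuracy in parts per million (positive if measured is higher)."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def match_candidates(measured_mz, candidates, tol_ppm=5.0):
    """Return (name, ppm error) for candidates within tol_ppm of measured_mz.

    candidates maps a peptide name to its theoretical [M+H]+ m/z.
    tol_ppm is an assumed tolerance, not a value taken from the text.
    """
    hits = []
    for name, theoretical_mz in candidates.items():
        error = ppm_error(measured_mz, theoretical_mz)
        if abs(error) <= tol_ppm:
            hits.append((name, round(error, 2)))
    return sorted(hits, key=lambda hit: abs(hit[1]))


# Illustrative candidate list (theoretical monoisotopic [M+H]+ values).
candidates = {"IPP": 326.2074, "PIP": 326.2074, "VPP": 312.1918}

# Hypothetical measured precursor m/z.
hits = match_candidates(326.2080, candidates)
```

In this example both IPP and PIP fall within the tolerance, since sequence isomers share the same exact mass. Accurate mass alone therefore narrows the candidates but does not assign a sequence, which is why the MS/MS spectra have to be interpreted.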

Statistical Analysis
For statistical analysis, MetaboAnalyst 5.0 [36] was employed. The data matrix obtained following short peptide identification was submitted as a text file formatted according to the specifications provided by the developers. For data filtering, the interquartile range (IQR) was selected, whereas for data scaling, autoscaling (each variable mean-centered and divided by its standard deviation) was chosen. For the evaluation of the four sets of samples, an unsupervised chemometric approach based on principal component analysis (PCA) was chosen.
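The three steps named above (IQR filtering, autoscaling, and PCA) can be sketched in Python with NumPy as follows. This is an approximation for illustration, not MetaboAnalyst's own code. The fraction of low-IQR features to discard is not reported in the text, so `drop_fraction` is a hypothetical parameter, and the use of the sample standard deviation (`ddof=1`) is an assumption. The input is synthetic random data shaped like the study design (24 data points in four groups of six), not the measured peak areas.

```python
import numpy as np


def iqr_filter(X, drop_fraction=0.1):
    """Drop the fraction of features (columns) with the lowest IQR.

    drop_fraction is an assumed value; the text does not report it.
    """
    q75, q25 = np.percentile(X, [75, 25], axis=0)
    iqr = q75 - q25
    n_drop = int(np.floor(drop_fraction * X.shape[1]))
    keep = np.sort(np.argsort(iqr)[n_drop:])  # keep original column order
    return X[:, keep], keep


def autoscale(X):
    """Mean-center each feature and divide by its standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def pca_scores(X_centered, n_components=2):
    """PCA scores and explained variance ratios via SVD.

    X_centered must already be column-centered (autoscaled data are).
    """
    U, S, _ = np.linalg.svd(X_centered, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]


# Synthetic stand-in: 4 sample types x 6 data points, 10 features.
rng = np.random.default_rng(0)
group_means = np.repeat(rng.normal(10, 2, size=(4, 10)), 6, axis=0)
X_raw = group_means + rng.normal(0, 0.5, size=(24, 10))

X_filtered, kept = iqr_filter(X_raw, drop_fraction=0.1)
X_scaled = autoscale(X_filtered)
scores, explained = pca_scores(X_scaled)
```

Autoscaling gives every peptide the same weight regardless of its absolute peak area, so the principal components reflect differences in the pattern of peptides across sample types rather than the behavior of the few most abundant signals.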

Conclusions
Whey is the main byproduct of the cheese-making industry and is known to be a rich source of valuable bioactive compounds. As demonstrated, both whole and skimmed whey contain a large number of short bioactive peptides (a higher content than whole milk), which most likely originate from the acid treatment of milk, the heating processes, and the activity of endogenous whey proteases on proteins and longer peptides. Although WW and SW differ only by the skimming process, their short peptide profiles present several differences, which were attributed to the partial co-precipitation of some compounds during fat removal and to the activity of endogenous peptidases on whey proteins or long peptide sequences. Because short amino acid sequences are known for their ability to preserve their structure (and consequently their biological activities) and to be absorbed by the gastrointestinal tissue, short peptide-rich industrial waste could be of significant interest in light of the principles of the circular economy. As such, the renovation of the food industry is intimately linked to the revalorization of byproducts for the generation of high-revenue bioactive compounds.
Supplementary Materials: The following are available online, Table S1: Retention times (Rt, min), proposed formulas, adducts, molecular weights, experimental m/z, accuracy (∆, ppm), main diagnostic product ions, and GRAVY values of the 79 tentatively identified short endogenous peptides in whole milk, whole whey, skimmed whey, and permeate whey. Table S2