Peptidome: Chaos or Inevitability

Thousands of naturally occurring peptides differing in their origin, abundance and possible functions have been identified in the tissue and biological fluids of vertebrates, insects, fungi, plants and bacteria. These peptide pools are referred to as intracellular or extracellular peptidomes, and besides a small proportion of well-characterized peptide hormones and defense peptides, are poorly characterized. However, a growing body of evidence suggests that unknown bioactive peptides are hidden in the peptidomes of different organisms. In this review, we present a comprehensive overview of the mechanisms of generation and properties of peptidomes across different organisms. Based on their origin, we propose three large peptide groups—functional protein “degradome”, small open reading frame (smORF)-encoded peptides (smORFome) and specific precursor-derived peptides. The composition of peptide pools identified by mass-spectrometry analysis in human cells, plants, yeast and bacteria is compared and discussed. The functions of different peptide groups, for example the role of the “degradome” in promoting defense signaling, are also considered.


Introduction
Peptides play key roles in numerous processes, including growth regulation, stress response, and immune signaling in all living organisms [1][2][3][4][5][6]. Systematic studies of the biodiversity of peptides, which began in the early 1990s, made only modest progress in the first several years owing to the limitations of the available analytical techniques. For example, in 2005-2006, the list of peptides identified in a studied sample usually contained no more than a few hundred discrete peptide sequences [7][8][9]. However, the rapid development of modern mass-spectrometry analysis coupled with the explosive growth of genetic data banks has led to a considerable expansion of the list of characterized native peptidomes. Tens of thousands of peptides that differ significantly in their origin, function and properties have been identified in the tissue and biological fluids of multiple organisms [10][11][12][13][14][15][16][17]. Even though the peptidomes of prokaryotic and eukaryotic cells comprise thousands of peptides, the majority of them are generated during protein degradation [18]. These peptides are referred to as the "protein degradome" [19] and are perhaps no more than cell "trash" remaining after nonspecific proteolysis. The bulk of the intracellular "protein degradome" appears to be generated by the proteasomal degradation of functional proteins into 5-22 amino acid (aa) peptides, followed by oligopeptidase cleavage [20,21].
In addition to peptides from functional proteins, some peptide hormones, antimicrobial peptides, etc., released from specific protein precursors by proteolytic cleavage can be found in peptidomes. In addition, the translation of thousands of small open reading frames (smORFs; <100 codons) located on long non-coding RNAs (lncRNAs) or mRNAs was confirmed experimentally and, therefore, is another source of peptides in cellular and secreted peptidomes [22][23][24][25][26][27][28]. However, the abundance of these groups of peptides, their half-life and degradation mechanisms are still poorly understood [29][30][31][32].
Although intracellular peptides were first described in the 1950s [33,34], our understanding of their possible function is still insufficient. For example, peptides presented by 2 of 19 major histocompatibility complexes (MHC) are generated from cellular proteins and play a role as antigens in self-recognition [35]. It was recently shown that previously unannotated "non-canonical" proteins generate major histocompatibility complex I (MHC I)-bound peptides 5-fold more effectively than annotated proteins [36]. As another example, the contribution of alternative open reading frames (altORFs) [37] to shaping the composition of intracellular or secreted peptidomes is still unknown. The translation of such altORFs may be higher than that of longer protein-coding ORFs [38], and their degradation can make a significant contribution to native peptidomes. In recent years, a growing body of evidence has emerged that biologically active peptides may be hidden in the sequences of functional proteins, and in most cases, the functional activity of these peptides may differ from that of the respective proteins. Such peptides are called "cryptides" or cryptic peptides [39]. In plants, only three cryptides involved in the immune response have been identified [40][41][42]. There are more examples of mammalian cryptides derived from proteins such as hemoglobin [43][44][45][46], mitochondrial proteins [47], proteasome [48] and others [49].
The antigen presentation of peptides derived from cellular proteins in mammals is an example of the complex roles of peptidomes in promoting cellular homeostasis and the response to external stimuli [50,51]. Furthermore, recent studies have shown that the innate immune system in animals is based on the perception of "proteinaceous" signals both from pathogens and from host cells [52][53][54][55][56]. Plants have a similar system of release and recognition of damage-associated molecular patterns (DAMPs), as well as pathogen-associated molecular patterns (PAMPs) [57]. The receptors involved in this type of danger signaling have been found in a broad range of organisms, from insects and mammals to plants [58]. Stress conditions influence the composition of peptide pools, thereby resulting in the release of potential antimicrobial agents from functional proteins [15][59][60][61]. This rapid stress response at the peptidome level based on protein degradation can be considered as a concerted action of the whole peptidome.
Thus, the peptidomes of tissue and biological fluids have a complex nature and should not be considered as just a set of independent functional and non-functional peptides, but as a self-complementing biologically active matter. In this review, we summarize our knowledge about the generation of different types of peptides, their precursors and biological function. In addition, we analyzed selected peptidome datasets (Table 1) from plants [15,16,62], bacteria [63], humans [12,64] and yeast [65] to identify some common trends in their composition and physicochemical properties.

Mechanisms of Peptide Generation from Protein Precursors
The known mechanisms of peptide generation include the specific proteolysis of functional or non-functional proteins by different classes of proteolytic enzymes (proteases), ubiquitin-dependent or independent digestion by proteasomes and the translation of small ORFs into peptides (Figure 1). Nonribosomal peptides [66] are outside the scope of this review.



The Protease-Specific Cleavage of Protein Precursors
The release of peptide hormones during the protease-specific cleavage of corresponding protein precursors at a specific site is well studied in many organisms [67,68]. The architecture of these, mainly non-functional, protein precursors is quite similar among plants, mammals, bacteria and yeast. They contain an N-terminal signal sequence, a cleavage site for a certain protease and a functional sequence [69][70][71][72]. Interestingly, the precursors of some human peptide hormones contain other protein-coding sequences along with the bioactive peptides, as in the case of vasopressin and oxytocin [73,74]. Apparently, the specific proteolytic cleavage results in the generation of "peptide ladders" encompassing bioactive amino acid motifs [12,75].
Based on their catalytic active sites, all known proteases are divided into five families (aspartyl, cysteine, metallo-, serine and threonine proteases) that are well established among different organisms [18]. It has been shown that serine proteases (subtilases) play a pivotal role in the release of peptide hormones in plants and mammals [16,70,76]. For example, subtilase S1P (SITE 1 PROTEASE)/SBT6.1 is responsible for the biogenesis of the RGF/GLV/CLEL and RALF peptide hormones in plants [77]. In mammals, seven subtilisin/kexin-like endoproteases named prohormone convertases (PCs) are responsible for the release of neuropeptides [78]. Precursors of human growth factors are reported to be embedded in the membranes of vesicles, and bioactive peptides can be released by extracellular proteases, such as serine proteases, upon the merging of vesicles with plasma membranes [79]. Vasopressin and oxytocin are derived from their precursors by the subtilisin-like prohormone convertase SPC3 [73].
Recently, it was also shown that metalloproteases (referred to as "sheddases") play a considerable part in the process named "ectodomain shedding" in animals [80]. Through this process, many membrane-bound peptides such as growth factors and cytokines are released under specific conditions [81]. "Shedding" also contributes to signal transmission, liberating intracellular parts of transmembrane proteins into the cytoplasm [82].
Another important protease family involved in the release of bioactive peptides is the cysteine proteases. In plants, this family includes papain-like proteases and metacaspases and participates in the release of some immune peptides [83][84][85][86]. In mammals, the cysteine protease cathepsin V produces the neuropeptides enkephalin and neuropeptide Y (NPY) [87].
However, the role of proteases in shaping the whole intra- and extracellular peptidomes remains poorly studied. Presumably, proteases make a significant contribution to the shaping of the secreted peptidome [16,68]. It was observed that treatment with stress phytohormones triggered an increase in the activity of subtilisin-like serine proteases, such as P69B, and papain-like cysteine proteases, such as PIP1 and some others, in plants [83,88]. Analysis of the N- and C-ends of peptides in the plant secretome under stress conditions also showed the predominance of serine and metalloprotease activity [16]. Serine protease, metalloprotease and cysteine protease activities have also been shown in secreted peptidomes of human bodily fluids [12,68].

The Proteasomal Degradation of Functional Proteins
The proteasomal degradation pathway apparently plays a major role in the formation of the intracellular protein "degradome". Proteasomes are multisubunit complexes that are responsible for the degradation of functional proteins in cells. Proteasomal subunits possess caspase-like (β1), trypsin-like (β2) and chymotrypsin-like (β5) proteolytic activities and degrade proteins into 3-25 aa peptides that are subject to further degradation by proteases [20,21]. In a recent study, several thousand peptides associated with proteasomes were identified in human cells [64]. These data were in line with previously published results showing that specific reversible and irreversible proteasome inhibitors, such as bortezomib and epoxomicin, influence the generation and degradation of intracellular endogenous peptides in mammalian cells [89][90][91][92]. These studies clearly showed that thousands of intracellular peptides are a by-product of proteasomal degradation. However, no correlation was found between the number of identified peptides and the abundance of the corresponding precursors in different organisms [64,93]. Nevertheless, the abundance of the intracellular peptides can be influenced by different factors. For example, the stimulation of HEK293 cells with the cytokines TNF-α and IFN-γ for 24 h resulted in changes in the abundance of numerous proteasome-associated peptides [64]. Under stress conditions, proteasomes in human cells tended to cleave protein precursors of known self-antigens such as histones [64].
It is well established that proteasomal subunits target specific amino acid motifs enriched in negatively charged residues (D, E; caspase-like), hydrophobic residues (W, F, M, Y; chymotrypsin-like) and positively charged residues (R, K; trypsin-like) [64]. This specific cleavage results in specific compositions of intracellular peptide pools. For example, the C-ends of the proteasome-associated peptides were consistent with the caspase-like and chymotrypsin-like activities of proteasomes, but not with the trypsin-like activity [64]. Our analysis of terminal amino acids of different peptidome datasets showed that lysine (K) and arginine (R) were among the most represented at C-terminal peptide cleavage sites of the considered peptidomes, except the specific proteasome-associated peptidome of human cells. Wolf-Levy et al. suggest that this is owing to either biological or technical reasons (Figure 2a) [64]. This might indicate that trypsin-like protease activity makes a significant contribution to the shape of native peptidomes. The overall discrepancy in C-terminal amino acids in different datasets may be owing to various reasons, such as nonspecific proteolysis, trimming of peptide ends by exopeptidases, technical features of the isolation method or biases of MS analysis technology (Figure 2a).
Figure 2. (a) … [15], human [64], yeast [65], cotton [62] and extracellular peptides from human plasma [12], moss [15,16] and bacteria [63]; (b) Density plot showing the distribution of MS-identified peptides across the precursor lengths in peptidomic datasets from moss [15,16], human [12,64], yeast [65], cotton [62] and bacteria [63]. The positions of each identified peptide were normalized to protein lengths and represented as percentages. The steps of visualizing and analyzing the data in all figures are available in the GitHub code repository: https://github.com/IgorFesenko/Peptidome_review.
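The C-terminal comparison described above can be illustrated with a minimal Python sketch. It assigns the last residue of each peptide to one of the three proteasomal activity classes listed in the text and reports the fraction of peptides in each class. The function name and the example peptides are hypothetical, and this is not the code from the authors' repository.

```python
from collections import Counter

# Residue classes for the three proteasomal activities, as listed in the text.
ACTIVITY_BY_RESIDUE = {
    **dict.fromkeys("DE", "caspase-like"),
    **dict.fromkeys("WFMY", "chymotrypsin-like"),
    **dict.fromkeys("RK", "trypsin-like"),
}


def c_terminal_activity_fractions(peptides):
    """Return the fraction of peptides whose C-terminal residue is in each class."""
    counts = Counter(
        ACTIVITY_BY_RESIDUE.get(peptide[-1].upper(), "other")
        for peptide in peptides
        if peptide  # skip empty strings
    )
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}


# Hypothetical peptides, for illustration only.
example = ["AVLDE", "GSTKR", "PLLQF", "MKTAY", "SSGAK", "TTPLG"]
fractions = c_terminal_activity_fractions(example)
for name, value in sorted(fractions.items()):
    print(f"{name}: {value:.2f}")
```

A high "trypsin-like" fraction in such a summary would correspond to the K/R enrichment at C-termini discussed above, although it cannot by itself distinguish proteasomal cleavage from the action of other trypsin-like proteases.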

Most of the known plant peptide hormones have been reported to originate from the C-terminus of their respective protein precursor [106][107][108]. On the other hand, peptide hormone precursors from humans and animals often have a multi-domain structure, generating multiple identical, homologous or entirely different functional peptides from different parts of a single precursor [50,70,71,109,110]. It has been previously shown that identified peptides are not evenly distributed across the protein lengths and that native intracellular peptides are often N- or C-terminal fragments of the corresponding protein precursors [10].
To determine whether peptides tend to originate from precursor ends, we calculated the frequency of their occurrence across the length of the corresponding proteins in different peptidomes. These data were presented as density plots showing the probability distributions of these frequencies. Indeed, the comparison of different datasets showed that peptides released from the C- or N-terminus tended to be more represented in the intracellular or extracellular peptidomes than we would expect in the case of random cleavage of proteins (Figure 2b). Overall, the degradation patterns of protein precursors from different datasets are similar, with the predominance of C-terminal peptides in intracellular and N-terminal peptides in extracellular peptidomes. The unique patterns of the human plasma and cotton peptidomes may reflect technical variability during peptide isolation or the specificity of plant root tissue (Figure 2b).
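The position normalization behind such density plots can be sketched as follows. This is a minimal illustration, assuming that the position of a peptide is taken as the midpoint of its first exact match in the precursor; the text does not state which coordinate was used, and the sequences below are hypothetical.

```python
def normalized_position(peptide, protein):
    """Midpoint of the first match of peptide in protein, as % of protein length.

    Returns None when either sequence is empty or the peptide is not found.
    """
    if not peptide or not protein:
        return None
    start = protein.find(peptide)
    if start == -1:
        return None
    midpoint = start + len(peptide) / 2
    return 100.0 * midpoint / len(protein)


def position_histogram(positions, n_bins=10):
    """Count normalized positions (0-100 %) in equal-width bins."""
    bins = [0] * n_bins
    for position in positions:
        index = min(int(position / 100.0 * n_bins), n_bins - 1)
        bins[index] += 1
    return bins


# Hypothetical 100-residue precursor and three peptides, for illustration only.
protein = "M" + "A" * 89 + "GHIKLWYRDE"
peptides = ["MAAAA", "WYRDE", "AAGHI"]
positions = [normalized_position(p, protein) for p in peptides]
print(positions)
print(position_histogram(positions))
```

A histogram that is heavier in the first and last bins than in the middle would correspond to the terminal enrichment described above; a kernel density estimate over the same values gives the smoothed curve.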

Properties of Mass-Spectrometry-Based Peptidomes
Our understanding of intra- and extracellular peptidomes is tightly coupled with mass-spectrometry (MS) analysis of peptides extracted from tissue and biological fluids. In selected datasets (Table 1), the median length of MS-identified endogenous peptides ranges from approximately 11 to 18 residues and is similar across cellular and secretome datasets from different organisms (Figure 3a).
Peptidomic analysis usually includes the following steps: sample collection, peptide extraction, fractionation, LC-MS/MS analysis, peptide identification and data mining [111]. Therefore, cellular or extracellular peptide pools can be represented as a juxtaposition of peptides generated in tissue or biological fluids in native conditions and the result of postmortem and/or extraction artifacts [112]. In addition, methods of sample preparation [12] and LC-MS/MS analysis can contribute to the predominant identification of peptides with certain physicochemical properties. However, the physicochemical properties of MS-identified peptide pools are poorly studied. It has been previously shown that peptides from the secretome of the moss Physcomitrium (Physcomitrella) patens tended to have fewer positively charged amino acids than intracellular peptides and to contain more hydrophobic amino acids (Figure 3b) [15]. This fact could reflect the properties of membrane or secreted proteins [113] that are, apparently, the main source of peptides in the secretome. However, further experiments are needed to shed light on this question.
Indeed, the proteome structure, methods of peptide isolation and identification seem to influence the amino acid composition of MS-identified peptidomes (Figure 3c). These differences can impede the comparative analysis of peptidome datasets from different organisms. In a recent study, peptidomic analysis of HK-2 cells treated with TGF-β1 revealed that the GRAVY indices, indicating the hydrophobicity of the peptide sequence [114], of significantly altered endogenous peptides were mostly below zero, suggesting that most of them were hydrophilic peptides [92]. It seems that the identification of less hydrophobic peptides than expected by chance might be a general trend in peptidomic studies. For example, the GRAVY indices and the proportion of aromatic amino acids were significantly lower in almost all analyzed peptidomic datasets in comparison with sets of random peptides generated from the same proteins by chance (Figure 3d). This could reflect biological trends in the composition of cellular and secretome peptidomes or be a result of biases in sample preparation and LC-MS/MS analysis. For example, very hydrophilic short peptides can be lost during C18 separation [115].
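The comparison with random peptides can be illustrated with a short Python sketch. The GRAVY index is computed here directly from the Kyte-Doolittle hydropathy scale rather than through Biopython, so that the example is self-contained. The aromatic set (F, W, Y), the sampling of equal-length fragments at uniformly random positions and the example sequences are assumptions made for illustration, not a description of the original analysis.

```python
import random

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AROMATIC = frozenset("FWY")  # assumption: His is not counted as aromatic


def gravy(peptide):
    """Mean hydropathy of a peptide (standard residues only)."""
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)


def aromatic_fraction(peptide):
    """Proportion of F, W and Y residues in a peptide."""
    return sum(aa in AROMATIC for aa in peptide) / len(peptide)


def random_fragment(protein, length, rng):
    """Fragment of the given length cut at a uniformly random position."""
    start = rng.randrange(len(protein) - length + 1)
    return protein[start:start + length]


# Hypothetical precursor and observed peptide, for illustration only.
protein = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQ"
observed = "ERLGLIEVQ"
rng = random.Random(0)  # fixed seed so the background set is reproducible
background = [random_fragment(protein, len(observed), rng) for _ in range(1000)]
background_gravy = sum(map(gravy, background)) / len(background)
print(f"observed GRAVY: {gravy(observed):.2f}")
print(f"mean GRAVY of random fragments: {background_gravy:.2f}")
```

With real data, the two distributions (observed versus random) would be compared per dataset with a nonparametric test such as the Mann-Whitney U-test mentioned in the caption of Figure 3.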
It can be suggested that different groups of precursors generate peptides with specific properties. For example, the hydrophobicity of human MS-identified peptides from smORFs was significantly higher than that of peptides from proteins (Figure 3e). This is in line with recent studies showing that novel adaptive smORFs are prone to containing transmembrane domains [116][117][118]. Therefore, our view of naturally occurring peptidomes, based on MS analysis, may be biased toward peptides with certain physicochemical properties.
Figure 3. (a) … [15], cotton [62], human [64], yeast [65] and extracellular peptides from human plasma [12], moss [15,16] and bacteria [63] datasets. (b) The percentage of positively charged amino acids in peptides from different datasets. All calculations were performed with the iFeature tool [119]. (c) Principal component analysis of the physicochemical properties of composition, transition and distribution (CTD) of the peptidomes from different organisms. The 2D plot demonstrates the separation of the amino acid composition of peptidomes in different datasets and the clustering of intracellular and extracellular datasets. All calculations were performed with the iFeature tool [119]. (d) A comparison of the GRAVY indices and the proportion of aromatic amino acids between MS-identified peptides and sets of random peptides from the same proteins. The sets of random peptides were separately generated from the corresponding precursors for each dataset. All calculations were performed with Biopython [120]. (e) Comparison of the GRAVY indices and the proportion of aromatic amino acids in mass-spectrometry-identified peptides from intracellular proteins [64] and small open reading frames [37] from human. *** p < 10⁻⁵, Mann-Whitney U-test. The steps of visualizing and analyzing the data in all figures are available in the GitHub code repository: https://github.com/IgorFesenko/Peptidome_review.

The Functional Protein Precursors of Peptides
Are there specific sets of protein precursors that are the main source of naturally occurring peptides? Are there similar degradation patterns of these precursors in different organisms? According to a conservation analysis of yeast Saccharomyces cerevisiae and mammalian protein precursors, at least 30% of the yeast precursors had orthologs in mammalian peptidomes, such as ribosomal proteins, heat shock proteins and proteins involved in metabolic pathways [65]. The degradation patterns of some of these precursors, for example acyl-CoA-binding protein, were similar [65]. A comparison of the cellular location of human and yeast precursors showed that most of the identified peptides originated from cytoplasmic and mitochondrial proteins [65]. In addition, nuclear proteins constitute a substantial portion of precursors in yeast and human cells [65]. The GO enrichment analysis of precursors showed that most of them were involved in metabolism, the maintenance of reduction/oxidation balance, translation/protein synthesis, chaperone/protein folding, protein/vesicle trafficking and proteolysis [65].
In plant green tissue, a significant portion of peptides come from chloroplast and mitochondrial proteins, as was shown in the moss Physcomitrium (Physcomitrella) patens peptidomes [15,16,121,122]. In addition, intracellular peptides are derived from proteins involved in photosynthesis, the Calvin cycle, glycolysis and sucrose biosynthesis in P. patens [15]. Precursors of peptides extracted from the roots of Gossypium arboreum after inoculation with Verticillium dahliae also included pathogenesis-related protein STH2, eukaryotic aspartyl protease family protein and histone H2A [62]. Thus, a significant portion of intracellular peptides in different organisms is released from organellar proteins and some housekeeping proteins.
Besides intracellular peptidomes, extracellular peptides have been analyzed in a number of studies [12,15,16,63]. The precursors of secreted peptides of the bacterium Lactococcus lactis belonged to extracellular, intracellular and transmembrane proteins [63]. Peptides were also released from a stable pool of precursor proteins, and the presence of peptides from intracellular proteins in the extracellular space was not related to cell lysis [63]. Among cytoplasmic protein precursors, proteins such as acetolactate synthase, bifunctional acetaldehyde CoA/alcohol dehydrogenase and ribosomal protein RpsT have been identified. Peptides were also released from cytoplasmic proteins whose secretion has been shown for many bacteria, such as glyceraldehyde-3-phosphate dehydrogenase, enolase, elongation factor Tu, chaperone protein DnaK and pyruvate dehydrogenase E1 component beta subunit [63].
According to a recent study, the majority of precursors in the human plasma peptidome belong to secreted or cell membrane proteins [12]. In addition, the precursor proteins were from mitochondria, Golgi apparatus, endoplasmic reticulum and different vesicles. The GO enrichment analysis showed that these precursor proteins participate in muscle filament sliding, platelet degranulation/activation, exocytosis, glucose metabolic process and secretion by the cell [12]. Among identified peptides, known peptide hormones and growth factors released from the corresponding non-functional precursors were also found [12].
In the moss P. patens secretome, peptides from membrane and secreted proteins, lipoproteins, pectinesterase-related proteins and cucumisin, a subtilisin-like serine protease, were identified [15,16]. According to the GO enrichment analysis of the moss secreted precursors, most of the proteins were involved in the modification of the cell wall (pectin degradation) or were extracellular or extrinsic membrane proteins. In addition, proteins participating in photosynthetic reactions, including some chloroplast-encoded proteins, such as photosystem I and photosystem II proteins and RUBISCO subunits, were identified [15,16].
Using a BLAST similarity search (E-value < 0.00001; identity > 60%), we found orthologs of precursors from different peptidomic datasets (Table 1). According to our results, the most common protein precursors that had orthologs in the plant, human, yeast and bacteria datasets were ATP synthase subunit from mitochondria, glyceraldehyde-3-phosphate dehydrogenase, elongation factor 1-alpha, enolase, heat shock protein, actin, adenosylhomocysteinase, 60S ribosomal protein, S-adenosylmethionine synthetase, fructose-bisphosphate aldolase and histone H2B. Several of the identified homologous precursors have given rise to similar degradation patterns in phylogenetically distant species, as in the case of actin from the moss P. patens (Figure 4a) and human (Figure 4b). In contrast, differing patterns were observed for mitochondrial ATP synthase subunit (Figure 4c) and glyceraldehyde-3-phosphate dehydrogenase (Figure 4d) from the moss P. patens, human, the yeast S. cerevisiae, the bacterium L. lactis and the cotton G. arboreum. Taken together, published data indicate that the generation of peptide pools appears to be a more deliberate process than chaotic degradation and that conserved proteins tend to produce stable pools of naturally occurring peptides from similar regions (Figure 4). It may be speculated that peptides from functional proteins are generated in a two-step degradation process, in which precursors are first divided into relatively large fragments, presumably by proteasomes, followed by further proteolysis into smaller structurally related peptides [93].
A substantial portion of peptides in the peptidomes of different organisms originated from proteins with unknown functions. However, it is currently unknown whether such peptides are prone to be a source material for further selection and evolution into bioactive peptides ("Raw material" in Figure 1).
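The similarity thresholds quoted above can be applied to BLAST results with a few lines of Python. The sketch below assumes the standard 12-column tabular output (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); the text does not say which output format was used, and the sequence identifiers and values in the example are hypothetical. Note that a one-directional similarity hit is only an approximation of orthology.

```python
def filter_blast_hits(lines, max_evalue=1e-5, min_identity=60.0):
    """Keep tabular BLAST hits with E-value < max_evalue and identity > min_identity."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        identity = float(fields[2])   # column 3: percent identity
        evalue = float(fields[10])    # column 11: E-value
        if evalue < max_evalue and identity > min_identity:
            hits.append((query, subject, identity, evalue))
    return hits


# Hypothetical rows in BLAST tabular format, for illustration only.
rows = [
    "moss_actin\thuman_actin\t88.5\t375\t43\t0\t1\t375\t1\t375\t0.0\t650",
    "moss_prot2\tyeast_prot7\t45.0\t120\t60\t2\t1\t120\t5\t124\t1e-20\t90",
    "moss_prot3\thuman_prot9\t75.0\t30\t7\t0\t1\t30\t1\t30\t0.002\t35",
]
kept = filter_blast_hits(rows)
for query, subject, identity, evalue in kept:
    print(query, subject, identity, evalue)
```

In this example only the first row passes: the second fails the identity threshold and the third fails the E-value threshold.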

Biological Function of Different Peptide Groups
According to the mechanism of their generation, bioactive peptides can be divided into several groups: peptide hormones and stress-regulating peptides that are released from functional or non-functional protein precursors by specific proteases; those that are derived from functional proteins through proteasomal degradation or by non-specific proteases; and peptides/microproteins, translated directly from small open reading frames (Figure 1). Each group of peptides demonstrates specific activity, for example, through binding to a specific receptor or interacting with functional proteins or small molecules.
Although the number of known plant bioactive peptides is significantly smaller than in animals, they have been shown to be important regulators of numerous cellular processes [19,69,106][146][147][148][149][150]. Plant peptide hormones regulate growth and development along with known non-peptide hormones [108]. The most common peptide involved in immune and stress signaling that was found in different plant species is plant elicitor peptide (PEP) [151]. It was shown that PEPs are cleaved from their precursors by metacaspases under an influx of Ca2+ into the cytosol as a rapid response to wounding or pathogen attack [86].
Peptide hormones act as ligands for cognate receptors in various organisms, thereby activating cascades of downstream reactions, including protein phosphorylation, and inducing the expression of corresponding genes [136,152]. For example, in bacteria, virulence factor production is regulated through the detection of cyclic autoinducing peptides (AIP) by the cell-surface histidine kinase AgrC [153][154][155]. Overall, the pheromone-receptor systems in Gram-positive bacteria are divided into the following groups: the RNPP (Rap, NprR, PlcR, and PrgX) family of regulators; agr-type cyclic peptides recognized by a two-component signal transduction system (TCSTS), consisting of a histidine kinase, AgrC, and a cytoplasmic response regulator, AgrA; the Gly-Gly-type peptide family also recognized by TCSTS, for example, competence-stimulating peptides (CSPs) and their receptors ComD; and the Rgg regulators family, binding sex pheromones, such as sigX-inducing peptide (XIP) and its receptor ComR [125].
In yeast, the mating peptide pheromone "a-factor", whose binding to its specific receptor Ste3 induces mating processes, was reported to be cleaved from its precursor by the conserved zinc metalloprotease Ste24, homologs of which have been found in mammals [126].
In animals, the membrane proteins referred to as G-protein-coupled receptors (GPCRs) make up the superfamily of receptors responsible for binding the corresponding peptide ligands and transducing the signal into the cell [163][164][165]. The known network of peptide ligands and GPCRs spans 407 interactions between 219 peptides and 138 receptors in humans [166]. Other peptide ligands, such as growth factors, are recognized by specialized receptor tyrosine kinases, for example epidermal growth factor receptors (EGFRs) [167] and platelet-derived growth factor receptor alpha (PDGFRα) [168]. The endothelin signaling peptides bind to their respective endothelin receptors ETA, ETB1, ETB2 and ETC [163].
Another group of biologically active peptides, the cryptides, which are derived from functional proteins, has been found in different organisms [39]. In plants, there are several examples, such as the immune peptide GmSubPep (Glycine max Subtilase Peptide), derived from a subtilisin-like protease; the immune peptide CAPE1 (CAP-derived peptide 1), derived from the PR1 protein; and the defense peptide inceptin, cleaved from a plant ATP synthase in larvae of Spodoptera frugiperda [40][41][42]. These peptides participate in immune responses. Examples of mammalian cryptides are also known, such as mitocryptide-1, which is cleaved from cytochrome c oxidase and acts as an activator of neutrophils [47], or parstatin, a peptide hidden in the sequence of proteinase-activated receptor 1 (PAR1) that antagonizes the activity of its precursor [169]. A number of cryptides cleaved from hemoglobin precursors have been discovered not only in blood or the heart, but also in brain tissue [43][44][45][46]. Another example of a known cryptide is a short peptide named EL28, which is hidden in the sequence of 19S ATPase regulatory subunit 4 and increases in abundance upon interferon treatment in human cells. This peptide influences the activities of proteasomes in vitro and was reported to increase the effect of interferon in cells [48]. PepH, a peptide derived from histone H2B type 1-H, was found in the human brain tissue of schizophrenia patients and was shown to participate in protection from cell death [170]. Another example is the peptide Pep5, derived from cyclin D2, which influences cell death in different types of tumor cells [171,172].
Depending on the location and the type of transcripts, smORFs can be classified as short CDSs, intergenic smORFs, lncRNA-smORFs, or upstream and downstream smORFs [117]. Most of the intergenic smORFs are probably not translated and are non-functional [173]. Nevertheless, smORFs have been shown to be a source of functional peptides that regulate key processes in cells [117,174]. Well-studied examples of functional peptides or microproteins encoded by short CDSs are some classes of antimicrobial peptides (AMPs), which have been found in a range of organisms from bacteria to plants and animals [175]. Such peptides possess specific physicochemical properties, such as a positive net charge, that promote disruption of the cell membrane [176]. In mammals, cysteine-rich β-defensins and histidine-rich histatins are the most studied examples of such peptides [177,178]. Plants also have homologs of mammalian defensins that are encoded by short CDSs [179]. Peptides encoded by lncRNAs are the least studied component of peptide pools, but this group may potentially include thousands of peptides [174]. The functional analysis of peptides encoded by lncRNA transcripts has mainly been performed in animals [117]. For example, the 46-aa myoregulin (MLN) interacts with the sarcoplasmic reticulum Ca2+-ATPase (SERCA) protein in the membrane of the sarcoplasmic reticulum and regulates Ca2+ handling in muscles [180]. Another example is the 53-aa conserved peptide HOXB-AS3, encoded by the lncRNA HOXB-AS3, which suppresses colon cancer (CRC) growth [181]. In comparison with animals, the functions of peptides encoded by lncRNAs in plants are not well studied. Nevertheless, several plant smORF-encoded peptides have been characterized to date: 36-aa POLARIS (PLS) [182], 53-aa ROTUNDIFOLIA4 (ROT4) [183], 51-aa ROT18/DLV1 [184], EARLY NODULIN GENE 40 (ENOD40; 12- and 24-aa) [185], 25-aa KISS OF DEATH (KOD) [186] and 10-aa OSIP108 [187].
Recently, four lncRNA-encoded peptides were characterized in the model plant Physcomitrium (Physcomitrella) patens [27]. The overexpression or knockout of these peptides affects plant growth, suggesting their growth-regulating functions [27]. Thus, smORF-encoded peptides may constitute a significant part of cellular and secreted peptidomes, and further studies are needed to understand the abundance, properties, lifetime and functions of such peptides.
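To make the notion of a smORF more concrete, the following minimal Python sketch scans the three forward reading frames of a nucleotide sequence for ATG-initiated open reading frames that end in a stop codon and fall within a length window. It is an illustration only, not the procedure used in the cited studies: the function name find_smorfs, the 10-100 codon window, the restriction to ATG start codons and to the forward strand, and the toy sequence are all assumptions made for this example. Real smORF annotation additionally relies on evidence of translation, conservation and peptide detection.

```python
# Minimal smORF scan (illustrative sketch; thresholds are assumptions).
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_smorfs(seq, min_codons=10, max_codons=100):
    """Return (start, end, n_codons) for ATG..stop ORFs in the forward frames.

    start/end are 0-based, end-exclusive and include the stop codon;
    n_codons counts the coding codons (ATG included, stop excluded).
    For each stop codon only the most upstream in-frame ATG is reported.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            # Walk codon by codon until a stop codon or the sequence end.
            j = i + 3
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):
                break  # no in-frame stop: no complete ORF left in this frame
            n_codons = (j - i) // 3
            if min_codons <= n_codons <= max_codons:
                hits.append((i, j + 3, n_codons))
            i = j + 3  # continue scanning after the stop codon
    return hits

# Toy sequence (hypothetical): ATG + 11 alanine codons + TAA, with flanks.
toy = "CC" + "ATG" + "GCT" * 11 + "TAA" + "GG"
print(find_smorfs(toy))
```

Candidates returned by such a scan are only putative; as noted above, most intergenic smORFs are probably not translated.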
Peptidomes may be a source of molecules for a rapid response to stress or pathogen attack. For example, novel peptides with potential antimicrobial activity derived from functional proteins were found in moss cells and secretomes treated with stress hormones [15,60]. Recent data also indicate that organellar proteases regulate the generation of stress-signaling peptides [61]. The knockout of the oligopeptidases PreP1/2 and OOP triggered the accumulation of peptides that activate a defense response in Arabidopsis thaliana [61]. A similar effect has been demonstrated in mice by knockout of thimet oligopeptidase (THOP1), which is reported to act downstream of proteasomal cleavage in the generation of MHC-bound antigenic peptides [104].
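One of the physicochemical properties associated above with antimicrobial peptides, a positive net charge, can be estimated directly from sequence when screening peptidomic data for candidates. The sketch below is a deliberately crude approximation and not a method taken from the cited studies: it counts lysine and arginine as +1 and aspartate and glutamate as -1, and ignores histidine, the termini and any modifications. The function names, the threshold of +2 and the example sequences are hypothetical.

```python
# Crude net-charge estimate near neutral pH (illustrative assumptions only).
def approx_net_charge(peptide):
    """Count K/R as +1 and D/E as -1; His, termini and PTMs are ignored."""
    seq = peptide.upper()
    positive = sum(seq.count(res) for res in "KR")
    negative = sum(seq.count(res) for res in "DE")
    return positive - negative

def is_cationic(peptide, min_charge=2):
    """Flag peptides whose approximate net charge reaches min_charge."""
    return approx_net_charge(peptide) >= min_charge

# Hypothetical sequences, not taken from any dataset.
for pep in ("KKLLRKAGD", "DEAL"):
    print(pep, approx_net_charge(pep), is_cationic(pep))
```

A positive score alone does not establish antimicrobial activity; it only narrows a list of candidates for experimental testing.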

Conclusions
Recent progress in mass-spectrometry-based analysis has expanded our knowledge of the composition of intra- and extracellular peptidomes. Besides revealing thousands of newly identified peptides, peptidomic data indicate that intracellular and extracellular degradation of functional proteins is not random and that bioactive peptides may be embedded in their sequences. Analysis of the degradation patterns of conserved proteins from different organisms allows us to speculate on the inevitable nature of this process. However, mass-spectrometry analysis has some disadvantages for full peptidome characterization, such as (1) problems with the detection of low-abundance peptides; (2) bias towards the detection of only peptides with certain physicochemical properties; (3) incomplete genome annotations, which require further improvements; and (4) difficulty in correctly identifying modified native peptides. Therefore, further progress is needed to improve the detection of naturally occurring peptides and exclude artifacts during sample preparation. Even more important is the development of approaches for the identification and functional analysis of previously uncharacterized components of cellular and secreted peptidomes.

Data Availability Statement:
The data analyzed in this study are openly available; references are cited in Table 1.

Conflicts of Interest:
The authors declare no conflict of interest.