Plant Cell Wall Proteomes: The Core of Conserved Protein Families and the Case of Non-Canonical Proteins

Plant cell wall proteins (CWPs) play critical roles during plant development and in response to stresses. Proteomics has revealed their great diversity. With nearly 1000 identified CWPs, the Arabidopsis thaliana cell wall proteome is the best described to date and it covers the main plant organs and cell suspension cultures. Other monocot and dicot plants have been studied as well as bryophytes, such as Physcomitrella patens and Marchantia polymorpha. Although these proteomes were obtained using various experimental workflows, they can be searched for the presence of members of a given protein family. On this basis, a core cell wall proteome, which does not claim to be exhaustive, could be defined. It comprises: (i) glycoside hydrolases and pectin methyl esterases, (ii) class III peroxidases, (iii) Asp, Ser and Cys proteases, (iv) non-specific lipid transfer proteins, (v) fasciclin arabinogalactan proteins, (vi) purple acid phosphatases and (vii) thaumatins. All the conserved CWP families could represent a set of house-keeping CWPs critical for the maintenance of basic cell wall functions, for an immediate response to environmental stresses, or for both. Besides, the presence of non-canonical proteins devoid of a predicted signal peptide in cell wall proteomes is discussed in relation to the possible existence of alternative secretion pathways.


Introduction
Plant cell walls are an important cell compartment playing critical roles in development as well as in the response to biotic and abiotic stresses. During cell growth, the so-called primary cell walls contain intricate networks of polysaccharides (90-95% of the total mass), cell wall proteins (CWPs) (5-10%), nutrient minerals in the apoplast, which can be defined as the soluble fraction of the extracellular matrix, as well as aromatic compounds in some plants, such as monocots and bryophytes [1]. At the end of growth, secondary walls can be synthesized. Covalent cross-links involving either hemicelluloses such as glucuronoarabinoxylans and lignin monomers, or structural proteins such as extensins reinforce the cell wall structure [2].
In primary walls, the main polysaccharides are pectins, hemicelluloses and cellulose. Pectin molecules are of three types [3]: (i) homogalacturonans (HGs), which are secreted as methylesterified molecules and can be demethylated in muro by pectin methylesterases (PMEs) to form the so-called egg box structures after ionic interaction with calcium ions [4]; (ii) type I rhamnogalacturonans (RGI); and (iii) type II rhamnogalacturonans (RGII), which form dimers with boron ions. Major hemicelluloses can be xyloglucans in dicot plants, glucuronoarabinoxylans in monocots or mannans in bryophytes [1,5,6]. Finally, cellulose is the main load-bearing polymer present in all cell walls. Cellulose molecules are the simplest polymers in cell walls. They consist of linear chains of (1→4)-β-D-glucose organized in microfibrils, which are synthesized by cellulose synthases at the plasma membrane [7].
The capacity of the cell wall to expand or to be modified relies on the activities of numerous CWPs. For example, the local interactions at the level of biomechanical hotspots between cellulose microfibrils and hemicelluloses, such as xyloglucans, can be modified by expansins, thus determining the loosening capacity of cell walls [8]. Class III peroxidases (CIII Prxs) can polymerize phenolic molecules, such as lignin monomers or tyrosine residues of structural proteins, such as extensins [9]. Besides, signaling molecules, such as peptides or oligogalacturonides, can be released from proteins or polysaccharides thanks to cell wall hydrolase activities [10,11]. These external signals are perceived by plasma membrane receptors which transmit the information to the inside of the cell, thus triggering regulatory mechanisms involved in development or in response to environmental cues. These few examples highlight some of the roles played by CWPs.
Proteins which were not predicted to be secreted were identified in all the cell wall proteomes characterized so far. They were named non-canonical CWPs and could have been considered as contaminant proteins [12,13]. Alternative secretory routes have been described in bacteria and mammals. They were grouped under the unconventional protein secretion (UPS) pathways. The proteins following these routes are leaderless and share particular features, such as amino acid content, secondary structure or disordered regions [14,15]. The question of the existence of such alternative secretion pathways in plants is still a matter of debate.
The diversity of CWPs has been revealed since the 2000s with the development of dedicated cell wall proteomics studies [16]. These studies were boosted by the description of plant genomic sequences, starting with that of Arabidopsis thaliana [17], in parallel with the development of mass spectrometry (MS)-based identification of proteins [18]. Nowadays, the strategies for isolation of proteins from cell walls and their identification are well-established [16,19]. New cell wall proteomes are regularly described, which makes it possible to draw a general picture. The aim of this article is to (i) provide an update on plant cell wall proteomics, (ii) define a core cell wall proteome comprising the protein families which are conserved in 13 cell wall proteomes of dicot and monocot plant species described so far, and (iii) discuss the case of the non-canonical proteins devoid of a predicted signal peptide which have been identified in all the cell wall proteomes.

An Overview of the Selected Cell Wall Proteomes
For this analysis, we have selected proteomic studies from 13 plant species, corresponding to 36 independent studies (Table 1). For a given plant, the cell wall proteome, as considered in this article, encompasses all the CWPs identified at least once in at least one organ or in cell suspension cultures. Among the selected plants, there are one bryophyte (Marchantia polymorpha), eight dicots (A. thaliana, Linum usitatissimum, Medicago sativa, Populus spp, Solanum lycopersicum, S. tuberosum, Gossypium hirsutum and Camellia sinensis) and four monocots (Saccharum officinarum, Triticum aestivum, Oryza sativa and Brachypodium distachyon). Different organs have been analyzed (thallus, hypocotyls, root, stem, leaf, or fruit) as well as cell suspension cultures and their culture media. A few experiments deal with the exposure to environmental constraints, such as temperature stress [20-22], salicylic acid treatment [23], β-aminobutyric acid treatment [24], phosphate starvation [25] or pathogen infection [26]. All these proteomes were chosen because most of them have been obtained in similar experimental conditions (Section 3), they have a minimal size of 100 CWPs and the available data have allowed a new expert annotation of all the identified proteins and their sorting into CWPs or presumed intracellular contaminants (Section 3). All of them, except for T. aestivum [27], can be found in WallProtDB-2 (https://www.polebio.lrsv.ups-tlse.fr/WallProtDB/) (accessed on 6 April 2022) [28]. The number of CWPs of the selected proteomes varies from 106 (L. usitatissimum) to 989 (A. thaliana) (Table 1).
Table 1, note a. Numbers 1-4 refer to the protocol used to study the extracellular proteome (Figure 1).
Figure 1. The four types of protocols (1-4) which have been used to study the extracellular proteome of plants. They can be qualified as non-destructive (1,2), or destructive (3,4), depending on whether they start with a grinding step or not.

How to Define CWPs and to Explore Cell Wall Proteomes?
The fact that cell walls are open compartments is a major difficulty for the preparation of cell wall fractions devoid of intracellular contaminants. From a historical point of view, two main strategies have been used: (i) the recovery of extracellular fluids after vacuum infiltration as a "non-destructive protocol" [33]; and (ii) the purification of cell walls followed by the elution of proteins with salt solutions, as a "destructive protocol" established for A. thaliana etiolated hypocotyls [30,31].
Then four main strategies were used for different plants and various organs [16] (Figure 1): non-destructive protocols involving either (1) a vacuum-infiltration step of plant tissues or (2) the analysis of culture media; or destructive protocols starting with (3) the purification of a cell wall fraction, followed by extraction of the proteins with salt solutions or (4) the isolation of N-glycoproteins from a total protein extract through Concanavalin A (ConA) affinity chromatography. This latter strategy is based on the fact that extracellular proteins are routed through the secretory pathway where many of them become N-glycosylated [61]. All these approaches have proven to be complementary and their combination has enlarged the coverage of cell wall proteomes [29,48]. The steps of protein separation or protein identification could also vary [16]. However, they tend to be more and more similar with the development of shotgun mass spectrometry (MS) analyses by LC-MS/MS [34]. Altogether, it is now reasonable to investigate the different proteomes in order to (i) define a core cell wall proteome and (ii) identify proteins possibly directed to the extracellular space through alternative secretion pathways. The next step is to identify bona fide CWPs among the identified proteins. Indeed, the presence of proteins well-described as intracellular proteins, such as proteins participating in protein synthesis, has been reported in nearly all the cell wall proteomics studies (Section 5).
The proteins present in the apoplast and in the cell wall are assumed to be secreted through the secretory pathway thanks to a signal peptide which targets them to the endoplasmic reticulum (ER) during their biosynthesis. Several bioinformatics programs can be used to predict which proteins could be found in the extracellular space, such as TargetP [62], SignalP [63], Phobius [64], Predotar [65] or LocTree3 [66]. Besides, it is possible to predict the presence of trans-membrane domains indicating a localization at the plasma membrane or an anchoring on its external side through a glycosylphosphatidylinositol (GPI) anchor. Databases or bioinformatics programs, such as Aramemnon [67], TMPred [68], TMHMM [69], PredGPI [70] or GPI-SOM [71], can be used to this end. The ProtAnnDB annotation tool collects such predictions for 21 plant species [72].
Other proteins expected to be intracellular have also been identified in cell wall proteomes (Section 5). They could be considered as contaminant proteins or as non-canonical CWPs. However, one cannot exclude the existence of alternative routes of secretion, which have been demonstrated in bacteria and in mammals and for which dedicated software (SecretomeP) has been designed [14].
In all the cell wall proteomes included in this study, we have chosen to consider proteins as CWPs if (i) a signal peptide could be predicted by at least two different bioinformatic programs, (ii) no ER retention signal could be predicted and (iii) fewer than two trans-membrane domains could be predicted, or if experimental work had already shown that proteins of the same family were located in the extracellular space. Note that signal peptides can be predicted as trans-membrane domains by some bioinformatic programs since they share common properties such as the presence of stretches of hydrophobic amino acid residues. We are thus left with three categories of CWPs: (i) those having a predicted signal peptide; (ii) those having both a predicted signal peptide and a GPI-anchor; and (iii) those which have experimentally been proven to be extracellular. In addition, we have considered proteins having an extracellular domain possibly interacting with ligands, such as peptides or oligosaccharides, a predicted trans-membrane domain, and a predicted cytoplasmic kinase domain. As receptor kinases, such proteins play critical roles in the transfer of information from the outside of the cell to its inside [73-75].
Since we want to analyze different cell wall proteomes, it is necessary to homogenize the functional annotation of the CWPs. This precaution avoids relying on automatic annotations based on sequence comparisons, which can be misleading. All the proteins selected as CWPs were re-annotated according to the presence of functional domains described in PROSITE [76], Pfam [77] or InterPro [78].
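The selection criteria stated above can be summarized as a small decision rule. The following Python sketch is only an illustration under stated assumptions: the `Prediction` record, its field names and the returned labels are hypothetical and do not correspond to the output format of any of the cited programs or to the annotation pipeline actually used. Receptor kinases, which are considered separately in the text, are not handled here.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Hypothetical summary of the bioinformatic predictions for one protein."""
    signal_peptide_votes: int       # number of programs predicting a signal peptide
    er_retention_signal: bool       # True if an ER retention signal is predicted
    transmembrane_domains: int      # number of predicted trans-membrane domains
    gpi_anchor: bool = False        # True if a GPI anchor is predicted
    family_shown_extracellular: bool = False  # experimental evidence for the family


def classify(p: Prediction) -> str:
    """Apply the three prediction criteria, then the experimental-evidence fallback."""
    predicted_secreted = (
        p.signal_peptide_votes >= 2        # (i) at least two programs agree
        and not p.er_retention_signal      # (ii) no ER retention signal
        and p.transmembrane_domains < 2    # (iii) fewer than two trans-membrane domains
    )
    if predicted_secreted:
        if p.gpi_anchor:
            return "CWP (signal peptide + GPI anchor)"
        return "CWP (signal peptide)"
    if p.family_shown_extracellular:
        return "CWP (experimental evidence)"
    return "presumed intracellular"
```

For instance, `classify(Prediction(3, False, 1, gpi_anchor=True))` falls into the second category of CWPs, whereas a leaderless protein without experimental support is sorted with the presumed intracellular proteins. The tolerance of one trans-membrane domain reflects the remark that a signal peptide itself can be predicted as a trans-membrane domain.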

A Core Cell Wall Proteome: The Conserved CWPs Families and Their Possible Roles in Cell Walls
The systematic re-annotation of CWPs according to the presence of functional domains has allowed grouping them into nine functional classes [12], which have been found in various proportions in the cell wall proteomes of the 13 studied plant species:
• Proteins acting on cell wall carbohydrates (PACs) constitute the major functional class in all the cell wall proteomes, accounting for up to 25% of the CWPs. This class comprises expansins [79] as well as glycosyl hydrolases (GHs), carbohydrate esterases (CEs) such as pectin methylesterases (PMEs) and polysaccharide lyases (PLs). The description of the latter protein families can be found in the Carbohydrate-Active enZYmes Database (CAZyDB, http://www.cazy.org) (accessed on 6 April 2022) [80].
• Oxido-reductases (ORs) include class III peroxidases (CIII Prxs), blue copper binding proteins, berberine bridge oxido-reductases, multicopper oxidases and laccases. The CIII Prxs and blue copper binding proteins are described in the Redoxibase (https://peroxibase.toulouse.inrae.fr) (accessed on 6 April 2022) [81] and the two latter protein families are included in CAZyDB.
• Structural proteins, such as hydroxyproline-rich glycoproteins (HRGPs), are scarcely represented in cell wall proteomes because many of them are covalently cross-linked in cell walls and thus difficult to extract. One study has notably succeeded in identifying several extensins, Pro-rich proteins and leucine-rich extensins by using a dedicated protocol including a trypsin digestion applied directly on cell walls [90].
• Miscellaneous proteins include proteins which cannot be classified into the other groups. Among others, they include dirigent proteins [91], purple acid phosphatases [92], phosphate-induced (phi) proteins (EXORDIUM-like proteins) [93] and germins [94].
• Proteins of unknown function can represent more than one tenth of the cell wall proteomes, suggesting new functions or new biological activities yet to be described.
As mentioned, each of these functional classes includes several protein families. By comparing the 13 selected cell wall proteomes, it is possible to identify protein families which are present in all or in most of them (Appendix A). They are described in the two following sections: proteins acting on cell wall carbohydrates belonging to the major functional class (Section 4.1, Figure 2) and proteins belonging to the other functional classes (Section 4.2, Figure 3).
Figure 3. Schematic representation of the activities of diverse proteins belonging to the core cell wall proteome. The protein families have been grouped according to their known biological activities. Proteases are assumed to play roles in protein maturation, release of signaling peptides and protein degradation (top left of the scheme). DUF642 proteins and lectins interact with cell wall polysaccharides but their precise roles are not known (middle left part of the scheme). Several protein families could play roles in signaling (bottom left of the scheme): LRR proteins and lectins could interact with other proteins, and in particular with the extracellular domains of plasma membrane receptors, thus leading to the transduction of a signal to the cell; fasciclin arabinogalactan proteins (FLAs) are also assumed to play a role in signaling. Dirigent proteins, germins, thaumatins and purple acid phosphatases (PAPs) have diverse activities (center of the scheme, Section 4.2 for details). Oxido-reductases (multicopper oxidases, berberine-bridge oxido-reductases (BBEs) and class III peroxidases (CIII Prxs)) play multiple roles in the cell wall. In particular, CIII Prxs can cross-link structural proteins or phenolic compounds, and they contribute to the regulation of reactive oxygen species (ROS) which are involved in signaling or in non-enzymatic cleavage of polysaccharides (central part of the scheme). LTPs and GDSL lipases could play roles in the formation of cuticle (right side of the scheme). Some LTPs are localized at the surface of the plasma membrane thanks to GPI anchors and participate in the transport of lipids to the cuticle layer. LTPs have also been shown to play a role at the interface between the hydrophilic cell wall polysaccharides and the hydrophobic cuticle layer.
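The comparison of proteomes used to define the core can be pictured as a simple presence/absence count of protein families across species. The Python sketch below is a minimal illustration, not the procedure actually applied to the WallProtDB-2 data: the function name `core_families`, the `min_fraction` parameter and the toy species sets are all hypothetical, and the presence/absence values are invented for the example.

```python
from collections import Counter


def core_families(proteomes: dict[str, set[str]], min_fraction: float = 1.0) -> set[str]:
    """Return the families found in at least `min_fraction` of the proteomes.

    `proteomes` maps a species name to the set of protein families
    identified in its cell wall proteome.
    """
    n_species = len(proteomes)
    # Each family is counted once per species in which it was identified.
    counts = Counter(family for families in proteomes.values() for family in families)
    return {family for family, c in counts.items() if c >= min_fraction * n_species}


# Toy data: the family names come from the text, the distribution is invented.
toy = {
    "species_A": {"GH16", "CIII Prx", "LTP"},
    "species_B": {"GH16", "CIII Prx"},
    "species_C": {"GH16", "CIII Prx", "LTP"},
}
strict_core = core_families(toy)                     # families present in all proteomes
relaxed_core = core_families(toy, min_fraction=0.6)  # families present in most proteomes
```

With these toy sets, the strict core contains GH16 and CIII Prx only, while relaxing the threshold also retains LTP, which mirrors the distinction made in the text between families present in all proteomes and those present in most of them.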

Proteins Acting on Cell Wall Carbohydrates
These protein families can be distinguished on the basis of their carbohydrate substrates. They have been grouped according to their known or predicted substrates: hemicelluloses, pectins or glycans of glycoproteins ( Figure 2).
A set of enzymes can act on hemicelluloses. GH16 are xyloglucan endotransglucosylases/hydrolases (XTHs). They were initially described as having xyloglucan-xyloglucan donor/acceptor substrate activities. However, it was later shown that they could accept other substrates such as cellulose or mixed-linkage (1,3;1,4)-β-D-glucans [95-97]. Molecular modelling has also suggested that they could modify arabinoxylans in Poaceae [97]. These findings suggest that they could play critical roles in remodeling the cellulose/hemicellulose networks in cell walls of both monocot and dicot plants. As an example, the xth21 mutant of A. thaliana exhibited a dwarf phenotype most probably resulting from a defect in the growth of the primary root [98]. This mutant also showed a decrease in the average mass of xyloglucans and in cellulose content, suggesting a role of the cellulose/xyloglucan network in the elongation of the cell wall.
Another group of enzymes can hydrolyze or modify pectin molecules. GH27 and GH28 hydrolyze galactomannans and homogalacturonans, respectively [99]. The A. thaliana QRT3 (QUARTET3) gene was shown to encode a polygalacturonase and the corresponding mutant exhibited a defect in pollen mother cell wall degradation resulting in a defect in microspore separation [101]. GH35 could act on the arabinan side-chains of pectins or on the O-glycans of AGPs although some of them could also act on xyloglucans [99]. PMEs catalyze the demethylesterification of homogalacturonans, thus revealing negative charges which allow the formation of the egg box structures with calcium ions [4]. The A. thaliana atpme3 mutant was shown to have an increased number of adventitious roots together with an increase in the degree of HG methylesterification, thus suggesting the importance of changes in the pectin structure for adventitious root emergence [102].
Finally, a set of enzymes can hydrolyze the N- or the O-glycans of glycoproteins. They belong to GH families 18, 19 and 38 [103]. The O-glycans of AGPs were assumed to be substrates of GH19 as one of the few cell wall molecules carrying glucosamine or N-acetylglucosamine [104]. In the same article, it was shown that the incubation of an AGP fraction purified from carrot cells with an endochitinase of the GH19 family led to the release of oligosaccharides. GH18 and GH19 were also described as chitinases/lysozymes playing roles during plant-microorganism interactions [105,106].
GH32 are cell wall acidic invertases. They cleave sucrose into glucose and fructose, which can be taken up by cells via hexose transporters. They are involved not only in phloem unloading and in the development of non-photosynthetic organs, but also in plant defense reactions [107,108].

The Other Conserved Protein Families
Apart from the proteins acting on cell wall carbohydrates, several protein families are also conserved (Figure 3). Several families of extracellular proteases are well conserved in cell wall proteomes, such as Asp proteases, Cys proteases and Ser proteases. The roles of these proteins have begun to be discovered in A. thaliana. The AtSBT1.4, AtSBT1.7 and AtSBT4.13 subtilisins were shown to release the signaling peptide CLE40 (Clavata3/Endosperm Surrounding Region 40) from a preprotein [109]. CLE40 is involved in the regulation of stem cell differentiation. Such extracellular proteases may also play roles in protein maturation, as shown for AtSBT1.6 with PMEs [83]. The SDD1 (Stomatal Density and Distribution 1) subtilisin negatively regulates the formation of stomata in A. thaliana, most probably through peptide signaling, although its substrate has not yet been identified [110]. Besides, the A. thaliana extracellular CDR1 (Constitutive Disease Resistance) Asp protease was assumed to mediate disease resistance through a signaling peptide [111]. Most probably, all these proteolytic activities are modulated by protease inhibitors, which are also found as conserved protein families in cell walls.
Among the ORs, CIII Prxs represent large plant gene families, with, for example, 73 members in A. thaliana and 189 in M. polymorpha (https://peroxibase.toulouse.inrae.fr) (accessed on 6 April 2022). They play major roles in plant cell walls by (i) generating reactive oxygen species (ROS) involved in signaling and in non-enzymatic cleavage of polysaccharides, or (ii) regulating the level of H2O2, thus contributing to cell wall stiffening by cross-linking structural proteins such as extensins or monomers of lignins [9]. This latter role could also be played by laccases, such as LACCASE5 in B. distachyon culms [112]. Besides, an A. thaliana laccase (TRANSPARENT TESTA10) was shown to be involved in the polymerization of flavonoids in the seed coat [113]. The role of multicopper oxidases is more puzzling. The A. thaliana SKU5 (SKEWED5) gene was shown to be involved in root directional growth [114]. Mutants impaired in SKS11 and SKS12 (SKU SIMILAR11 and 12) showed alteration in pollen tube integrity, growth and guidance as well as some alteration in polysaccharide composition [115]. No enzymatic activity has been demonstrated yet for the encoded proteins. Finally, the role of berberine-bridge enzyme-like proteins is starting to be understood thanks to the characterization of the enzymatic activity of the A. thaliana OGOX1-4 (OLIGOGALACTURONIDE OXIDASE 1-4) proteins [116]. They oxidize oligogalacturonides (OGs); the oxidized OGs are less hydrolysable by fungal polygalacturonases (PGs) and have a reduced ability to activate the immune response. However, no specific role has yet been demonstrated during plant development.
Several protein families related to lipid metabolism could be identified in most cell wall proteomes. Several roles have been proposed for non-specific lipid transfer proteins (LTPs) [117]. They have been assumed to contribute to the transfer of lipids, which are hydrophobic molecules, through the hydrophilic cell wall [118]. Indeed, A. thaliana mutants impaired in LTPG2 or in LTPG1 and LTPG2 exhibit an alteration in cuticular wax composition in stems and siliques [119]. LTPG1 and LTPG2 are predicted to be GPI-anchored proteins. LTPs have also been shown to be involved in the adhesion of the cuticular layer on the hydrophilic primary cell wall [120]. Several roles were proposed for GDSL lipases/acylhydrolases [121]. The tomato GDSL1 was shown to be involved in the deposition of cutin in the cuticle of tomato fruits [122]. Indeed, the silencing of GDSL1 leads to the appearance of nanopores in isolated fruit cutins and to a reduction in ester bond cross-links. An A. thaliana mutant impaired in GELP77 exhibits shrunken pollen grains which stick together, suggesting a role of GELP77 in pollen grain wall formation [123]. More recently, GDSL lipases/acylhydrolases were assumed to also be involved in suberin degradation [124].
Among the miscellaneous proteins, dirigent proteins (DIRs) are assumed to be involved in lignan and in lignin biosynthesis. They have no known enzymatic activity, but they would control the regio- and stereoselectivity of bimolecular phenoxy radical coupling [91]. As an example, the A. thaliana AtDIR10 protein was shown to be essential for the establishment of the lignin-based Casparian strips in roots [125]. Several types of enzymatic activities have been associated with germins and germin-like proteins: manganese superoxide dismutase (SOD), oxalate oxidase (OXO) or ADP glucose pyrophosphatase/phosphodiesterase (AGPPase) [126,127]. Thaumatins and thaumatin-like proteins belong to the large pathogenesis-related protein family (PR proteins) and are also called PR-5 [128]. Most of them exhibit an anti-fungal activity and their genes are induced upon biotic stress. They might also have allergenic properties. Extracellular purple acid phosphatases (PAPs) are phosphohydrolases able to cleave Pi from organic Pi-esters, for example those in soils that are otherwise inaccessible to root cells [92]. The predominant A. thaliana PAPs (AtPAP12 and AtPAP26) were identified in several cell wall proteomes [22,31,32,129] and both proteins were isolated from the culture medium of cell suspension cultures [130].
Fasciclin arabinogalactan proteins (FLAs) are assumed to be involved in the interactions between the cells and their environment in the same way as mammalian proteins carrying fasciclin domains (FAS1) [131]. Some of them are located at the plasma membrane surface thanks to the presence of a GPI-anchor as experimentally demonstrated for AtFLA4 and AtFLA12 [132,133]. They could also be released in the cell wall after GPI-anchor cleavage. AtFLA4 was assumed to interact with pectin molecules and to contribute to the biomechanical properties of the cell wall [131]. FLAs were also found to be present in the so-called G-layer of tension wood. In particular, mutants impaired in AtFLA11 and AtFLA12 exhibit reduced tensile strength and stiffness [134]. In this case, interactions between FLAs and cellulose microfibrils were suspected. Furthermore, in the functional class comprising signaling molecules, proteins with leucine-rich repeats (LRRs) are found in all cell wall proteomes. Their role is not clear but they could interact with other proteins or with peptides. Such interactions have been reported for the LRR domains of AtLRX2 and AtLRX8 interacting with the rapid alkalinization factor 4 (RALF4) signaling peptide [135].
The DUF 642 (domain of unknown function 642, InterPro domain IPR006946) proteins were initially identified as major proteins in the cell wall proteome of A. thaliana etiolated hypocotyls [31]. The DUF 642 domain is frequently associated with a galactose-binding-like domain (InterPro domain IPR008979). Different roles were proposed, such as a structural role as lectin-like proteins interacting with cell wall polysaccharides [136] or a role in the regulation of PME activity [137].

What about the Non-Canonical Proteins Identified in Cell Wall Proteomes?
All the published proteomes characterized from purified cell walls, extracellular fluids or cell suspension culture media contain proteins which are not expected to be secreted. These proteins have now been included in a new version of the plant cell wall proteome database called WallProtDB-2 (https://www.polebio.lrsv.ups-tlse.fr/WallProtDB/) (accessed on 6 April 2022) to provide an overview of their predicted sub-cellular localization and biological activity. Apart from the 4292 proteins considered to be bona fide CWPs (Section 3), WallProtDB-2 now contains 6462 proteins presumed to be intracellular and identified in apoplastic fluids or among proteins extracted from purified cell walls (Table 2). These proteins are assumed to be non-canonical CWPs. To our knowledge, this is the first time that this information has been collected.
Table 2 notes: a. The proteome of P. patens has not been included in this study because of its small size. b. The proteome of B. oleracea has not been considered in this work since this is a xylem sap proteome.
In the following, 12 cell wall proteomes have been taken into account (Table 2). Altogether, they comprise 6425 presumed contaminant proteins. The B. oleracea and the P. patens proteomes have been excluded because the former is a xylem sap proteome and the latter is very small.
A very high number of domains could be predicted in the proteins presumed to be contaminants: 1575 Pfam (https://xfam.org/) (accessed on 6 April 2022) and 3024 IPR (https://www.ebi.ac.uk/interpro/) (accessed on 6 April 2022) domains (Appendix A). This result shows the huge diversity of these proteins. About one third of the Pfam domains (560) were only present in one protein whereas 6 domains were shared by more than 50 proteins (Figure 5A). Similar results were observed for IPR domains, with 938 domains (about one third) only present in one protein and 36 domains present in more than 50 proteins (Appendix A). The number of proteins sharing a given domain increases with the number of presumed contaminants in a given cell wall proteome. Figure 5B illustrates the case of proteins predicted to have an IPR ribosomal domain. Among these domains, there are (i) structural domains such as PF00076 (RNA recognition motif) shared by 174 proteins and IPR016040 (NAD(P)-binding domain) shared by 315 proteins or (ii) domains corresponding to a biological activity such as PF00012 (Hsp70 family) shared by 67 proteins, and IPR013766 (thioredoxin domain) shared by 159 proteins (Appendix A). The top 20 most represented Pfam domains describing a biological activity are listed in Table 3. None of these functions has been described in the extracellular space so far.
The frequent identification of certain proteins in cell wall proteomes may have different explanations: (i) they could exhibit specific features allowing them to strongly interact with cell wall components during the purification of cell walls, for example, the histones (61 entries in 7 plant species, PF00125, IPR007125), which are basic proteins like most CWPs [12]; (ii) they could be very abundant proteins such as ribosomal proteins (altogether 578 entries in 8 plant species); or (iii) they could be secreted through alternative secretory pathways.
For some protein families, there is no clear hypothesis regarding their presence in many cell wall proteomes. This is the case for thioredoxin (e.g., PF00085 with 117 occurrences in 12 plant species), heat-shock proteins (e.g., PF00012 with 67 proteins in 12 plant species), glyceraldehyde 3-phosphate dehydrogenase (PF02800 and PF00044 with 46 and 45 proteins in 11 and 10 plant species, respectively), lactate/malate dehydrogenase (PF02866 and PF00056 with 49 proteins in 10 plant species) and cyclophilin type peptidyl-prolyl cis-trans isomerase (42 proteins in 9 plant species). Finally, these proteins could be moonlighting ones, being present in different cell compartments and having different functions in each of them [138]. As an example, two non-specific lipid transfer proteins of A. thaliana, AtLTP2 and AtLTP4, have been localized in both the cell wall and chloroplasts [120,132].
As mentioned above, UPS pathways have been described in bacteria and mammals. In plants, the best documented example of the presence of leaderless proteins in the apoplast is probably that of the leaderless jacalin-related lectin of Helianthus annuus (Helja): it has been identified in extracellular fluids [139], and in extracellular vesicles [140], and it has been immunolocalized in the extracellular matrix [139]. Another example is that of the cytoplasmic mannitol dehydrogenase which has been immunolocalized in cell walls upon a salicylic acid treatment [23]. As in mammalian cells, four main UPS pathways have been proposed in plants [13]: a direct ER to plasma membrane traffic, plasma membrane transporter channels, secretory lysosomes, and multivesicular bodies (MVBs) leading to exosome secretion. Besides, exocyst positive organelles (EXPOs) with a double membrane have been characterized in A. thaliana and in Nicotiana tabacum cells [141]. Exocysts are proteins mediating the fusion between post-Golgi vesicles and the plasma membrane, thus allowing the release of proteins in the extracellular space. All these pathways are resistant to brefeldin A which disrupts the ER-Golgi vesicular traffic. However, it must be stressed that additional work has to be done to better define what are presently called extracellular vesicles (EVs) and to identify specific markers to allow comparing different studies [142].
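The domain statistics reported above (domains found in a single protein versus domains shared by more than 50 proteins) amount to a frequency tally over per-protein domain annotations. The Python sketch below shows one way such a tally could be computed; it is an assumption-laden illustration, not the analysis behind Appendix A. The function name `domain_distribution`, the input layout and the toy annotations are hypothetical.

```python
from collections import Counter


def domain_distribution(protein_domains: dict[str, list[str]], shared_threshold: int = 50):
    """Count, for each domain, the number of proteins carrying it.

    Returns the per-domain counts, the number of domains found in exactly one
    protein, and the number of domains found in more than `shared_threshold` proteins.
    """
    counts = Counter()
    for domains in protein_domains.values():
        counts.update(set(domains))  # a domain repeated within one protein counts once
    singletons = sum(1 for c in counts.values() if c == 1)
    widely_shared = sum(1 for c in counts.values() if c > shared_threshold)
    return counts, singletons, widely_shared


# Toy annotations: the Pfam identifiers appear in the text, the proteins are invented.
toy = {
    "protein_1": ["PF00076", "PF00012"],
    "protein_2": ["PF00076"],
    "protein_3": ["PF00076", "PF00076"],
}
counts, singletons, widely_shared = domain_distribution(toy, shared_threshold=2)
```

In this toy case PF00076 is carried by three proteins and PF00012 by one. Applied to the figures given in the text, the singleton fractions are 560/1575 (about 0.36) for Pfam domains and 938/3024 (about 0.31) for IPR domains, consistent with the "about one third" statement.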
Recent research has been devoted to EVs in A. thaliana and H. annuus upon pathogen infection or in response to salicylic acid treatment [140,143], and in Nicotiana benthamiana upon viral infection [144]. These vesicles contain proteins involved in plant defense reactions and in membrane trafficking, among which are proteins with or without predicted signal peptides. They have also been shown to deliver small RNAs to fungal pathogens [145] and viral components to the cell wall [144]. Whether these EVs are EXPOs and whether plants produce different kinds of EVs remain to be determined [142,146].
Unfortunately, no bioinformatic program similar to SecretomeP has yet been designed for plant proteins (Section 3). SecretomeP assumes that proteins present in extracellular spaces share common features whatever the route of secretion [14]. Such a tool would help to sort the proteins devoid of a predictable signal peptide and to focus experimental work on them in order to demonstrate their actual presence in apoplastic fluids or in cell walls.
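In the absence of such a predictor, a much cruder triage can still help to prioritise experimental work: rank the leaderless protein families by how many cell wall proteomes they recur in. The minimal Python sketch below illustrates this idea only and is not a substitute for a SecretomeP-like tool. The record structure, the function name and the species threshold are hypothetical. The counts are those quoted above for four Pfam families, and flagging them as lacking a signal peptide is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FamilyRecord:
    """One protein family as observed across cell wall proteomes (hypothetical layout)."""
    pfam_id: str
    name: str
    n_proteins: int           # number of proteins carrying the domain
    n_species: int            # number of plant species in which it was found
    has_signal_peptide: bool  # outcome of a signal peptide prediction


def rank_leaderless_candidates(records, min_species=10):
    """Keep families without a predicted signal peptide that recur in at least
    `min_species` species, ordered by species count, then by protein count."""
    candidates = [
        r for r in records
        if not r.has_signal_peptide and r.n_species >= min_species
    ]
    return sorted(candidates, key=lambda r: (-r.n_species, -r.n_proteins))


# Counts taken from the text; the signal peptide flags are assumed for illustration.
RECORDS = [
    FamilyRecord("PF00044", "glyceraldehyde 3-phosphate dehydrogenase", 45, 10, False),
    FamilyRecord("PF00012", "heat-shock protein", 67, 12, False),
    FamilyRecord("PF02800", "glyceraldehyde 3-phosphate dehydrogenase", 46, 11, False),
    FamilyRecord("PF00085", "thioredoxin", 117, 12, False),
]

if __name__ == "__main__":
    for rec in rank_leaderless_candidates(RECORDS, min_species=11):
        print(rec.pfam_id, rec.name, rec.n_species, rec.n_proteins)
```

Recurrence across species does not demonstrate extracellular localization. Such a ranking only suggests which candidates to test first, for instance by immunolocalization.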

Conclusions
Altogether, the large amount of data accumulated during the last twenty years allows drawing a detailed picture of cell wall proteomes. Although the composition of the cell wall depends on the plant species, with differences between bryophytes, Poaceae and dicots [1,6], the same set of conserved protein families can be identified in all the cell wall proteomes characterized thus far. The current hypothesis is that these families are required for basic cell wall functions, for rapid responses to environmental stresses, or for both. As shown in this article, this collection of CWPs could (i) manage the rearrangement of the networks of cell wall polysaccharides; (ii) contribute to protein turnover, protein maturation or release of biologically active peptides; or (iii) play roles in signaling. In addition, they may be involved in the regulation of symplastic transport. Studying additional cell wall proteomes would contribute to obtaining an even more precise description of the core proteome and to scaling it down to the organ level.
The question of the presence of unexpected leaderless proteins, the non-canonical proteins, in cell wall proteomes needs to be further examined, together with a more precise description of the extracellular vesicles, which have mostly been observed upon pathogen infection. Additional experimental work has to be performed to demonstrate the presence of these unexpected proteins in extracellular spaces, either by detecting them with specific antibodies or by sub-cellular localization using fluorescent proteins. It is doubtful that all these proteins are bona fide CWPs. Many of them are most probably present as contaminants, since the procedures used to extract extracellular fluids or to purify cell walls have many drawbacks, notably because the cell wall is an open compartment. The information provided in this article regarding the protein families identified in most cell wall proteomes can provide clues to select candidates for testing their actual sub-cellular localization.
The next challenges for cell wall proteomics studies will be a better description of CWP post-translational modifications, a better knowledge of protein half-lives, and the design of methods to increase the coverage of the cell wall proteome. Indeed, the known cell wall proteomes lack heavily O-glycosylated proteins, such as AGPs, and covalently-linked proteins, such as extensins or proline-rich proteins. Besides, peptidomics has to be developed to obtain an extensive description of the peptides present in cell walls. These peptides are key to understanding the signaling mechanisms through cell walls that are involved in developmental processes and in responses to environmental cues [89]. Finally, the integration of transcriptomics and proteomics data will be critical to fully understanding the fine regulation of expression of the genes encoding CWPs.