Proteogenomic Characterization of the Cement and Adhesive Gland of the Pelagic Gooseneck Barnacle Lepas anatifera

We focus on the adhesive system of the stalked goose barnacle L. anatifera, an opportunistic species with low substrate selectivity that is found attached to a variety of objects floating at sea. Adhesion is an adaptive character in barnacles, ensuring adequate positioning in the habitat for feeding and reproduction. The protein composition of the cement multicomplex and adhesive gland was quantitatively studied using shotgun proteomic analysis. Overall, 11,795 peptide sequences were identified in the gland and 2206 in the cement, clustered in 1689 and 217 proteinGroups, respectively. Cement-specific adhesive proteins (CPs), proteases, protease inhibitors, cuticular and structural proteins, chemical cues, and many unannotated proteins were found, among others. In the cement, CPs were the most abundant proteins (80.5%), with the bulk proteins CP100k and CP52k being the most expressed of all, and CP43k-like the most expressed interfacial protein. Unannotated proteins comprised 4.7% of the cement proteome, and several of them ranked among the most highly expressed proteins. Eight of these proteins showed physicochemical properties and amino acid composition similar to those of known CPs and were classified as new CPs through Principal Component Analysis (PCA). The importance of PCA for the identification of unannotated, non-conserved adhesive proteins, in which selective pressure acts on relative amino acid abundance, was demonstrated.


Introduction
Goose barnacles are filter-feeding marine crustaceans that live attached by the stalk to a fixed hard substrate, or to floating objects, by means of an adhesive secretion. The adhesive secretion is produced in the cement gland located at the top of the peduncle core, beneath the capitulum [1,2]. It is conducted through ducts to the base of the peduncle, where it is released, holding the specimens fast in an adequate position to meet the conditions needed for their survival under a variety of hydrologic regimes, according to the ecology of the species (i.e., in oceanic or coastal habitats, submersed or intermittently immersed in intertidal zones, in protected overhangs, crevices, in the deep sea, or directly exposed to strong waves) [3,4]. Indeed, the barnacle cement multicomplex has been evolutionarily optimized to attach the base firmly to wet substrates, conferring plasticity and resistance, which has inspired industrial applications (e.g., antifouling strategies, induction on demand of larvae settlement and fixation for aquaculture, [...]
The cement apparatus of the pelagic barnacle L. anatifera (Figure 1b) is found at the top of the peduncle's core. It is formed of clusters of a single type of adhesive-secreting unicellular gland, located mostly just below the mantle cavity, and a network of ducts that coalesce and carry the adhesive to the base of the peduncle [1,2,25]. In ripe individuals, some adhesive-secreting cells are intermingled with the ovary, but most of them are located between the ovary and the capitulum. The presence of large nucleoli in the nucleus and the large amounts of rough endoplasmic reticulum in the cytoplasm of these cells suggest intense protein synthesis. The cytoplasm of the adhesive-secreting cells also features numerous small electron-dense secretory vesicles, which, in histological studies, stained positively for proteins (tetrazonium) and polysaccharides (PAS), but not for lipids (Sudan black) [1].
In contrast, in barnacle cyprids the adhesive is reported to be a bi-phasic system containing lipids and phosphoproteins, with the two distinct phases contained in two different kinds of granules in the cyprid cement gland cells [26]. Furthermore, post-translational modifications do not seem to play a role in adult barnacle adhesion, except for the glycosylation of MR52k [27], which is in line with the positive PAS staining observed in the electron-dense secretory vesicles of the gland cell cytoplasm.
The combination of transcriptomic and proteomic approaches has resulted in a powerful tool for the high-throughput discovery of barnacle CPs [15]. Thus, this study focused on the proteogenomic characterization of the adhesion system of L. anatifera (Figure 1). The protein composition of the L. anatifera cement multicomplex and cement gland was quantitatively studied using label-free LC-MS proteomic analysis combined with bioinformatics approaches for protein identification and classification. The proteogenomic analyses allowed the identification of the known CPs both in the proteome and in the gland transcriptome, as well as a group of unannotated proteins. As previously described in P. pollicipes, some unannotated proteins identified in the cement proteome of L. anatifera were abundant. The principal component analysis revealed some of them to be new CPs, since they grouped with clusters of previously described canonical CPs, either bulk or surface coupling proteins, owing to the selective pressure for conservation of relative amino acid composition in each CP group [15]. Moreover, these proteins lacked annotation and/or conserved domains, while sharing some physico-chemical features with CPs, e.g., molecular weight, isoelectric point, hydrophobicity, relative amino acid composition, secondary structure composition, and protein disorder [15–17]. This finding led to the conclusion that some unannotated proteins identified here, as well as those previously discovered in P. pollicipes, are indeed new canonical CPs, and that some possibly belong to other CP families not yet defined, as they are very abundant in the adhesive but did not cluster with the canonical groups characterized so far. Our findings reaffirm how limited our knowledge of barnacle CP diversity is, as well as the urgent need for a functional nomenclature for barnacle CPs to replace the existing one based on molecular weight.

Protein Identification
The shotgun proteomic approach used to profile the cement gland and the secreted cement of the pelagic gooseneck barnacle L. anatifera (Figure 1) identified 11,795 peptide sequences in the gland (Table S1) and 2026 peptide sequences in the cement (Table S2). After filtering (removal of contaminants, "only identified by site" hits, and REV_ entries), a total of 4128 proteins clustered in 1689 proteinGroups remained in the gland proteome (Table S3), and 530 proteins clustered in 217 proteinGroups in the cement (Table S4). Altogether, 4403 unique proteins were identified, 255 of which were shared by the gland and cement, corresponding to nearly 50% of the cement proteins (Figure 2a), as previously described for P. pollicipes [13]. Considering only the proteins identified in all three replicates, 3095 were found in the gland samples and 311 in the cement samples, accounting for 1219 and 132 proteinGroups, respectively. Only 98 of these proteins were shared by gland and cement, i.e., 31.5% of the cement proteins (Figure 2b; Tables S5 and S6), a much smaller proportion than when all proteins are considered (48%). In total, 3308 non-redundant proteins were identified in all three replicates, 2997 of them exclusive to the gland proteome and 213 exclusive to the cement (Figure 2b). The original, unfiltered MaxQuant output files containing all proteins identified (proteinGroups) can be found in Table S7 for the gland and in Table S8 for the cement.
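The overlap figures above follow from simple set arithmetic, which the short Python sketch below reproduces from the counts quoted in the text. The function name and the reading of the percentages as fractions of the cement proteins are assumptions made for illustration; they are inferred from the quoted numbers and are not taken from the original analysis scripts.

```python
def overlap_summary(gland: int, cement: int, shared: int) -> dict:
    """Summarize a two-set overlap from set sizes (illustrative helper, not the authors' code)."""
    if shared > min(gland, cement):
        raise ValueError("shared count cannot exceed either set size")
    return {
        "union": gland + cement - shared,        # non-redundant proteins
        "gland_only": gland - shared,            # exclusive to the gland
        "cement_only": cement - shared,          # exclusive to the cement
        # Assumption: percentages in the text are relative to the cement set.
        "shared_pct_of_cement": 100.0 * shared / cement,
    }

# Counts quoted in the text.
all_proteins = overlap_summary(gland=4128, cement=530, shared=255)
three_replicates = overlap_summary(gland=3095, cement=311, shared=98)

print(all_proteins)
print(three_replicates)
```

Under this reading, the 255 shared proteins are about 48% of the 530 cement proteins, and the 98 proteins shared across all replicates are about 31.5% of the 311 cement proteins.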

Quantitative Proteomic Analyses
Protein expression in the gland and cement was determined as absolute protein abundance using the intensity Based Absolute Quantification (iBAQ) score calculated by MaxQuant (Tables S5 and S6, respectively). Both the gland and the cement proteome showed a profile similar to that of other barnacles, mainly the P. pollicipes proteome, which was studied using the same methodology [15]. Likewise, the composition of the L. anatifera cement gland was dominated by proteins involved in muscle and cytoskeleton motility, accounting for 71.2% (Figure 3a). The majority corresponded to actin, myosin, troponin, and tropomyosin, together with other contractile and structural proteins (Figure 4a). In addition, proteins involved in "adhesion, extracellular matrix and membrane" corresponded to approximately 6.4% of total expression (Figure 3a), with heparan sulphate proteoglycan, papilin, and collagen being the most expressed within this functional group (Figure 4a). The group of proteins involved in "protein synthesis and modification" accounted for 5.1% of total expression, similar to the group of "stress response and detoxification proteins" (5.2%), which consisted mostly of heat shock proteins (HSPs). Proteases (1.7%) were also well represented, with serine proteases and trypsin being the most expressed. The group of proteins that remained uncharacterized or unannotated accounted for 1.4% of total expression (Figure 3a). Minor components such as chemical cues, proteinase inhibitors, immune and defense proteins, and cuticle proteins were also detected in the cement gland.
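As an illustration of how percentages such as those above can be derived from iBAQ scores, the following Python sketch sums iBAQ values per functional group and expresses each group as a share of the total. It is a minimal sketch under stated assumptions: the protein names, iBAQ values, and group labels are hypothetical toy data, and the function is not the procedure used to produce Figure 3.

```python
from collections import defaultdict

def relative_abundance_by_group(ibaq_by_protein: dict, group_of: dict) -> dict:
    """Return each functional group's share (%) of the summed iBAQ signal.

    Proteins missing from `group_of` are counted as "unannotated" (an assumption).
    """
    totals = defaultdict(float)
    for protein, ibaq in ibaq_by_protein.items():
        totals[group_of.get(protein, "unannotated")] += ibaq
    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("total iBAQ must be positive")
    return {group: 100.0 * value / grand_total for group, value in totals.items()}

# Hypothetical toy data, for illustration only.
toy_ibaq = {"actin": 600.0, "myosin": 200.0, "collagen": 100.0, "hsp70": 100.0}
toy_groups = {
    "actin": "muscle and cytoskeleton motility",
    "myosin": "muscle and cytoskeleton motility",
    "collagen": "adhesion, extracellular matrix and membrane",
    "hsp70": "stress response and detoxification",
}

shares = relative_abundance_by_group(toy_ibaq, toy_groups)
for group, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{group}: {pct:.1f}%")
```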
The canonical barnacle cement proteins were not detected in the quantitative analyses of the cement gland at the proteomic level (Figure 4a), but some bulk proteins, such as CP100k (ATB53757.1; AGS19349.1) and CP52k (ATB53756.1), were found at relatively high expression in the gland transcriptome (Table S9). Similarly, in P. pollicipes only CP100k was detected through the proteogenomic analyses performed, and at a very low expression level [15]. The absence of the canonical CPs from the gland at the proteomic level, despite the relatively high abundance of their encoding transcripts, could be related to the sensitivity of the methodology and to relative abundances. In addition, translation in the gland could be reduced further than transcription once the barnacles are established and fixed to the substrate, with the production of some CPs being reduced or down-modulated in the gland, both in pelagic species and in species inhabiting rocky intertidal systems [28]. Moreover, it has been demonstrated that the synthesis of the permanent adhesives only occurs during the early cyprid stage [16,29]. However, a low level of protein production must always be necessary to repair eventual detachment due to hydrodynamism, and to allow displacement to occur. Indeed, developing barnacles can periodically secrete primary cement to achieve firm attachment [30], but once adhered, adult barnacles of many species can neither move freely on the surface nor actively detach from the substrate [16]. Relocation of adult P. pollicipes along the substrate after settlement, but mainly of juveniles along the stalk, was confirmed by Kugele and Yule [30], and relocation was also reported in acorn barnacles by Moriarty and coauthors [31]. In contrast, L. anatifera is unable to relocate voluntarily; evidence of relocation of animals from the capitulum to the substratum, or to the base of host animals, was lacking [30], which may explain such extreme down-regulation of CP production in the gland of this species.
In contrast, the cement proteome was dominated by the canonical barnacle cement proteins (CPs) and, to a lesser extent, by unannotated and uncharacterized proteins, chemical cues, protease inhibitors, and adhesion, matrix, and membrane proteins (Figure 3b). Among the canonical proteins, the bulk protein CP100k was the most expressed, followed by CP52k, in contrast to P. pollicipes, in which the most expressed bulk CP was CP52k, followed by CP100k [15]. Regarding surface coupling proteins, CP43k was the most expressed in L. anatifera, followed by CP19k at a lower level. By contrast, P. pollicipes showed CP19k as the most expressed surface coupling protein, while CP43k was not even represented in the proteome [15]. Unannotated and uncharacterized proteins accounted for 7.9% of total proteins (Figure 3b). It should be noted that six proteinGroups classified as unannotated or uncharacterized were listed among the 30 most represented proteins in the cement (Figure 5). Owing to their high abundance in the cement, we suspected that these proteins might be functional adhesive proteins belonging either to previously characterized canonical CP families or to families never detected or characterized before, whether bulk or interfacial proteins, or that they might even have a function or location different from those previously described.
ol. Sci. 2021, 22, x FOR PEER REVIEW
As discussed above, the holdfast provided by the cement secretion is essential for the survival of cirripedes, and its properties, given their importance for survival, must be evolutionarily selected according to the ecology of the species. Herein, the CP composition of L. anatifera is described for the first time. This is a cosmopolitan species found on a variety of floating substrata adrift in the ocean, or on substrata that are fixed but swing or move slightly with the currents; in short, a species with low selectivity for the attachment substrate.

In addition, other abundant proteins in the cement proteome were associated with "chemical cues" (Figure 3b), among which MULTIFUNCin and issp-6 were the most represented (Figures 4b and 5). MULTIFUNCin is a multifunctional glycoprotein cue previously found in the cement of other barnacles [15,18,32]. This glycoprotein is seemingly involved in habitat selection (settlement) by conspecific barnacle larvae, in adhesion, and in defense [32]. On the other hand, issp-6 (S10) is a member of the hemolymph juvenile hormone binding (IPR010562) family of proteins [15]. This protein family is related to the juvenile hormone pathway, which is mainly involved in metamorphosis and development in cyprids [33]. Settlement inducing protein complex (SIPC) proteins are glycoprotein chemical cues that were found to be very abundant in the adhesive of the rocky shore goose barnacle, where they represent 3.2% of total proteins [15], but they were not found in the adhesive of the ocean-drifting species, despite being present in the gland. Instead, L. anatifera features issp-6 in its cement multicomplex (Figure 4). Although both species are gregarious, chemical cues are much more represented in the adhesive of P. pollicipes (12.1%) than in that of L. anatifera (3.5%). This is possibly related to the strategy each species uses to thrive: L. anatifera drifts in the ocean and meets patches of larvae ready to settle, whereas P. pollicipes is fixed to the shore and needs to attract larvae to settle on it by producing larger amounts of chemical cues.
In addition, proteins involved in "adhesion, matrix, and membrane" and "protease inhibitors" were also relatively abundant in the cement, with approximately 2.9% and 2% of relative abundance, respectively (Figure 3b). The enzyme lysyl oxidase was among the 30 most expressed proteins (Figure 5). This enzyme was also abundant in the P. pollicipes cement proteome [15], and its active role in attachment was demonstrated through proteomic and enzymatic approaches in the adhesive layer of adult Amphibalanus amphitrite barnacles [18]. Lysyl oxidase was assigned to the modification of cement components, likely being involved in lysine/arginine protein cross-linking, but also in the cross-linking of collagen and elastin fibrils [18].
Other components were related to "protease inhibitors" (2%) and "cuticle" (1.2%), followed by the minor components "muscle and cytoskeleton motility", "protein biosynthesis and modification", and "stress response detoxification", in this order (Figure 3b). Some proteins found in the quantitative analyses remained of unknown function ("unannotated" and "uncharacterized") and are discussed below.

Unannotated Proteins of the Cement Proteome
Proteins without annotation, uncharacterized, or merely predicted were found to be abundant (8.0%) in the cement proteome (Figures 3b, 4b and 5). Some of those proteins were also found in the gland at the proteomic (Figures 3a and 4a, Table S5) and transcriptomic level (Table S9). To investigate the biological function of such proteins, some additional analyses were performed. A total of 132 proteinGroups were searched against the Non-Redundant protein database (nr at NCBI) using the automatic adjustment of the BLASTp program. Of these, a total of 19 proteinGroups remained unannotated, without any protein homology description or known conserved domains (Table S10). The results of these analyses were also included in the figures previously shown, and detailed information on the BLAST search and the protein sequences can be found in Table S10.
Afterwards, a Principal Component Analysis (PCA) was conducted on the unannotated and uncharacterized proteins, together with known cement adhesive proteins of various species (Figure 6; Table S11), to observe clustering. The PCA used the relative residue composition (%) of the 19 barnacle-specific cement proteins obtained in the present study from the L. anatifera cement proteome, which were classified as "unannotated" and "uncharacterized" proteins, and compared them with 53 previously identified, classified, and characterized cement-specific proteins gathered from NCBI and the literature, belonging to 8 different barnacle species (P. pollicipes, A. amphitrite, A. improvisus, A. eburneus, Fistulobalanus albicostatum, Megabalanus rosa, M. volcano, and Tetraclita japonica). The analysis revealed the clustering patterns of the unannotated and uncharacterized proteins relative to the previously defined groups of proteins [17] (Figure 6). The first two principal components (PC1 and PC2) extracted by the PCA explained 43.61% of the total data variation (26.63% and 16.98%, respectively), showing how the proteins group as a function of their relative amino acid composition (Figure 6). PC1 discriminated G1 from the other two groups, while PC2 separated CP20k (G2) from the other two groups (G1 and G3). Of the three A. eburneus CPs identified by Naldrett and Kaplan [34], CP36k, -22k, and -7k, one (AE_36k) did not group with any CP group, similarly to 5 unannotated cement proteins that grouped with neither G1, G2, nor G3.
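The kind of analysis described above can be sketched in a few lines of Python with NumPy: each sequence is converted to its relative amino acid composition (%), and the resulting matrix is subjected to PCA. This is only a minimal sketch under stated assumptions. The toy sequences and their labels are hypothetical, the data are mean-centered but not scaled (the preprocessing used for Figure 6 is not stated in the text), and the PCA is computed by singular value decomposition rather than with whatever software the original analysis used.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def composition(sequence: str) -> np.ndarray:
    """Relative amino acid composition (%) over the 20 standard residues."""
    sequence = sequence.upper()
    counts = np.array([sequence.count(aa) for aa in AMINO_ACIDS], dtype=float)
    total = counts.sum()
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return 100.0 * counts / total

def pca(matrix: np.ndarray, n_components: int = 2):
    """PCA by SVD of the mean-centered matrix.

    Returns (scores, explained_variance_ratio) for the first components.
    Assumes the rows are not all identical (total variance > 0).
    """
    centered = matrix - matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    variance = s ** 2
    ratio = variance / variance.sum()
    scores = u[:, :n_components] * s[:n_components]
    return scores, ratio[:n_components]

# Hypothetical toy sequences, for illustration only (not real cement proteins).
toy_sequences = {
    "toy_ser_gly_rich": "SGSGSTSGKSGSAGSTSGSK",
    "toy_charged_cys": "CDEHCKDECHEDCKHDECDE",
    "toy_hydrophobic_1": "LVIALFVLIAGLVPLIAVLL",
    "toy_hydrophobic_2": "LIVALLFVIAGLLPVIALVL",
}

X = np.vstack([composition(seq) for seq in toy_sequences.values()])
scores, explained = pca(X)
for name, (pc1, pc2) in zip(toy_sequences, scores):
    print(f"{name}: PC1={pc1:.2f}, PC2={pc2:.2f}")
print("explained variance ratio:", np.round(explained, 3))
```

Proteins with similar composition end up close together in the PC1/PC2 plane, which is the property the clustering in Figure 6 relies on; no significance should be attached to the toy coordinates themselves.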
The PCA placed four of the unannotated proteins (DN61611, DN67416, DN65601, and DN69760) in the proximity of G1 (surface coupling proteins, which include the CP19k, -43k, -58k, and -68k families of cement proteins), two proteins (DN61926 and DN67538) in the vicinity of G2 (surface coupling proteins of the CP20k family), and two proteins (DN56671 and DN64372) near G3 (bulk proteins of the CP52k and -100k families). Of the remaining 11 proteins, 6 (DN72668, DN62022, DN64031, DN53050, DN70620_c0, and DN70620_c1) clustered with AE_22k and AE_7k, forming a new 8-protein cluster, while 5 (DN73117, DN58739, DN69827, DN62666, and DN63562) did not cluster with any group at all. Whether the cement proteins that did not cluster with the previously defined G1, G2, and G3 groups have an adhesive function or a different function in the cement multicomplex is yet to be determined, since the techniques used here do not allow this to be established. They are known to be barnacle cement specific, because they were identified in samples of cement and have no homology to any other protein of the non-redundant protein database at NCBI. Moreover, some of these proteins are quite abundant, for instance DN73117, which is the fourth most represented protein in the cement, and DN70620_c0 and DN70620_c1, which were the 9th and the 13th. This leads us to suspect that at least these highly represented unannotated proteins might have a function related to the very function of the cement itself, which is to adhere, or else to give structure to the cement.
One of the cement proteins retrieved from NCBI, AA52-3L, appears to be misclassified according to the PCA performed. Based on the relative amino acid composition of this A. amphitrite protein, the PCA places it in G1 (Figure 6), but according to the authors it is a bulk protein, CP52k-like [12]. If it were a bulk protein, it should group with the G3 proteins instead of G1. Since this protein was used to automatically annotate the P. pollicipes PP_52k-L identified and annotated in a previous work [15] and the LA_52k-L3 (LA_DN66462) in the present work, these two proteins were also misclassified. The two proteins are shorter and lighter than CP52k proteins, and their physico-chemical properties also corroborate that they are surface coupling proteins of G1 (Tables 1, S11 and S13).
Other characteristics of the cement proteins are presented in Table S11 and Table 1. Table S11 presents the characteristics of the 53 previously characterized adhesive proteins of various barnacle species whose sequences were available at NCBI, and Table 1 those of the 19 proteins found in the L. anatifera cement proteome that could not be annotated by homology and in which no conserved domain was found, together with the 9 canonical CPs found, including the one that was automatically annotated as LA_52k-L3. Three of the four proteins that clustered with the G1 surface coupling proteins were found to be disordered (>55% disorder), with a large percentage of their structure in the form of loops (>50%), more than 48% of their residues exposed, and less than 5% of intermediate residues, in agreement with G1 protein characteristics [15]. Their isoelectric point, aliphatic index, and percentages of aromatic, positive, and negative residues also fall in the range of G1 proteins, as does their negative hydropathic index. Most of the characteristics of the two proteins that clustered with the G2 surface coupling proteins also correspond to those of this group, particularly the degree of disorder and the pI. The two proteins that clustered with G3 share the characteristics of this group: a high hydropathy index, low disorder, a high aliphatic index, a high content of aromatic residues, and a percentage of loops between 40 and 50%. Finding CP20k proteins in stalked barnacles is a novelty, since this group of proteins has been described as exclusive to acorn barnacles with a calcified base, where it is located at the interface between the cement and the calcareous base, a structure that pedunculate barnacles do not have, and has been characterized as a calcite-specific coupling protein [11,35,36]. So far, CP20k had never been described in membranous-base barnacles, either pedunculate or membranous-base acorn barnacles such as T. japonica [37].
Table 1. Characteristics of the nineteen unannotated proteins (upper part of the table) identified in the Lepas anatifera cement proteome through PCA, and of the nine annotated ones (lower part of the table), annotated using the automatic adjustment of the BLASTp program against the Non-Redundant protein database (nr at NCBI). No. Res: number of residues; MM: molecular mass; pI: isoelectric point; Neg. res.: negative residues (sum of Asp and Glu); Pos. res.: positive residues (sum of Arg, His, and Lys). Proteins with incomplete sequences are presented in italics. In very dark grey are the proteins that clustered with the G2 surface coupling proteins of the CP20k kind, in dark grey those that clustered with the bulk proteins (G3), in light grey those that clustered with the G1 surface coupling proteins, and in very light grey the proteins that clustered with two CPs previously identified in Amphibalanus eburneus: CP7k and -22k. ¥ Protein misannotated as CP-52k-L3, since, as shown by the PCA (Figure 6), it clusters with the surface coupling proteins (G1) and not with the bulk proteins (G3).
a Instability index (II): provides an estimate of the stability of the protein in a test tube, depending on the presence of certain dipeptides [38], the occurrence of which differs significantly between unstable and stable proteins. A protein whose instability index is smaller than 40 is predicted to be stable; a value above 40 predicts that the protein may be unstable.
b GRAVY (Grand Average of Hydropathy): the GRAVY value of a peptide or protein is calculated as the sum of the hydropathy values [39] of all its amino acids, divided by the number of residues in the sequence. The values define the relative hydrophobicity of amino acid residues; the more positive the value, the more hydrophobic the amino acids located in that region of the protein.
c Aliphatic index: defined as the relative volume occupied by aliphatic side chains (alanine, valine, isoleucine, and leucine). It may be regarded as a positive factor for the increase of thermostability of globular proteins [40].
d Protein disorder: percentage of disordered regions relative to the total protein sequence length, as predicted by Meta-Disorder [41].
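Some of the descriptors defined in the footnotes above can be computed directly from a sequence, as the Python sketch below illustrates. It is a hedged example rather than the tool used for Table 1: it assumes the Kyte-Doolittle hydropathy scale for GRAVY and the commonly used aliphatic index formula X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), with X in mole percent, while the residue classes for charged residues follow the definitions in the table caption. The example sequence is hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed to be the scale behind GRAVY).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(sequence: str) -> float:
    """Grand average of hydropathy: mean hydropathy value over all residues."""
    sequence = sequence.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)

def aliphatic_index(sequence: str) -> float:
    """Aliphatic index from mole percentages of Ala, Val, Ile, and Leu."""
    sequence = sequence.upper()
    n = len(sequence)
    mole_pct = {aa: 100.0 * sequence.count(aa) / n for aa in "AVIL"}
    return mole_pct["A"] + 2.9 * mole_pct["V"] + 3.9 * (mole_pct["I"] + mole_pct["L"])

def charged_residues(sequence: str) -> tuple:
    """(negative, positive) residue counts as defined in the Table 1 caption."""
    sequence = sequence.upper()
    negative = sum(sequence.count(aa) for aa in "DE")   # Asp + Glu
    positive = sum(sequence.count(aa) for aa in "RHK")  # Arg + His + Lys
    return negative, positive

# Hypothetical example sequence, for illustration only.
example = "MKSLVAIDEGHRLLAV"
print(f"GRAVY: {gravy(example):.3f}")
print(f"Aliphatic index: {aliphatic_index(example):.1f}")
print("Negative / positive residues:", charged_residues(example))
```

Sequences containing non-standard residue codes would raise a `KeyError` in `gravy`; a real pipeline would need to decide how to handle them.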

Sampling, Protein Solubilization, and Extraction
Six immature Lepas anatifera (Pedunculata: Scalpellomorpha) specimens (<12 mm in rostro-carinal length) were collected by scuba diving from an oceanographic buoy in Gournes, Crete, Greece, in October 2017. Three replicates composed of 2 individuals each were used for proteomic analysis. Individuals with undeveloped ovaries were selected, owing to the proximity of the ovary to the cement gland, to avoid contamination of the gland samples with ovary tissue. Animals were brushed to remove epibionts, transported to the laboratory on ice, further wiped with an ethanol-soaked cellulose cloth, and dissected upon arrival. The cement gland was located according to previous studies [1,2,25]. The tissues collected were kept frozen at −80 °C until protein homogenization and extraction in SDT buffer (2% SDS, 100 mM Tris/HCl pH 7.6, 0.1 M DTT), according to Campos et al. [42]. Tissues were first homogenized using ultrasound (Vibra Cell, Sonics & Materials) at 60 Hz intensity, then mechanically disrupted using microbeads (Precellys, Bertin Instruments, Montigny-le-Bretonneux, France), followed by incubation in SDT for 14 h with agitation (450 rpm) in a thermomixer at room temperature. Samples were then centrifuged at 16,000× g for 20 min, the supernatant was collected, the protein concentration was determined by spectrophotometry (Synergy HT, BioTek, Winooski, Vermont) at 280 nm, and the samples were stored at −80 °C until further analysis [15].

LC-MS/MS Analyses
Provided lysates were incubated for 30 min with 5 mM tris(2-carboxyethyl)phosphine (TCEP) at 56 °C. The solution was brought to 10 mM TCEP and 10 mM methyl methanethiosulfonate (MMTS) for 15 min, to reduce and protect cysteine residues, respectively [43]. Protein purification, protein digestion, and peptide purification were performed according to a slightly adapted Single-Pot Solid-Phase-enhanced Sample Preparation (SP3) protocol [44,45]. Sequencing grade trypsin (Promega, Fitchburg, WI, USA) was added at a ratio of 1:50 w/w in 50 mM HEPES, pH 8. After overnight incubation at 37 °C, beads containing the digested peptides were slightly acidified using 10% formic acid (FA), shaken, and incubated overnight at room temperature, after raising the acetonitrile concentration to at least 95%. Adsorbed peptides were washed once with pure acetonitrile (ACN) and then air dried. They were eluted in a first step with 20 µL 2% DMSO for 30 min, and in a second step with 20 µL 0.065% FA, 500 mM KCl in 30% acetonitrile for 30 min. Peptides were vacuum dried and dissolved in 0.2% trifluoroacetic acid/3% ACN for subsequent ultracentrifugation (50,000× g, 30 min, RT). LC-MS/MS analyses of purified and desalted peptides were performed on a Dionex UltiMate 3000 n-RSLC system connected to an Orbitrap Fusion™ Tribrid™ mass spectrometer (Thermo Scientific, Waltham, MA, USA). Peptides of each sample were loaded onto a C18 precolumn (3 µm RP18 beads, Acclaim, 0.075 mm × 20 mm), washed for 3 min at a flow rate of 6 µL/min, and separated on a C18 analytical column (3 mm, Acclaim PepMap RSLC, 0.075 mm × 50 cm, Dionex, Sunnyvale, CA, USA) at a flow rate of 200 nL/min via a linear 120 min gradient from 97% MS buffer A (0.1% FA) to 25% MS buffer B (0.1% FA, 80% ACN), followed by a 30 min gradient from 25% MS buffer B to 62% MS buffer B.
The LC system was operated with the Chromeleon software (version 6.8, Dionex, Sunnyvale, CA, USA) embedded in the Xcalibur software suite (version 3.0.63, Thermo Scientific). The effluent was electro-sprayed by a stainless-steel emitter (Thermo Scientific). The mass spectrometer was controlled through the Xcalibur software and operated in the "top speed" mode, which automatically selects as many doubly and triply charged peptides as possible within a 3 s time window and subsequently fragments them. Peptide fragmentation was carried out using the higher energy collisional dissociation mode, and fragments were measured in the ion trap (HCD/IT).
MaxQuant parameters for protein identification were as follows: MS and MS/MS tolerances of 20 ppm and 0.5 Da, respectively; trypsin as the cleavage enzyme, with two missed tryptic cleavages allowed; and PSMs accepted at a 1% false discovery rate (FDR). The modification of cysteine by MMTS (methylthiolation) was set as a fixed modification, while oxidation of methionine and acetylation of the protein N-terminus were chosen as variable modifications. Protein quantification was based on approximate absolute protein abundance, expressed as the intensity-Based Absolute Quantification (iBAQ) score calculated by MaxQuant. Venn diagrams were used to identify the proteins shared among the majority proteins of each replicate, and figures were built using a free online tool available at the webserver of the Bioinformatics and Evolutionary Genomics Center (BEG/Van de Peer Lab site, Ghent University, Belgium, http://bioinformatics.psb.ugent.be/webtools/Venn/; accessed on 23 November 2020).
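
Two of the search settings above, the allowance of two missed tryptic cleavages and the 20 ppm precursor tolerance, can be illustrated with a minimal sketch. It is not MaxQuant code. The function names are hypothetical, and the rule that trypsin does not cleave before proline is an assumption, since the text does not state which trypsin specificity was selected.

```python
def cleavage_fragments(protein):
    """Split a sequence C-terminal to K or R (assumed: no cleavage before P)."""
    fragments, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            fragments.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        fragments.append(protein[start:])  # C-terminal remainder
    return fragments


def tryptic_peptides(protein, max_missed=2):
    """All peptides obtained by joining up to max_missed + 1 consecutive fragments."""
    fragments = cleavage_fragments(protein.upper())
    peptides = []
    for i in range(len(fragments)):
        for missed in range(max_missed + 1):
            if i + missed < len(fragments):
                peptides.append("".join(fragments[i:i + missed + 1]))
    return peptides


def within_ppm(observed_mz, theoretical_mz, tolerance_ppm=20.0):
    """True if the relative mass error is within the given ppm tolerance."""
    error_ppm = abs(observed_mz - theoretical_mz) / theoretical_mz * 1e6
    return error_ppm <= tolerance_ppm
```

For the toy sequence AKRPGKC, the fully cleaved fragments are AK, RPGK, and C, and allowing two missed cleavages adds AKRPGK, AKRPGKC, and RPGKC.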

Data Filtration and Downstream Analyses
Downstream analyses, such as the filtration of the proteinGroups obtained with MaxQuant, were performed using the Perseus freeware (version 1.6.2.3). Initial data filtration removed contaminants, reverse (REV_) hits, and proteins only identified by site. Afterwards, the absolute intensity (iBAQ) of the filtered proteinGroups was log(x)-transformed, and only those proteinGroups with three valid values (of three possible) per row were retained. For the protein expression analyses, only those proteins found in the three replicates per sample were considered. The resulting matrix containing all filtered proteinGroups was exported and manually reviewed using a set of keywords related to the protein families found in barnacle cement or in related organisms. Excel (Microsoft, Redmond, WA, USA) was used for the graphical representation of the results.
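
The filtration steps described above can be sketched in plain Python; the actual analysis was done in Perseus, so this is only an illustration. The column names ("Potential contaminant", "Reverse", "Only identified by site", "Protein IDs", and the iBAQ columns) are assumed to follow the usual MaxQuant proteinGroups.txt layout and may differ between versions. The base-2 logarithm is also an assumption, as the text only states log(x).

```python
import math

# Flag columns assumed to be marked with "+" in proteinGroups.txt.
FLAG_COLUMNS = ("Potential contaminant", "Reverse", "Only identified by site")


def filter_protein_groups(rows, ibaq_columns):
    """Keep proteinGroups with no flag and a valid iBAQ value in every replicate.

    rows: list of dicts, one per proteinGroup, keyed by column name.
    ibaq_columns: names of the iBAQ columns, one per replicate.
    """
    kept = []
    for row in rows:
        # Remove contaminants, reverse (REV_) hits and site-only identifications.
        if any(row.get(flag, "") == "+" for flag in FLAG_COLUMNS):
            continue
        values = [float(row[column]) for column in ibaq_columns]
        # A zero intensity becomes an invalid value after log transformation,
        # so require a positive value in all replicates (three of three).
        if not all(value > 0 for value in values):
            continue
        kept.append({
            "id": row["Protein IDs"],
            "log_ibaq": [math.log2(value) for value in values],  # base 2 assumed
        })
    return kept
```

In practice, the rows could be read from the tab-separated proteinGroups.txt file with `csv.DictReader(handle, delimiter="\t")`.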

Characterization of Unannotated Cement Proteins
Protein sequences found in the cement proteome without hits or annotations were searched online against the non-redundant protein database (nr at NCBI), using the automatic adjustment of the PSI-BLAST (Position-Specific Iterated BLAST) algorithm. Afterwards, protein sequences were re-annotated according to the hit description, but in most cases no hit was obtained. Proteins were then characterized using the ProtParam tool from ExPASy (http://web.expasy.org/protparam/), including molecular weight, isoelectric point, instability index, hydropathy, percentage of positive, negative, and aromatic residues, and aliphatic index. Predictions of secondary structure composition and solvent accessibility were performed from the protein sequences with PredictProtein [47] (https://www.predictprotein.org/), and protein disorder was predicted with Meta-Disorder [41]. Principal components analysis (PCA) was used to analyze the relative composition of residues (%) of the 19 unannotated proteins identified in the L. anatifera (Lepadiform Order) cement multicomplex in the present study, plus 9 cement specific proteins of L. anatifera (two CP100k, four CP52k, one CP43k, and two CP19k) identified and annotated in this work, in comparison to 53 cement specific proteins of various acorn barnacle species (Sessilia Order) and of Pollicipes pollicipes (Scalpelliform), deposited at NCBI or reported in the literature. Only 18 amino acid categories were considered, because aspartic acid and asparagine were analyzed together, as were glutamic acid and glutamine, since in some cases the CP data delivered by the authors were in this form (one value for each of these two pairs of amino acids). The use of PCA for CP classification is possible because selective pressure in these proteins acts on the conservation of the relative amino acid composition rather than on the conservation of functional domains [15,17], which precludes their identification through homology to other proteins.
The greater importance of relative amino acid abundance over the primary sequence of residues has also been observed in other aquatic invertebrates, namely in the surface coupling proteins of echinoderms [48,49], highlighting the importance of this characteristic for wet adhesion.
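
The composition-based approach can be sketched as follows. Pooling Asp with Asn and Glu with Gln leaves 18 of the 20 standard amino acid categories, and each protein becomes a vector of 18 percentages. The sketch then extracts only the first principal component by power iteration on the covariance matrix; this stands in for a full PCA and is not the software used in the study. The labels "Asx" and "Glx" and the function names are hypothetical.

```python
from collections import Counter

# 18 categories: Asp + Asn pooled as "Asx", Glu + Gln pooled as "Glx".
CATEGORIES = ["A", "R", "Asx", "C", "Glx", "G", "H", "I", "L",
              "K", "M", "F", "P", "S", "T", "W", "Y", "V"]
POOLED = {"D": "Asx", "N": "Asx", "E": "Glx", "Q": "Glx"}


def composition(seq):
    """Relative composition (%) over the 18 pooled categories."""
    counts = Counter(POOLED.get(aa, aa) for aa in seq.upper())
    total = sum(counts[c] for c in CATEGORIES)  # non-standard letters are ignored
    return [100.0 * counts[c] / total for c in CATEGORIES]


def pc1_scores(rows, iterations=200):
    """Scores of each row on the first principal component (power iteration)."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Start from the largest centered row: percentages sum to 100, so a
    # constant start vector would lie in the null space of the covariance.
    v = max(centered, key=lambda c: sum(x * x for x in c))
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return [sum(c[j] * v[j] for j in range(p)) for c in centered]
```

With toy sequences, for example two Gly/Ser-rich and two Lys/Leu-rich strings, the two groups receive scores of opposite sign on the first component. The sign itself is arbitrary.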

Conclusions
The protein composition of the L. anatifera cement multicomplex and cement gland was quantitatively studied for the first time using high-throughput proteomics combined with bioinformatic and statistical approaches. The profiles of both the gland and cement proteomes of L. anatifera were similar to those of the previously studied goose barnacle P. pollicipes. The cement was dominated by the bulk cohesive proteins CP100k and -52k, whereas surface coupling proteins were less abundant. The species differed in their interfacial proteins, represented in L. anatifera mainly by CP43k-like, but also by CP19k and -20k, in contrast to the P. pollicipes adhesive, in which only CP19k was found. For the first time, CP20k was found to be expressed in a membranous-base barnacle. This interfacial protein had been postulated to be exclusively related to the adhesion of the cement to the calcareous base of acorn barnacles, which proved not to be the case. Chemical cues were much less represented in the L. anatifera adhesive than in that of P. pollicipes, which we hypothesize is due to the different reproductive ecology of the species, related to their habitat: one moving as neuston in the oceans, the other fixed on rocky shores. Unlike in the cement secretion, the canonical barnacle CPs could not be detected in the cement gland of L. anatifera at the proteomic level, although they were detected at the transcriptomic level. This may be related to the fact that this species is unable to relocate voluntarily, unlike P. pollicipes.
Unannotated and uncharacterized proteins accounted for 4.7% of total proteins, and 6 of them were listed among the 30 most expressed proteins in the cement proteome of L. anatifera. A principal component analysis (PCA) revealed that 8 out of 19 of those proteins were new CPs, since they clustered with the 3 groups of canonical CPs previously described in the literature. Four clustered with the surface coupling proteins of group G1, which includes CP19k, -43k, and -68k; two with the interfacial proteins of G2, the group of CP20k proteins; and two with the proteins of G3, which comprises CP52k and -100k, the bulk CPs. It remains to be determined whether the 11 unannotated proteins that did not cluster with any of the previously defined CP groups have an adhesive or cohesive function, or a different function in the cement multicomplex. Six of them formed a new cluster together with CP22k and -7k of A. eburneus. The importance of PCA for the identification of unannotated, non-conserved adhesive proteins, whose selective pressure is on relative amino acid abundance, was demonstrated.