Current View on EpCAM Structural Biology

EpCAM, a carcinoma cell-surface marker protein and a therapeutic target, has been primarily addressed as a cell adhesion molecule. In view of recent discoveries of its role in signaling, with implications for cell proliferation and differentiation, and of findings contradicting a direct role in mediating adhesion contacts, we provide a comprehensive and updated overview of the available structural data on EpCAM and interpret it in the light of recent reports on its function. First, we describe the structure of the extracellular part of EpCAM, both as a subunit and as part of a cis-dimer which, according to several experimental observations, represents a biologically relevant oligomeric state. Next, we provide a thorough evaluation of reports on EpCAM as a homophilic cell adhesion molecule, with a structure-based explanation of why direct EpCAM participation in cell–cell contacts is highly unlikely. Finally, we review the signaling aspect of EpCAM with a focus on the accessibility of signaling-associated cleavage sites.


Introduction
EpCAM (CD326) is a conserved type I transmembrane glycoprotein of 35 kDa expressed on epithelial cells of higher eukaryotes. It was initially discovered as a carcinoma cell surface antigen [1] and has since gained immense importance as a carcinoma marker with applications in diagnosis, prognosis, and therapeutic intervention [2,3]. Soon after EpCAM's discovery it was proposed that it functions as a homophilic cell-cell adhesion molecule [4], hence its name: epithelial cell adhesion molecule. However, recent discoveries put this name to the test [5,6]. Today, the implication of EpCAM in cell proliferation and differentiation is at the forefront, in particular proliferation-enhancing signaling via regulated intramembrane proteolysis [5,7] and regulation of epithelial-to-mesenchymal transition (EMT) [8-10]. The first high-resolution structure of EpCAM paved the way for these detailed studies [11], and here we review recent findings in the light of available structural data.

Evolution of the Structural Model of EpCAM
The first information on the structural organization of EpCAM polypeptide chains was a topology assignment deduced from the cDNA-derived protein sequence: a large extracellular part (EpEX), a single transmembrane region (EpTM or TM), and a short cytosolic tail (EpIC or IC). Within the extracellular part a thyroglobulin type-1A (TY) repeat [12] containing a protease-sensitive site [13,14] has been identified. The lack of other homologous and structural data resulted in erroneous classification of the N-terminal region between the signal peptide and the TY domain as an EGF-like repeat. This was [...]

Figure 1. [...] [15]; N74 and N111 are shown as partially and fully glycosylated (gray and black, respectively), and the main protease-sensitive site is marked by an arrowhead. Middle and right, crystal structure of EpEX (PDB 4MZV) in ribbon representation depicts the three domains (ND, TY and CD), disulfide linkages (yellow spheres), the N-terminal pyroglutamate residue (orange-red sticks), and three glycosylation sites where asparagine was mutated to glutamine to abolish glycosylation and thereby achieve a homogeneous protein sample for structure determination (mutations N74Q, N111Q, N198Q; dark gray sticks). The polypeptide chain is color-coded according to the domains and the same color coding is used throughout the paper. This and all other structural figures were prepared using UCSF Chimera version 1.14 (University of California San Francisco, Resource for Biocomputing, Visualization, and Informatics, San Francisco, USA) [16].
The N-terminal domain (ND; residues 24-63, immediately following the cleaved-off signal peptide) has a very compact core stabilized by a unique arrangement of three tightly packed disulfide bridges (Figure 2a) [11]. The N-terminal glutamine (Q24) is post-translationally modified to a pyroglutamate [15]. The domain's orientation is stabilized via several polar as well as hydrophobic contacts with TY and CD. The disulfide bonding pattern is I-IV, II-VI, and III-V (Cys residues sequentially numbered from I to VI), which is clearly different from the canonical I-III, II-IV, and V-VI linkage found in EGF-like domains [17]. Likewise, the spacing between cysteine residues along the polypeptide chain differs between EGF-like domains and EpCAM ND. The ND does, however, share its disulfide linkage with some other small cysteine-rich domains, such as the CFC domain of human cripto [18], and with regard to fold [19] it is most similar to the disulfide-free WW-type domains [20] found in various proteins such as formin-binding protein 3, dystrophin and NEDD. Still, this combination of structural features is not found in any other protein of known structure, which classifies the structure of the EpCAM N-terminal domain as unique. Domains of this type, also called cysteine-rich microdomains (size up to 40 aa residues), are often involved in protein-protein interactions, a well-known example being the EGF-EGFR interaction [21], and are also considered as therapeutic scaffolds with a very stable central core [22]. Interestingly, the ND of EpCAM is targeted by the vast majority of anti-EpCAM monoclonal antibodies (mAbs) raised up to now, perhaps due to its high exposure at the membrane-distal tip of the molecule, as discussed in the following section. In contrast, anti-EpCAM mAbs targeting the TY or CD are rare [11,23].
Figure 2. (a) [...] (PDB ID 2J5H), and WW domain of human FBP11 (PDB 1YWJ) with characteristic tryptophan and tyrosine residues (orange-red sticks); disulfide bonds with connectivity order are shown as yellow sticks. (b) CD (deep pink ribbon) of EpCAM superimposed on the structure of the SEA domain of human MUC1 (gray; PDB 2ACM), which has an auto-proteolytic motif GSVVV (blue). Right, the SEA domains of human receptor-type tyrosine-protein phosphatase IA-2 (PDB 2QT7) and Notch receptor (PDB 3ETO), both without auto-proteolytic activity, shown in the same orientation. Identified cleavage sites within CD by TACE (D243-P244-G245) and BACE (Y250-Y251) are shown in orange and yellow, respectively, and the AGR2-binding region is shown in dark green (overlapping the BACE cleavage site). The superposition was done using UCSF Chimera [16]. RMSD values range from 1.75 Å (IA-2 and EpCAM pair) to 3.31 Å (IA-2 and NOTCH1 pair) with an overall RMSD of 2.79 Å. (c) TY domain of EpCAM (left) with indicated disulfide bridges (yellow) and the protease-sensitive site GRR (dark blue). Homologous TY domains from p41 invariant chain (PDB 1ICF), thyroglobulin (TG, 2nd TY type-1 domain; PDB 6SCJ), and IGFBP-1 and -6 (PDB 1ZT3 and 1RMJ) are shown in gray. The TY-characteristic CWCV sequence motif is shown in orange-red.
Contrary to the ND, the C-terminal domain (residues 139-265; CD) is the largest of the three extracellular domains and contains no disulfide bridges. The domain has an α + β fold where the helices are clustered on one side of an antiparallel concave β-sheet (Figure 2b) [11]. While this domain does not have any significant sequence similarity to proteins outside of the GA733 family, structural comparison revealed that it belongs to the SEA (sea urchin sperm protein, enterokinase, agrin) group of protein domains found in various cell surface and secreted proteins, for example in mucin MUC1, dystroglycan, receptor-type protein phosphatase IA-2 and Notch receptors (Figure 2b) [24]. The presence of this domain type in other distantly related eukaryotic lineages, for example in green algae, suggests an ancient evolutionary origin [24]. The function of these domains is not clear; however, some are implicated in signaling via proteolytic cleavage. The domain is often found paired with a transmembrane domain, as in EpCAM and all proteins listed here; in some proteins it possesses an auto-proteolytic activity through the GSϕϕϕ motif (ϕ marks a residue with a hydrophobic side chain), for example in MUC1 [25]. While EpCAM CD lacks this auto-proteolytic motif, it is subject to proteolysis by other proteases with implications in signaling [5], as discussed in detail later. Interestingly, one of these cleavage sites, Y250-Y251, is juxtaposed to the spot corresponding to the auto-proteolytic site in MUC1 (Figure 2b). Contrary to other SEA domains, the CD of EpCAM contains a TLIYY motif (residues 247-251) which acts as a binding site for the oncogenic ER-resident protein disulfide isomerase AGR2, which may influence its trafficking [26].

Cells 2020, 9, 1361
The thyroglobulin-like domain (residues 64-138; TY) is sandwiched between the ND and CD. It is similar by sequence as well as by structure (Figure 2c) to TY domains of other functionally diverse vertebrate proteins: EpCAM paralogue Trop2, p41 invariant chain (TY as inhibitor of cysteine cathepsins), insulin-like growth factor-binding proteins (IGFBPs; TY aids in binding IGF), SMOCs, testicans (TY as inhibitors as well as substrates of cysteine cathepsins), nidogens (TY as inhibitors of cysteine cathepsins), and thyroglobulin (some TY domains harbor precursor residues for thyroid hormones). These domains appeared early in evolutionary history but are Metazoa-specific; in vertebrates the TY domain-containing proteins acquired other vertebrate-specific domains by exon shuffling and duplication [27,28]. On the basis of its three disulfide bonds and the bonding pattern I-II, III-IV, and V-VI, the TY domain of EpCAM is classified as a TY type-1A domain (subtype 1B contains two disulfide bonds). Other characteristic features of the domain are a small hydrophobic core composed of the N-terminal α-helix and β-sheet, stabilized by the three disulfide bridges, and the CWCV signature sequence motif (Figure 2c). The TY domains of various proteins differ in the length of the loops and of the N-terminal α-helix, and these structural differences are also linked to their functional diversity. For example, only TY domains with a short α-helix and short loops can act as inhibitors of cysteine cathepsins, since longer loops sterically prevent inhibitor-style binding into the enzyme active site cleft [29]. The EpCAM TY domain does not inhibit these or other peptidases; rather, it is cleaved by them [11]. Within the TY domain there is a dibasic protease-sensitive site (G79-R80-R81) located at the protruding loop (Figure 2c). Matriptase, among other proteases, can efficiently cleave EpCAM at this site, thereby destabilizing its interaction with claudin-7 and, consequently, targeting it for internalization and degradation [30]. This protease-sensitive site is the same as the one identified and included in early models of EpCAM (Figure 1). Cleavage at this spot does not result in dissociation of the ND, since the parts of EpCAM remain tethered together via the C66-C99 disulfide bond within the TY domain (Figure 2c) [11].
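The disulfide connectivities quoted above for the ND, canonical EGF-like domains and the TY type-1A domain can be compared mechanically. The following is a minimal sketch, not a method from the cited studies: the dictionary keys and helper names are illustrative, and cysteines are simply numbered 1 to 6 in sequence order, as in the text.

```python
# Disulfide connectivity patterns quoted in the text, written as pairs of
# cysteine ordinals (1 = Cys I ... 6 = Cys VI). Labels are illustrative.
PATTERNS = {
    "EpCAM ND": [(1, 4), (2, 6), (3, 5)],
    "EGF-like (canonical)": [(1, 3), (2, 4), (5, 6)],
    "TY type-1A": [(1, 2), (3, 4), (5, 6)],
}


def normalize(pairs):
    """Return an order-independent representation of a bonding pattern."""
    return frozenset(tuple(sorted(pair)) for pair in pairs)


def is_valid(pairs, n_cys=6):
    """Check that every cysteine takes part in exactly one disulfide bond."""
    used = sorted(cys for pair in pairs for cys in pair)
    return used == list(range(1, n_cys + 1))


def shared_bonds(pattern_a, pattern_b):
    """Return the set of disulfide bonds present in both patterns."""
    return normalize(pattern_a) & normalize(pattern_b)


if __name__ == "__main__":
    names = list(PATTERNS)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            common = sorted(shared_bonds(PATTERNS[first], PATTERNS[second]))
            print(f"{first} vs {second}: shared bonds = {common}")
```

Under this encoding the ND pattern shares no bond with the canonical EGF-like pattern, which is the basis for rejecting the earlier EGF-like classification.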

The Biological Unit of EpCAM Is a cis-Dimer
In the first crystal structure of EpEX, two polypeptide chains from adjacent asymmetric units of the crystal form a crystallographic dimer [11]. Considering the extensiveness of interactions and the large buried surface area upon dimer formation (1980 Å² per subunit), with 8.6 kcal mol⁻¹ solvation free energy gain, it seemed likely that this dimer represents a biological unit. This was later confirmed by small angle X-ray scattering (SAXS) experiments in solution [6]. Moreover, the structure revealed that the EpCAM extracellular part alone is sufficient for dimerization, contrary to initial observations [31]. The dimerization interface is mostly formed by the protruding loop of the TY domain, which interacts with the concave β-sheet of the CD (Figure 3a). In such a dimer the C-termini of the subunits are located close to each other, indicating that this arrangement represents a cis-dimer, in other words, one formed by subunits from the same cell, since a trans-dimer (subunits from neighboring cells) would be impossible with such dimer architecture. The protein used for crystallization was a non-glycosylated mutant (three N-to-Q mutations; Figure 1), therefore N-glycosylation was modeled by attaching the canonical high-mannose chains to N74, N111 (both within TY) and N198 (within CD) (Figure 3a). While early reports indicated that N198 is not glycosylated, and that N111 and N74 are completely and partially glycosylated, respectively (Figure 1) [15], it was later shown that all three sites are indeed glycosylated and that abolishing glycosylation at N198 significantly and negatively affects the overall expression level and half-life of EpCAM on the plasma membrane [32]. In the model of the glycosylated cis-dimer the oligosaccharide chains protrude sideways and could be involved in maintaining the proper orientation of EpCAM with regard to the membrane. This would contribute to a greater exposure of the glycosylation-free membrane-distal part [11], especially of the ND. According to molecular dynamics simulations, the dimer is additionally stabilized by dimerization of the transmembrane helices of the two subunits [11]. However, the dimerization interface most probably does not extend to the cytosolic tail (EpIC), considering the analogy to the EpCAM paralogue Trop2: while it has been demonstrated that the cytosolic tail of Trop2 has a strong potential to form α-helical structure, dimers were not detected [33]. For EpIC no stable or induced secondary structure has been observed (our unpublished results).
Additional insight into dimer architecture and dimerization-mediating interactions is provided by the second crystal structure of human EpEX, in complex with a modified single-chain Fv fragment (scFv) from the anti-EpCAM Moc31 mAb (PDB 6I07) [34]. As in the first structure, a triple N-to-Q glycosylation-abolishing mutant protein was used, and the protein crystallized as a cis-dimer with an scFv fragment bound to each of the two NDs, all within the same asymmetric unit, implying that the EpEX subunits are not related by a crystallographic axis and are therefore not necessarily identical in structure. A few residues, mostly within the TY loop, were not resolved in the electron density map; however, the overall subunit structure is similar, with an RMSD over Cα atoms of 1.02 Å (Figure 3b) [34]. Some differences in the subunit structure are found at the membrane-distal part of the molecule, where the EpEX-only structure accommodates a crystallization additive within a hydrophobic pocket, causing a local conformational change; at a side-loop with the N198 glycosylation site (missing residues, or high B-factors indicating flexibility); in the TY loop; and near the CWCV motif of the TY domain. At the latter site, position 115 is occupied by a threonine residue in the EpEX-only structure, while the methionine protein variant was used for the EpEX-scFv complex. This results in a local structural change within this and the neighboring third loop of the TY domain, without other far-reaching effects, since this region is not involved in extensive contacts with the rest of the subunit or with the juxtaposed subunit. The T115M polymorphism has been linked to increased risk of breast cancer; however, the molecular mechanism is not known [35].
While in the two crystal structures the subunits are remarkably similar, there are substantial differences in their relative orientation as part of the cis-dimer (Figure 3c). Superposition of one subunit from each of the cis-dimer structures demonstrates that the other subunit in EpEX-scFv is inclined in comparison to the EpEX-only structure. This is clearly visible from the different relative angle of the membrane-distal α-helices, which in the EpEX-only structure run almost in parallel, while in the EpEX-scFv structure they are at a 25° angle. This raises the question of which of the two structures more closely resembles the biologically relevant cis-dimer. Due to missing residues in the EpEX-scFv structure (no electron density, implying local structural disorder), a direct comparison of interaction extensiveness would be inaccurate. However, since in the EpEX-only structure the TY loop critical for mediating cis-dimerization interactions is structurally well defined and forms extensive contacts with the CD (there is well-defined electron density for the TY loop residues) [11], it is plausible that the EpEX-only structure (PDB 4MZV) indeed represents a more stable cis-dimer than the EpEX-scFv structure. Furthermore, crystal contacts can influence the cis-dimer structure. In the EpEX-only structure the crystal contacts are between the EpEX subunits/dimers themselves, while in the EpEX-scFv crystal they are mediated almost exclusively via scFv-scFv interactions between adjacent asymmetric units, and the EpEX dimers make little contact with neighboring EpEX dimers (Figure 3d). The effect of crystal packing on dimer conformation is a known phenomenon; an example is the Cro dimer from phage λ, where subunits adopt different relative orientations in different crystal forms [36,37]. The fact that the two subunits can form a dimer with different relative orientations could indicate that the dimerization, although extensive in surface, is partially dynamic. This would enable "breathing" of the dimer and might be associated with easier access to the proteolytic cleavage sites involved in signaling, as discussed in a later section.
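An inter-helix angle such as the 25° quoted above can be estimated from Cα coordinates once the two dimers are superposed on one subunit. The sketch below is only an illustration under stated assumptions: the helix axis is approximated crudely as the vector between the centroids of the first and last four Cα atoms, the function names are hypothetical, and the coordinates in the usage example are synthetic, not taken from PDB 4MZV or 6I07.

```python
import math


def centroid(points):
    """Arithmetic mean of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def helix_axis(ca_coords):
    """Approximate a helix axis from Calpha coordinates.

    Assumption: the axis is taken as the vector from the centroid of the
    first four Calpha atoms to the centroid of the last four.
    """
    if len(ca_coords) < 8:
        raise ValueError("need at least 8 Calpha atoms")
    start = centroid(ca_coords[:4])
    end = centroid(ca_coords[-4:])
    return tuple(e - s for s, e in zip(start, end))


def angle_deg(u, v):
    """Angle between two vectors in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos_angle = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_angle))


if __name__ == "__main__":
    # Synthetic, idealized "helices": points on straight lines, 1.5 A apart.
    tilt = math.radians(25.0)
    helix_a = [(0.0, 0.0, 1.5 * i) for i in range(12)]
    helix_b = [(1.5 * i * math.sin(tilt), 0.0, 1.5 * i * math.cos(tilt))
               for i in range(12)]
    print(round(angle_deg(helix_axis(helix_a), helix_axis(helix_b)), 1))
```

For real structures, a least-squares line fit through all helix Cα atoms would be a more robust axis definition than the two-centroid shortcut used here.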

The Rise and Fall of EpCAM as an Adhesion Molecule from a Structural Point of View
EpCAM was first described as a calcium-independent cell-cell adhesion molecule, capable of mediating cell-cell adhesion through direct homophilic interaction between EpCAM molecules on adjacent cells [4,38]. The role in cell-cell adhesion was based on EpCAM translocation to the area of cell-cell contacts [38], disruption of cell-cell contacts by anti-EpCAM antibodies [38], and its ability to induce cell aggregation of mouse fibroblast cell line L929 lacking other endogenous adhesion molecules [4].
Initially, there were two proposed models for the formation of the EpCAM adhesion unit. The first one, based on chemical cross-linking of full-length EpCAM (labeled as EpCAM from here on) and EpEX, and on analytical ultracentrifugation of EpCAM, suggested that EpCAM forms trans-tetramers through interaction of two cis-dimers on opposing cells (Figure 4a, left) [31]. While the dimerization of full-length EpCAM was described as strong (dissociation constant Kd < 10 nM), the trans-tetramerization appeared to be much weaker (Kd = 10 µM) [31], which is in agreement with initial observations that EpCAM-mediated adhesion is significantly weaker than adhesion mediated by E-cadherin [4]. Surprisingly, the same cross-linking experiments failed to detect any oligomerization of EpEX alone, suggesting that the full-length protein containing both the transmembrane and cytosolic parts is critical for oligomer formation.
According to the second model, EpCAM resides in the membrane as cis-tetramers that interact to form a trans-octameric cell-cell adhesion unit [39]. This mechanism was based on results of chemical cross-linking and cell aggregation assays of truncated EpCAM lacking one or more extracellular domains [39]. The TY domain was found necessary for lateral interactions, while the trans-interactions between proteins on opposing cells are mediated by the ND (Figure 4a, center) [39].
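To put the dissociation constants reported for the dimer/tetramer model [31] in perspective, the fraction of molecules in the associated state can be estimated for a simple two-state self-association 2A ⇌ A2 with Kd = [A]²/[A2]. This is a minimal sketch: the two-state model, the function names and the 1 µM total concentration in the usage example are assumptions made for illustration, not values from the cited experiments.

```python
import math


def free_species(total, kd):
    """Free concentration of A for 2A <=> A2 with Kd = [A]^2 / [A2].

    `total` is the total concentration expressed in units of A
    ([A] + 2*[A2]); `total` and `kd` must use the same unit.
    Solves 2*[A]^2/Kd + [A] - total = 0 for the positive root.
    """
    if total <= 0 or kd <= 0:
        raise ValueError("total and kd must be positive")
    return kd * (math.sqrt(1.0 + 8.0 * total / kd) - 1.0) / 4.0


def fraction_associated(total, kd):
    """Fraction of A found in the associated (A2) state."""
    return 1.0 - free_species(total, kd) / total


if __name__ == "__main__":
    total_um = 1.0  # hypothetical total concentration, in micromolar
    cases = [
        ("cis-dimerization, Kd = 10 nM (reported upper bound)", 0.010),
        ("trans-tetramerization of cis-dimers, Kd = 10 uM", 10.0),
    ]
    for label, kd_um in cases:
        frac = fraction_associated(total_um, kd_um)
        print(f"{label}: {frac:.2f} associated at {total_um} uM total")
```

For the tetramerization case, A stands for the cis-dimer and A2 for the trans-tetramer. In this model half of the molecules are associated when the total concentration equals Kd, so a thousand-fold difference in Kd translates into very different occupancies at any given concentration.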
A more detailed description of EpCAM adhesion unit was not possible until the high-resolution structure of EpEX was described in 2014 (Figures 1, 3a and 4a, right) [11]. This structure where EpEX was found in a cis-dimeric arrangement provided important insight into possible architectures of oligomers: the cis-dimer does not support the tetramer/octamer model of adhesion. EpCAM cis-dimer has a cyclic C2 symmetry with axis perpendicular to the cell membrane (Figure 4c). A lateral cis-tetramer, as proposed in the tetramer/octamer model, is thus unfeasible for two reasons. First, the interactions between the subunits in the cis-dimer need to be completely rearranged to enable formation of a different lateral homo-oligomer [40]. Considering the extensiveness of the cis-dimerization surface [11] the breakage of the cis-dimer would be prohibitive from the evolutionary perspective. Second, if C2 symmetry is not broken, a higher-order lateral homo-oligomer cannot have a finite number of subunits. Any interaction between two dimers that would lead to a formation of a tetramer would also imply an extension of the pattern via symmetry-equivalent sites resulting in tightly packed EpCAM clusters (Figure 4d). However, such clusters of EpCAM have never been observed [39,41].
In agreement with the dimer/tetramer model of adhesion, a D2-symmetric model of an EpCAM trans-tetramer was proposed [11] (Figure 4d). This model accounted for all up-to-date experimental knowledge on EpCAM adhesion and theoretical knowledge on oligomer evolution:

• The C-terminus of EpEX should extend towards the cell membrane, as would be expected in full-length EpCAM [11], thereby determining its basic orientation.
• The N-terminal domain is not relevant for cell-cell adhesion [39], suggesting that it is not directly involved in the adhesion-mediating interactions.
• The distance between cell membranes at sites of EpCAM-mediated cell-cell contacts is around 10-14 nm [41], which is roughly twice the dimension of an EpEX cis-dimer (5-6 nm) [11].
• N-glycosylation has no effect on adhesion [39], suggesting that sugar moieties neither participate in contact formation nor sterically hinder it.
• From an evolutionary point of view, dimers with cyclic C2 symmetry can evolve to form tetramers with dihedral D2 symmetry (three 2-fold axes of rotation, perpendicular to each other) [40] without disturbing the already existing symmetry.
However, all this experimental information was circumstantial: although the model was in agreement with all the observations listed above, there were no structural data available that could be unambiguously attributed to the existence of an EpCAM trans-tetramer. Moreover, there was no evidence of homo-oligomers of higher order than a dimer when the crystal structure of the cis-dimer was determined [11]. Finally, no evidence of trans interactions between EpCAM cis-dimers was found in a comprehensive structural investigation of EpCAM oligomerization employing SAXS, cross-linking coupled with mass spectrometry (XL-MS), bead aggregation assays (BAA), and fluorescence lifetime imaging-based Förster resonance energy transfer (FLIM-FRET) [6]. SAXS showed that EpCAM extracellular parts in solution form stable cis-dimers in the concentration range from 0.5 to 26.2 mg/mL (corresponding to 17.5-919.4 µM). Although chemical cross-linking experiments were able to capture tetrameric EpCAM, the identified crosslinks between residues could be better explained by random and transient interactions of the cis-dimers in solution than by trans-tetramerization, which would be, according to the adhesion model, biologically relevant. BAA, an experiment commonly employed to investigate cell-cell adhesion molecules, also showed no trans-interactions between EpCAM cis-dimers. Similar conclusions could be drawn from FLIM-FRET experiments performed using two different cell lines. Combining these experimental observations, the authors concluded that EpCAM does not form higher-order homo-oligomers, as had been assumed for more than 20 years [6].
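The correspondence between the mass and molar concentrations quoted for the SAXS experiments can be checked with a one-line conversion. The sketch below assumes a subunit molecular mass of about 28.5 kDa; this value is back-calculated from the two concentration pairs in the text and is not stated in the source. The function name is illustrative.

```python
def mg_per_ml_to_micromolar(conc_mg_ml, mass_kda):
    """Convert a mass concentration (mg/mL) to a molar concentration (uM).

    mg/mL equals g/L; dividing by the molar mass in g/mol gives mol/L,
    so with the mass in kDa the result in uM is conc / mass * 1000.
    """
    if mass_kda <= 0:
        raise ValueError("mass_kda must be positive")
    return conc_mg_ml / mass_kda * 1000.0


if __name__ == "__main__":
    ASSUMED_SUBUNIT_MASS_KDA = 28.5  # assumption, back-calculated from the text
    for conc in (0.5, 26.2):
        micromolar = mg_per_ml_to_micromolar(conc, ASSUMED_SUBUNIT_MASS_KDA)
        print(f"{conc} mg/mL -> {micromolar:.1f} uM")
```

With this assumed mass the conversion reproduces the quoted range to within rounding, which indicates that the micromolar values refer to subunit rather than dimer concentration.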
Although this seemed to contradict all the previous experiments, it was not the first paper challenging the direct involvement of EpCAM in cell-cell adhesion via homo-oligomerization. First, Fornaro et al. failed to reproduce the initial results on EpCAM's ability to induce cell segregation in transfected mouse fibroblasts a year after the first such observations were described [42]. Next, Guillemot et al. found that EpCAM has no effect on segregation of thymic epithelial cells [43]. Furthermore, Tsaktanis et al. published evidence that neither EpCAM cleavage nor EpCAM knockdown has any effect on cell-cell adhesion in a carcinoma cell line [5].
Combined, all the gathered experimental observations suggest that EpCAM's role in cell-cell adhesion should not be attributed to homo-oligomerization of EpCAM molecules on opposing cells but rather to indirect regulation of classical E-cadherin mediated adhesion [44,45] and tight junction formation [46,47], actomyosin network homeostasis [48], and cell signaling.

Structural Basis of EpCAM Signaling via Regulated Intramembrane Proteolysis
EpCAM is involved in several signaling pathways, and the first such evidence was provided in 2009 when regulated intramembrane proteolysis (RIP)-mediated signaling through EpIC-FHL2-β-catenin-Lef1 signaling complex was discovered [7]. Later it was also shown that EpCAM is involved in the MAPK signaling pathway through inhibition of nPKC [49,50], and that it regulates signaling through EGFR via direct binding to EGFR [9,51]. Most of the signaling pathways, except for RIP, have not yet been thoroughly investigated from the structural point of view.
In the case of EpCAM, RIP comprises two subsequent proteolytic cleavages. The first cleavage results in release of soluble EpEX and generation of a C-terminal fragment (EpCAM-CTF) (Figure 5a) [7,52]. It is mediated by either a disintegrin and metalloproteinase (ADAM) 17, also known as tumor necrosis factor-α-converting enzyme (TACE) [7], or β-secretase 1 (BACE) [52]. TACE and BACE cleavages, also termed α- and β-cleavages, occur at distinct locations. α-cleavage takes place at the plasma membrane, while β-cleavage is executed after EpCAM internalization, since BACE is predominantly localized to the trans-Golgi network. Cleavages at sites located in the C-terminal part of EpEX (Figure 6a) result in three possible soluble extracellular part variants [5]. However, it remains to be discovered whether they differ in biological function. It is also not clear whether a single EpCAM molecule can be cleaved at both α- and β-sites and whether the type of cleavage influences downstream processing of EpCAM-CTFs.
Figure 5. (a) [...] [53,54], and the structure of the extracellular part of BACE was obtained from the Protein Data Bank (PDB 2WJO). Transmembrane and cytosolic regions of both proteases are depicted schematically. EpCAM is presented as a molecular surface; the transmembrane region and intracellular domain are shown in gray and light pink, respectively. (b) Cleavage of EpCAM-CTF by the γ-secretase complex (PDB 5A63) results in release of the Aβ-like peptide and EpIC, which is recruited into the EpIC-FHL2-β-catenin-Lef1 signaling complex. FHL2 (green), β-catenin (orange) and Lef1 (blue) are depicted by shapes corresponding to their relative sizes.
The second cleavage (γ-cleavage) takes place in the ER where EpCAM-CTFs are processed by presenilin-2, a part of the γ-secretase complex (Figure 5b) [7]. γ-cleavage sites were pinpointed to five distinct positions within the EpCAM transmembrane region (Figure 6b), and the cleavage at first three (γ1-γ3) results in a soluble Aβ-like fragment that is released in the extracellular space. The last two (ε1, ε2) result in release of the soluble EpIC into the cytoplasm [5,52].
Although the name "Aβ-like" suggests a function similar to that of β-amyloid fragments, the biological function of the EpCAM Aβ-like fragment is still to be discovered; the name only implies similar mechanisms of generation [52,55]. The role of EpIC, on the other hand, is much better understood. Soluble EpIC forms a complex with four-and-a-half LIM domain protein 2 (FHL2) and β-catenin that is in turn translocated to the nucleus, where it interacts with the transcription factor Lef1 [5,7] (Figure 5b). The resulting EpIC-FHL2-β-catenin-Lef1 signaling complex induces transcription of cell proliferation-related genes such as CCNA2, CCND1 and CCNE1 (cyclins A2, D1 and E, respectively), and MYC (c-myc) [7,56,57]. Recently, it has been discovered that generation of EpIC by γ-secretase is slow and that the resulting EpIC is afterwards efficiently degraded by the proteasome [58]. While this suggests that EpIC is not suited for fast nuclear signaling as initially expected, EpIC release is still believed to be the main mechanism by which EpCAM functions as a signaling molecule.
Structural information on the EpIC-FHL2-β-catenin-Lef1 signaling complex is sparse. The interacting pairs of proteins have been identified, but high-resolution structural data are lacking. However, some conclusions can be drawn from structural investigations of the β-catenin/Wnt signaling pathway. First, EpIC interacts with FHL2 but not directly with β-catenin [59,60]. The fourth LIM domain of FHL2 is crucial for this interaction, but the involvement of other LIM domains is not excluded [7]. Second, at minimum the last three LIM domains of FHL2 are needed for its interaction with β-catenin [61], but the presence of the first and the half LIM domain increases the strength of the interaction. On the other hand, only the N-terminal domain of β-catenin is needed for establishing a stable interaction [61]. The interaction between the full-length proteins is moderately strong (Kd ≈ 1.08 µM) [60]. Finally, the crystal structure of β-catenin ARM repeats 2-10 with a bound part of the Lef1 β-catenin-binding domain (β-catenin-BD) revealed that Lef1 interacts with β-catenin in a manner analogous to other members of the TCF family. The affinities (dissociation constants) of β-catenin for Lef1 β-catenin-BD and its phosphorylated variant are 23 and 35 nM, respectively [62]. The structure of the Lef1 HMG-box (291-391) bound to its target DNA segment was determined by NMR [63]. Considering all these structural data, we built a schematic model of the complex (Figure 7).
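To put the dissociation constants quoted above on a common scale, they can be converted to standard binding free energies with ΔG° = RT ln(Kd). The following is only a minimal sketch: it assumes a temperature of 298.15 K, a 1 M standard state, and simple 1:1 binding, none of which are stated in the cited reports, and the function and variable names are illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temp_k=298.15):
    """Standard binding free energy in kcal/mol from Kd (in M).

    Assumes a 1 M standard state; the temperature is an assumption.
    """
    return R_KCAL * temp_k * math.log(kd_molar)


def fraction_bound(free_ligand_molar, kd_molar):
    """Fraction of sites occupied for simple 1:1 binding (assumed model)."""
    return free_ligand_molar / (free_ligand_molar + kd_molar)


# Dissociation constants as quoted in the text [60,62], converted to M.
KD_VALUES = {
    "FHL2 / beta-catenin (full length)": 1.08e-6,
    "beta-catenin / Lef1 beta-catenin-BD": 23e-9,
    "beta-catenin / phosphorylated Lef1 beta-catenin-BD": 35e-9,
}

if __name__ == "__main__":
    for pair, kd in KD_VALUES.items():
        dg = binding_free_energy(kd)
        print(f"{pair}: Kd = {kd:.3g} M, dG = {dg:.2f} kcal/mol")
```

Under these assumptions the FHL2/β-catenin interaction corresponds to roughly -8.1 kcal/mol, compared with about -10.4 and -10.2 kcal/mol for the two Lef1 β-catenin-BD variants, which is consistent with describing the former as moderately strong.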
Figure 7. Schematic model of the EpIC-FHL2-β-catenin-Lef1 signaling complex. EpIC was modeled using MODELLER [64]. Binding of EpIC to FHL2 is indicated by dotted lines (light pink; width is related to importance of interaction). First and a half, second, third and fourth domain of FHL2 are depicted based on corresponding NMR structures (PDB 2MIU, 1X4K, 2D8Z, and 1X4L, respectively). Binding of FHL2 to the β-catenin N-terminal domain is indicated by a green dotted outline. β-catenin is represented by the structure of ARM repeats with a bound part of Lef1 β-catenin-BD (PDB 3OUW) and the relative positions of the N- and C-terminal domains (NTD and CTD, respectively), the structures of which are as yet unknown. The position of β-catenin-BD is indicated by a blue dotted outline. The structure of Lef1, except for the C-terminal HMG-box bound to its target DNA sequence (PDB 2LEF), is not known. β-catenin-BD and the Pro-rich region are indicated at their relative positions.
Despite considerable progress in our understanding of RIP-mediated EpCAM signaling in the past years, several questions remain unanswered. First and most importantly, the exact role of EpIC in the EpIC-FHL2-β-catenin-Lef1 signaling complex is not known: β-catenin/Lef1 are known to induce transcription of the same oncogenes as EpIC-mediated signaling without the presence of either FHL2 or EpIC (reviewed in [65]). Second, the quest to identify a RIP trigger has so far been unsuccessful. Initially it was proposed that soluble EpEX or formation of EpCAM cell-cell contacts initiates RIP, but this was later rebutted by the finding that such interactions are highly unlikely [6]. A recent report suggested that RIP is induced through EGFR activation via EGF [51], but others failed to confirm this observation [9]. Third, TACE cleavage sites were mapped to the EpCAM cis-dimerization surface [5], meaning that cis-dimerization and cleavage are mutually exclusive (Figure 8). However, no explanation was provided for what causes the otherwise stable EpCAM cis-dimers [6,11] to dissociate so that cleavage can take place. Similarly, the EpCAM TM region has a tendency to dimerize [11], which may hinder its processing by γ-secretase, as has been demonstrated for the C-terminal fragment of the amyloid β protein precursor (APP CTFβ) [66].
Figure 8. One subunit of the dimer (gray ribbon) covers the cleavage site within the other subunit (molecular surface, cleavage site in orange). In the EpEX monomer, this cleavage site is easily accessible, as shown by the complex of one subunit with the catalytic domain of TACE (TACEcat; orange ribbon). The model was generated using HADDOCK [67] with α1 and α2 cleavage sites on EpEX (orange surface) or TACE active and zinc binding site (H405, E406, H409, and H415; gray side chains) used as interaction restraints.
Further investigation is needed to answer the abovementioned questions. We believe that such knowledge will not only provide a better understanding of this major EpCAM signaling pathway, but also pave the way for the rational design of the next generation of drugs for treating carcinomas and other diseases involving EpCAM.

EpCAM Structure and Diseases
Since its discovery on the surface of colorectal cancer cells in 1979 [68,69], EpCAM has been recognized as an epithelial cancer antigen. Due to its frequent overexpression in carcinomas [70], it has been widely studied as a target for cancer diagnostics and treatment (reviewed in [71]). EpCAM overexpression is often linked to poor prognosis [72][73][74][75][76][77][78], presumably due to its involvement in cancer cell proliferation, migration, and metastasis [79].
Despite that, many molecular details of EpCAM's role in carcinogenesis are unknown. Surprisingly, few mutations in the EPCAM gene are known to be linked to cancer. Deletions of the last exons in one of the alleles are linked to hereditary non-polyposis colorectal cancer (HNPCC), also known as Lynch syndrome [80][81][82]. In most cases these deletions involve loss of exons 8 and 9, which code for EpIC (for an extensive review please see [83]). Since the polyadenylation signal is also lost, in addition to other gene defects, it is not clear whether these truncated forms are successfully translated. Hypothetically, such truncated proteins would not be able to participate in RIP-mediated signaling, and their cell surface localization would be compromised due to disrupted localization-determining protein-protein or protein-lipid interactions. However, abnormal EpCAM protein is not believed to be the culprit at all: deletion of the EPCAM 3' exons causes silencing of the downstream gene for the DNA mismatch repair protein MSH2 through transcriptional read-through and promoter methylation [80], which results in an increased risk of cancer.
While monoallelic mutations are not known to cause developmental defects, biallelic mutations of the EPCAM gene cause congenital tufting enteropathy (CTE). CTE is an inherited disorder of the small intestine that results in a severe form of diarrhea [84]. To date, 42 different EPCAM mutations have been linked to CTE [83,85-95]. At the protein level these mutations result in single amino acid substitutions, truncations due to frameshifts, and missing segments due to abnormal splicing. In most cases EpCAM is synthesized as a soluble protein without its transmembrane and intracellular parts [83]. Mutant EpCAM lacking exon 4 is not present on the cell surface but instead accumulates in the ER, which activates the ER stress-induced unfolded protein response (UPR) [96]. There are also two reports of homozygous full EPCAM gene knockouts [80,87]; in mice, however, such knockouts are lethal [97].
Loss of functional EpCAM affects expression and proper localization of other proteins involved in cell-cell adhesion: the key adherens junction components E-cadherin and β-catenin [98], and the tight junction protein claudin-7 [30,99,100]. This is easily explained when mutant EpCAM is expressed only as its soluble (truncated) extracellular part, because the transmembrane and intracellular regions are responsible for interactions with claudin-7 [101] and β-catenin [7], respectively. However, the connection between structural and functional consequences of mutations is not immediately obvious in the case of extracellular single amino acid substitutions located far from the identified interaction surfaces with the abovementioned proteins (for an extensive review please see [83]). We hypothesize that these mutations affect the stability of the EpCAM protein as a whole and lead to its increased internalization, proteolytic degradation, or accumulation in the ER, as in the case of the exon 4 deletion mutant [96].

Concluding Remarks
Our understanding of EpCAM at the structural level has improved significantly over the past decade. Determination of the cis-dimeric crystal structure of the extracellular domain clarified its composition and domain organization. It also provided a detailed overview of key structural features of EpCAM, such as a protruding N-terminal domain with a unique fold that harbors the majority of antibody binding sites, and a TY repeat with an uncommonly long loop that is critical for EpEX cis-dimerization via binding to the third, C-terminal domain, where important cleavage sites are located. The cis-dimer structure of EpEX also provided a basis for further investigation of the relationship between EpCAM homo-oligomerization and its role in cell-cell adhesion. However, recent findings provide compelling evidence that EpCAM molecules on opposing cells do not interact, putting EpCAM's adhesive role in question. In contrast, the number of studies reporting EpCAM engagement in signaling is increasing. Structural investigations of RIP cleavage explained the key steps in this process and provided exact knowledge of the cleavage sites. However, these findings raised many new questions that need to be answered before all details of this signaling pathway are fully understood. For example, the relationship between cleavages and cis-oligomerization is not clear, and the data on the intracellular signaling complex are sparse.
More structural information on EpCAM interactions with its binding partners is also needed. We believe further endeavors in this direction will help us elucidate the complex and diverse role of EpCAM in epithelial morphogenesis, homeostasis, and disease.