Bioinformatic Analysis of Structure and Function of Lim Domains of Zyxin Family Proteins

Lim domains are one of the most abundant types of zinc-finger domains and are linked with diverse cellular functions ranging from cytoskeleton maintenance to gene regulation. Zyxin family Lim domains perform critical cellular functions and are indispensable for cellular integrity. Despite these important functions, the fundamental nature of the sequence, structure, functions, and interactions of the Zyxin family Lim domains remains largely unknown. Therefore, we used a set of in-silico tools and bioinformatics databases to distill the fundamental properties of the Zyxin family proteins/Lim domains from their amino acid sequence, phylogeny, biochemical analysis, post-translational modifications, structure dynamics, molecular interactions, and functions. Consensus analysis of the nuclear export signal suggests a conserved leucine-rich motif of the form LxxLxL/LxxxLxL. Molecular modeling and structural analysis show that the Lim domains of Zyxin family proteins share similarities with transcriptional regulators, suggesting that they could also interact with nucleic acids. Normal mode, covariance, and elastic network model analyses of Zyxin family Lim domains suggest that only the Lim1 region has similar internal dynamics across family members, unlike Lim2/3. Protein expression and mutational frequency analyses showed elevated expression and mutational frequency of Zyxin family proteins in various malignancies. Protein-protein interaction analysis indicates that these proteins could facilitate metabolic rewiring and the oncogenic addiction paradigm. Our comprehensive analysis indicates that the Lim domains of Zyxin family proteins function in a variety of ways, could be exploited in rational protein engineering, and might serve as better therapeutic targets for various diseases, including cancers.


Introduction
Lim (Lin-11, Isl-1, and Mec-3) domains are zinc finger domains (ZnFs); zinc fingers were first identified as a DNA-binding motif in the transcription factor TFIIIA from the African clawed frog (Xenopus laevis) (1). Individual Lim domains have independent functions, and their binding properties are modulated by the number of zinc fingers in the Lim region (2). ZnFs can bind diverse biomolecular substrates such as DNA, RNA, proteins, and lipids (3-6). These domains display distinct metal-binding characteristics and require zinc, iron, or other metal ions to stabilize their finger-like folds (7). ZnFs classification is based glutamate (26,27). The Lim consensus sequence has been identified as C-X(2)-C-X(16-23)-H-X(2)-C-X(2)-C-X(2)-C-X(16-21)-C-X(2)-(C/H/D), where X is any amino acid and the number in parentheses gives the number (or range) of X residues (28). However, analysis of all identified human Lim sequences shows a slightly broader consensus. The number and spacing of the highly conserved residues (cysteine/histidine/aspartate) specify the metal-binding characteristics of a particular Lim domain (26,29). These highly conserved residues coordinate Zn2+ ions to stabilize the zinc finger topology. Semi-conserved residues (bulky/aliphatic) also occupy invariant positions, but their role has not yet been elucidated. The number and properties of the remaining, non-conserved residues also vary among Lim domains (28).
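As an illustration only, the strict consensus quoted above can be written as a regular expression and used to scan a protein sequence for candidate Lim domains. In this sketch the helper name `find_lim_domains` is hypothetical, only the strict consensus of (28) is encoded (not the broader human consensus), and hits are reported as non-overlapping matches with 0-based, end-exclusive positions.

```python
import re

# Strict Lim consensus from (28):
# C-X(2)-C-X(16-23)-H-X(2)-C-X(2)-C-X(2)-C-X(16-21)-C-X(2)-(C/H/D)
LIM_CONSENSUS = re.compile(r"C.{2}C.{16,23}H.{2}C.{2}C.{2}C.{16,21}C.{2}[CHD]")


def find_lim_domains(sequence: str) -> list[tuple[int, int, str]]:
    """Return (start, end, match) for each non-overlapping consensus hit.

    Positions are 0-based and `end` is exclusive (Python slice convention).
    """
    return [(m.start(), m.end(), m.group())
            for m in LIM_CONSENSUS.finditer(sequence.upper())]


if __name__ == "__main__":
    # Synthetic sequence (not a real Lim domain): each X run has an allowed length.
    motif = ("C" + "AA" + "C" + "G" * 17 + "H" + "AA" + "C" + "AA" + "C"
             + "AA" + "C" + "G" * 17 + "C" + "AA" + "D")
    print(find_lim_domains("MPPP" + motif + "GGGG"))
```

A real Lim domain whose spacing falls outside the strict ranges will be missed, which is one reason a broader consensus or a profile-based search is usually preferred for annotation.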
Based on their amino acid composition, domain architecture, function, and cellular localization, Lim domain-containing proteins are classified into four categories: A) Nuclear Only, B) Lim Only, C) Lim actin-associated, and D) Lim-catalytic (Figure 1). Sequence and phylogenetic analyses show that Lim proteins played a major role in the evolution of multicellular organisms (22). The Nuclear Only family typically contains proteins with two Lim domains and a C-terminal homeodomain. Proteins of this family function as transcriptional activators and transcription factors, for example Isl-1 and Lhx1, respectively (28). As their name suggests, these proteins are located in the nucleus, where they regulate gene expression and thereby modulate cell lineage determination, neurogenesis, and cell differentiation (24,28,30). The Lim Only protein family (LMO, CRP, FHL, PINCH) harbours 1 to 5 Lim domains, and these Lim domains (especially in LMO/CRP) are highly similar to those of the Isl-1 and Lhx families (Figure 1). Lim Only family proteins are evolutionarily related to both Nuclear Only and Lim actin-associated members, suggesting that this family could have originated from common ancestors by divergent evolution (22,28). Proteins of this family regulate cell proliferation, differentiation, apoptosis, and autophagy (11,31-35). The Lim actin-associated group is the largest group of Lim domain-containing proteins. It comprises eight families (Paxillin, Zyxin, Testin, Enigma, ALP, ABLIM, EPLIN, and LASP) with a total of 26 proteins identified so far (28), which are present in the nucleus as well as in the cytoskeleton (26,28). Other members of this group have not yet been assigned because of Lim sequence promiscuity (22). The Lim-catalytic family includes Lim domains found in catalytic proteins that act as monooxygenases and kinases (22,28) (Figure 1).
These proteins play an important role in cytoskeleton remodeling, dynamics, and reorganization. They also function as signal transducers and regulate various actin-dependent biological processes, including cell-cycle progression, cell motility, and differentiation (28,36,37). Additionally, they promote microtubule disassembly, axonal outgrowth, and brain development (38,39). Structural studies report that all Lim domains bind two zinc ions and display the topology of zinc finger motifs (29,37).
Comparing the biochemical, structural, and dynamic features of the Lim domains of the Zyxin family shows how these proteins share similar roles while also carrying out dedicated functions. These shared characteristics might allow one family member to rescue or complement the functions of another, which is relevant because many Zyxin family proteins are obligately required for the tumorigenic properties of cells (40-42). A recent study suggests that Zyxin is a biomarker for aggressive phenotypes of human brain cancer (glioblastoma multiforme) and is associated with poor prognosis of glioma patients (43). Our analysis also discusses the role of the Zyxin family in oncogenic addiction (44) and metabolic rewiring (45). In this article, we provide detailed insights into the nature and properties of the Lim domains of Zyxin family proteins (Lim actin-associated group), which have unique zinc finger motifs and are involved in diverse cellular processes, especially carcinogenesis. Zyxin family members are key regulators of many pathways, including cellular developmental pathways (28). All members of the Zyxin family contain three Lim domains at the C-terminus, lack a homeodomain, and share N-terminal proline-rich regions (PRR) (Figure 2). In this study, we used a set of in-silico tools and databases to gain insights into the amino acid composition, phylogeny, biochemical properties, post-translational modifications, structure and dynamics, molecular interactions, and functions of the Lim domains of Zyxin family proteins, namely Zyxin, LimD1, Ajuba, LPP, WTIP, FBLIM1, and TRIP6 (Figure 2). We also performed Normal Mode and Covariance analyses to obtain information on the biologically functional motions of Lim domains. This comprehensive comparative study provides insights into the molecular basis of the versatility of Lim domains and their functions.

Zyxin family proteins exhibit variations in PRR/Lim edges.
Lim domain-containing proteins are present in different cellular locations, such as cell membranes, the cytoskeleton, the cytoplasm, and the nucleus. Some Lim proteins, such as the Lim homeodomain proteins (LHX family, comprising 11 proteins) (28) and the nuclear Lim Only proteins (LMO1/2/3/4/X) (28), are located exclusively in the nucleus (Figure 1; see Table 1 for more details). The amino acid sequence and composition, peptide length, net charge (isoelectric point), and biochemical features of Zyxin family Lim domains vary considerably. However, the presence and spacing of cysteine/histidine residues are quite uniform in these domains. These Lim domains also require Zn2+ coordination for folding and stability (28). Each Lim domain contains two sets of four zinc-coordinating residues, mostly cysteines; each set coordinates one Zn2+ ion, producing two zinc fingers arranged in tandem (26,28) (Figure 2B). These two clusters of cysteine residues are separated by a two-amino-acid spacer, in which one residue is a semi-conserved aliphatic/bulky amino acid and the other is a non-conserved (charged/hydrophilic/hydrophobic) amino acid that is indispensable for Lim domain function (28). The zinc fingers of Lim domains adopt treble-clef-like folds. However, nuclear Lim domains differ in amino acid composition from cytoskeletal Lim domains, although some regions of the Lim1/2/3 domains of the cytoskeletal Lim actin-associated superfamily share similarities with the Nuclear/Lim Only families. These amino acid variations of Lim1/2/3 in the non-conserved regions impart functional diversity to the Lim domains in their respective families (6,26,28).
At the sequence level, we observed that the amino acid composition of the Lim family proteins, particularly the Zyxin family (Zyxin, LimD1, Ajuba, LPP, WTIP, FBLIM1, TRIP6), is quite intriguing. Sequence analysis revealed that the N-terminal region is proline-rich and devoid of cysteine/histidine residues. These polyproline regions contain clusters of repeated units of 2 to 6 amino acids, which are absent from the C-terminal Lim region. In contrast, the C-terminal region contains three cysteine-rich regions (the Lim1/2/3 zinc finger domains), which coordinate Zn2+ ions to adopt a stable conformation. The lengths of the Zyxin family polypeptides range from 373 to 676 amino acids, with the PRR constituting ~48-69% and the cysteine-rich Lim region ~30-51% of each sequence (Table 3).
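The PRR/Lim percentages above depend on where the boundary between the two regions is drawn. The sketch below assumes a simple heuristic that is ours and not necessarily the one used for Table 3: everything before the first C-X(2)-C pair counts as the N-terminal PRR, and everything from that pair onward as the C-terminal Lim region. `split_prr_lim` and `proline_fraction` are hypothetical helpers.

```python
import re

# Heuristic (assumption): the first C-X(2)-C pair marks the start of Lim1.
FIRST_CX2C = re.compile(r"C.{2}C")


def proline_fraction(segment: str) -> float:
    """Fraction of residues in `segment` that are proline (0.0 if empty)."""
    return segment.count("P") / len(segment) if segment else 0.0


def split_prr_lim(sequence: str) -> dict[str, float]:
    """Split a sequence at the first CX2C pair and summarise both parts."""
    seq = sequence.upper()
    match = FIRST_CX2C.search(seq)
    boundary = match.start() if match else len(seq)  # no pair: all N-terminal
    n_term, c_term = seq[:boundary], seq[boundary:]
    return {
        "prr_percent": 100.0 * len(n_term) / len(seq),
        "lim_percent": 100.0 * len(c_term) / len(seq),
        "prr_proline_fraction": proline_fraction(n_term),
        "lim_proline_fraction": proline_fraction(c_term),
    }
```

Because a stray CX2C pair inside the PRR would shift the boundary, annotation-based domain boundaries are the safer choice for real sequences.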
All Zyxin family proteins have three Lim domains, namely Lim1, Lim2, and Lim3 (Lim1/2/3), located at the C-terminus (Figure 2). Our amino acid sequence analysis indicated that the hydrophobic residues are non-conserved and highly variable in the Lim1/2/3 domains (Supplementary Figure 1). Each Lim domain is ~55-65 amino acids long, with a consensus signature repeat of cysteine/histidine residues that form two zinc fingers (each of ~30 amino acids). Sequence alignment of the Zyxin family Lim domains shows the presence of conserved cysteine/histidine/aspartate (C/H/D) residues (Supplementary Figure 1). Because of the abundance of hydrophobic residues in Lim domains, we performed a hydrophobicity analysis to study the distribution of hydrophobic residues across all three Lim domains of Zyxin family proteins as well as within the individual Lim1/2/3 domains. Peptide hydrophobicity is a basic determinant of protein topology (53), and different amino acids have different alpha-helical or beta-sheet propensities (53,54). Analysis of the proportions of hydrophobic, charged, and neutral amino acids can also shed light on the specific localization (55), and thereby the functionality, of different members of the Zyxin family. The hydrophobicity plots in Figure 3 show that the Lim1 and Lim2 domains of Zyxin, TRIP6, LimD1, LPP, and Ajuba are more hydrophobic than the Lim3 domain. In FBLIM1 and WTIP, the hydrophobicity of the Lim2 domain was higher than that of the Lim1 and Lim3 domains. Such differential hydrophobicity could be linked to distinct properties, structures, and functions of the Lim domains despite their conserved zinc finger motif. We next carried out a phylogenetic analysis of the Zyxin family proteins to identify the earliest-diverging member of this group.
The dendrogram presented in Supplementary Figure 2 suggests that LimD1 is the earliest-diverging member, from which all other members evolved, and that Zyxin is the most derived member of the Zyxin family.
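The Lim1/2/3 hydrophobicity comparison described above can be sketched as a sliding-window hydropathy profile plus a per-domain average. The scale and window behind Figure 3 are not stated here, so this minimal sketch assumes the Kyte-Doolittle hydropathy index and a 9-residue window; the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy index (assumed scale for this sketch).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 9) -> list[float]:
    """Mean hydropathy of each full window (one value per window start)."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and the sequence length")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def mean_hydropathy(sequence: str) -> float:
    """Average hydropathy over a whole segment (GRAVY-style score)."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return sum(values) / len(values)
```

Calling `mean_hydropathy` on the Lim1, Lim2, and Lim3 segments of one protein gives one number per domain, the kind of comparison summarised in Figure 3, while `hydropathy_profile` gives the per-position curve.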
Next, we used the NetNES algorithm (56) to identify the nuclear shuttling trait via the nuclear export signal (NES) sequence, which could have evolved recently or might be ancestral (Supplementary Figure 2). Both the phylogenetic and the NetNES (56) analyses revealed that Zyxin family proteins share a common nuclear shuttling trait that is ancestral in nature (Supplementary Figure 2). These results also support the hypothesis that Lim domain diversification and functions expanded with the evolution of nuclear roles in addition to cytoskeletal properties (22). Furthermore, we identified the consensus NES sequence in the Zyxin family proteins. Our analysis revealed that FBLIM1/WTIP/LimD1 and TRIP6/Ajuba display the LxxLxL and LxxxLxL consensus sequences, respectively (L = L/V/I/F/M; x = any amino acid). However, Zyxin and LPP show some promiscuity, with LxxxLxxL/LxxLxxL consensus sequences (Figure 4). Interestingly, all NES sequences are located in the PRR of each Zyxin family protein. These findings are consistent with a previous report showing that 75% of proteins have LxxLxL, 11% have LxxxLxL, and 15% comply with neither consensus (56).
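The consensus patterns above can be checked with a naive regular-expression scan, shown here as a rough sketch. This is not the NetNES method, which combines neural networks with a hidden Markov model; it only lists stretches fitting LxxLxL, LxxxLxL, LxxLxxL, or LxxxLxxL with L = L/V/I/F/M. `scan_nes` is a hypothetical helper.

```python
import re

PHI = "[LVIFM]"  # the "L" positions of the consensus: L/V/I/F/M
NES_TEMPLATES = {
    "LxxLxL": "{p}..{p}.{p}",
    "LxxxLxL": "{p}...{p}.{p}",
    "LxxLxxL": "{p}..{p}..{p}",
    "LxxxLxxL": "{p}...{p}..{p}",
}


def scan_nes(sequence: str) -> list[tuple[str, int, str]]:
    """Return (consensus, 0-based start, matched text) for every candidate.

    A zero-width lookahead is used so that overlapping candidates are all reported.
    """
    seq = sequence.upper()
    hits = []
    for name, template in NES_TEMPLATES.items():
        core = template.format(p=PHI)
        for m in re.finditer(f"(?=({core}))", seq):
            hits.append((name, m.start(), m.group(1)))
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))
```

Short hydrophobic patterns like these occur frequently by chance, so a bare consensus match is weak evidence on its own; this is why scoring tools such as NetNES are used.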
The isoelectric point (pI) and post-translational modifications (PTMs) can alter the cellular locations and functions of Zyxin family proteins.
To understand the basic properties of the Zyxin family proteins and their Lim domains (Zyxin, LimD1, Ajuba, LPP, WTIP, FBLIM1, TRIP6), we calculated the pI of the unmodified full-length proteins as well as of their Lim domains (Figure 2 & Table 1). The peptide length of Zyxin family proteins ranges from 373 to 676 amino acids, with pI values of 5.71 to 8.52, whereas their Lim domains span 185 to 195 amino acids, with pI values of 5.81 to 7.81 (Table 1). Our analysis shows that FBLIM1, which is found exclusively at cell adhesion sites, has a peptide length of 371 amino acids and a pI of 5.71. The acidic pI of FBLIM1 is consistent with the observation that many acidic proteins reside in the cytosol, cytoskeleton, vacuoles, and lysosomes (57). In contrast, WTIP, which is predominantly nuclear, has a peptide length of 430 amino acids and a pI of 8.53. The other Zyxin family proteins, such as Zyxin, LimD1, Ajuba, LPP, WTIP, and TRIP6, have lengths of 430-676 amino acids and pIs of 6.20-7.19, and are found in the cytoskeleton and the nucleus. Unlike those of cytosolic/membrane proteins, the pI values of nuclear proteins are distributed over a wide range, from 4.5 to 10 (57,58), although a comparable number of nuclear proteins have basic pIs (58). These observations suggest that peptide length and pI could play an important role in the localization of Zyxin family proteins and their niches.
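The pI calculation behind Table 1 can be sketched with the Henderson-Hasselbalch equation: sum the expected charge of every ionizable group at a given pH, then bisect for the pH at which the net charge is zero. The pKa values below are one commonly used set and are an assumption; tools differ in their pKa sets, so this sketch will not reproduce the Table 1 values exactly. The function names are hypothetical.

```python
# Assumed pKa set. Positive groups are protonated below their pKa;
# negative groups are deprotonated above theirs (Henderson-Hasselbalch).
POSITIVE_PKA = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(sequence: str, ph: float) -> float:
    """Expected net charge of an unmodified peptide at the given pH."""
    seq = sequence.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    positive = sum(counts[g] / (1.0 + 10 ** (ph - pka))
                   for g, pka in POSITIVE_PKA.items())
    negative = sum(counts[g] / (1.0 + 10 ** (pka - ph))
                   for g, pka in NEGATIVE_PKA.items())
    return positive - negative


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """pH of zero net charge, found by bisection on [0, 14].

    Net charge decreases monotonically with pH, so bisection converges.
    """
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(sequence, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

For a single glycine the only ionizable groups are the termini, so the pI is their pKa average (6.1 with this set); running the same function on a full-length sequence and on its Lim region gives the two kinds of values compared in Table 1.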
Next, we analyzed PTMs such as acetylation, mono-methylation, ubiquitylation, and phosphorylation of Zyxin family proteins, which indicated that phosphorylation is the predominant PTM for this family (Table 2). Because pI has a critical role in protein distribution inside cells and phosphorylation affects protein pI (58,59), we also analyzed the pI of each Zyxin family protein in light of its high phosphorylation level (Supplementary Figure 3 & Table 2). Protein phosphorylation, found in both eukaryotes and prokaryotes, drives conformational switching of proteins/enzymes and regulates diverse cellular processes such as cell signaling, cell growth, metabolism, apoptosis, membrane transport, gene expression, and cell cycle activation (60,61). Various studies indicate that altered phosphorylation is one of the hallmark features of carcinogenesis (62-64). Imatinib, gefitinib, and trastuzumab are anticancer drugs that block oncogenic kinase-driven phosphorylation signaling and mitigate the tumorigenic potential of cancer cells (61,65,66). Phosphorylation can also shift the pI of a target protein (59), and previous work has linked phosphorylation-dependent changes in protein net charge with cancer predisposition (57-59). Phosphorylation can also mask and unmask NES regions, thereby determining protein localization (67,68). Phosphorylation analysis of the Zyxin family proteins (Zyxin, LimD1, Ajuba, LPP, WTIP, FBLIM1, TRIP6) using PhosphoSitePlus (69) indicated that the pI shift depends on the number of phosphorylation sites and can be considerable (Table 2). FBLIM1, with a pI of 5.71, is confined to focal adhesion sites, whereas WTIP, with a pI of 8.53, is mainly nuclear. All other Zyxin family proteins have pIs in the range of 6.20-7.19 and are found at focal adhesion sites, cell membranes, the cytosol, and the nucleus.
Interestingly, LPP, which is mainly involved in cell motility and gene transcription (46), undergoes a drastic pI shift, from 7.18 for the unphosphorylated protein to 2.84 with the maximum of 83 phosphorylation sites reported by PhosphoSitePlus (69). Thus, our analysis suggests a link between the acidic, basic, and near-neutral pIs of Zyxin family proteins and their cellular localization. It is therefore plausible that pI fluctuations upon phosphorylation/dephosphorylation determine the subcellular locations and niches of Zyxin family proteins (Tables 1 & 2). These findings are also consistent with reports that changes in pI are associated with the gain of additional activities, such as subcellular compartmentalization and changes in interacting partners, that impart functional differences to proteins (70).
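To illustrate why phosphorylation lowers pI, the bisection approach used for unmodified pI can add a negative contribution per phosphosite. This is a simplified sketch under stated assumptions, not necessarily the method behind Table 2: each phosphate is treated as a diprotic acid with assumed pKa values of 2.0 and 6.0, neighbouring-group effects and the loss of the tyrosine hydroxyl ionization are ignored, and `n_phospho` is a hypothetical parameter.

```python
# Assumed pKa set for unmodified groups (same convention as a standard pI estimate).
POSITIVE_PKA = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
# Assumed pKa values for the two ionizations of a phosphomonoester.
PHOSPHO_PKA = (2.0, 6.0)


def net_charge(sequence: str, ph: float, n_phospho: int = 0) -> float:
    """Net charge at `ph` with `n_phospho` phosphorylated S/T/Y residues."""
    seq = sequence.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    positive = sum(counts[g] / (1.0 + 10 ** (ph - pka))
                   for g, pka in POSITIVE_PKA.items())
    negative = sum(counts[g] / (1.0 + 10 ** (pka - ph))
                   for g, pka in NEGATIVE_PKA.items())
    # Each phosphate carries up to two negative charges.
    phospho = n_phospho * sum(1.0 / (1.0 + 10 ** (pka - ph)) for pka in PHOSPHO_PKA)
    return positive - negative - phospho


def isoelectric_point(sequence: str, n_phospho: int = 0, tol: float = 1e-4) -> float:
    """Bisection for the zero-charge pH after adding `n_phospho` phosphates."""
    n_sites = sum(sequence.upper().count(aa) for aa in "STY")
    if not 0 <= n_phospho <= n_sites:
        raise ValueError("n_phospho must be between 0 and the number of S/T/Y residues")
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(sequence, mid, n_phospho) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

Because every phosphate adds negative charge at all pH values, the net-charge curve shifts down and the pI can only decrease as sites are added, the same direction as the LPP shift described above.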

Structural features of Zyxin family Lim domains indicate their similarities with transcription factors.
Since the sequences of Zyxin family proteins have an important role in their function and localization, we further investigated their structural similarities and dissimilarities. Because no high-resolution structures of the full-length proteins of this family are available, we performed homology modeling for each sequence. Modeling with the Robetta package, as described in the Methods section, suggested that most Zyxin family members, irrespective of their length, except Ajuba and TRIP6, adopt a hairpin shape in which the N- and C-termini lie on the same side, adjacent to each other (Figure 5). In TRIP6, the two termini are positioned diametrically opposite each other, whereas in Ajuba the C-terminus is located in the middle and the N-terminus at the opposite end (Figure 5). Each model was then validated by PROCHECK analysis using the Ramachandran plot, which showed that all residues lie in permitted regions, suggesting that the predicted Lim domain models are plausibly folded (Supplementary Figure 4) (71). Absolute quality estimation allows comparison of a modeled structure with reported PDB structures. The comparison is expressed as a Z-score, where a Z-score between 0.5 and 1 represents a model that closely resembles an experimentally determined native structure (Supplementary Figure 5) (72). Qualitative Model Energy ANalysis (QMEAN) of the Lim domain models of Zyxin family proteins gave Z-scores of 0.6-0.7, well within the acceptable QMEAN range. Together, the PROCHECK and QMEAN analyses showed that all models have good stereochemical quality and nativeness, respectively, making them reliable for further studies.
Investigation of structural similarities allows us to decipher the conserved domains which in turn helps to predict and ascertain the functions of enigmatic proteins (73).
The Lim domains of the Zyxin family are zinc fingers, and zinc fingers are widely reported to bind DNA. We therefore investigated the presence of DNA-binding stretches using the DNAbinder webserver (74). DNAbinder uses Support Vector Machine (SVM) models, trained on datasets of DNA-binding and non-binding proteins, together with sequence similarity and motif-finding approaches, to discriminate DNA-binding proteins from non-binders. In this study, we ran the analysis in amino acid composition mode, and the webserver performed a PSI-BLAST analysis against a database containing DNA-binding and non-binding proteins (1153 of each). Each protein receives an SVM score that is compared with a threshold of 0: proteins with positive scores are predicted DNA binders, and those with negative scores are predicted non-binders. All members of the Zyxin family had positive SVM scores, ranging from 0.11 (LPP) to 1.71 (WTIP), suggesting that all of them are DNA-binding proteins (Table 4). DNAbinder predicts whether a protein interacts with DNA but not which regions or domains of the protein do so. We therefore used the DALI server (73) to investigate the structural/domain similarities of Zyxin family proteins with DNA-binding proteins.
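For intuition about the amino acid composition mode, the sketch below computes the 20-dimensional composition vector and applies a linear SVM decision function, w·x + b, against the threshold of 0 described above. The weights and bias are hypothetical values chosen only so the example runs; they are not DNAbinder's trained parameters, and DNAbinder's models need not be linear.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# HYPOTHETICAL weights and bias for illustration only; NOT DNAbinder's model.
HYPOTHETICAL_WEIGHTS = {aa: 0.0 for aa in AMINO_ACIDS}
HYPOTHETICAL_WEIGHTS.update({"K": 4.0, "R": 4.0, "D": -3.0, "E": -3.0})
HYPOTHETICAL_BIAS = -0.5


def composition(sequence: str) -> list[float]:
    """20-dimensional amino acid composition (fractions that sum to 1)."""
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return [residues.count(aa) / len(residues) for aa in AMINO_ACIDS]


def svm_score(sequence: str) -> float:
    """Linear SVM decision function w . x + b on the composition vector."""
    x = composition(sequence)
    return sum(HYPOTHETICAL_WEIGHTS[aa] * xi
               for aa, xi in zip(AMINO_ACIDS, x)) + HYPOTHETICAL_BIAS


def predicted_dna_binder(sequence: str, threshold: float = 0.0) -> bool:
    """Score above the threshold (0 by default) means predicted DNA binder."""
    return svm_score(sequence) > threshold
```

A positive score maps to "DNA binder" and a negative one to "non-binder", which mirrors how the SVM scores in Table 4 are read.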
The DALI server compares submitted three-dimensional protein structures against structures deposited in the PDB. This allows identification of proteins with biological similarities that may not be detectable by sequence analysis alone (73,75). For Zyxin, the DALI server identified 88 similar structures in the PDB. Interestingly, most of these structures shared only 17-28% sequence identity (%id), yet their Z-scores indicated high structural similarity with established transcription factors (Supplementary Figure 6). The Z-score describes the similarity between the protein of interest and proteins from the PDB. It encompasses not only structural similarity but also Structural Classification of Proteins (SCOP) similarity. The SCOP database classifies proteins of known three-dimensional structure according to common structural features and evolutionary relationships (76). Including SCOP helps remove biases introduced by size, substructures, and divergence, which could otherwise cause the server to report false-positive similarities (73,77). We therefore further studied structures with Z-scores >9.4. This strategy yielded 6 structures for subsequent analysis, which shared only 23-28% identity.
The highest-scoring structure in the DALI analysis was a complex of the LMO4 protein and a Lim domain (PDB ID: 1RUT) (78), followed by a complex of Lim Only protein 4 (LMO4) and Lim domain-binding protein 1 (LDB1) (PDB ID: 2DFY) (79), the complex of the Lim domain of homeobox protein LHX3 and islet-1 (ISL1) (PDB ID: 2RGT) (80), Lim/homeobox protein Lhx4 with insulin gene enhancer protein ISL2 (PDB ID: 6CME) (81), and an intramolecular as well as intermolecular complex of Lim domains from homeobox protein LHX3 and islet-1 (ISL1) (PDB ID: 4JCJ) (82) (see Supplementary Figure 6). LMO4, the Lim Only transcription factor in the top-scoring 1RUT structure, was first identified as a breast cancer autoantigen (83); it acts as a transcription factor (84) and interacts with the tumor suppressor Breast Cancer protein 1 (BRCA1) (85). LMO4 expression is developmentally regulated in the mammary gland, and LMO4 is known to repress the transcriptional activity of BRCA1 (85). A common theme among all the highest-scoring complexes was that the participating Lim domains belonged to transcription factors critical for developmental pathways. Among candidates with Z-scores <9.4, we observed proteins such as LHX4 (Lim homeobox protein 4), ISL1 (Islet-1), CRP1 (Cysteine Rich Protein 1), Rhombotin, TRIP6 (Thyroid Receptor Interacting Protein 6), T-cell acute lymphocytic leukemia protein 1, LMO2 (Lim Only 2), FHL2 (Four And A Half LIM domains protein 2), Leupaxin, CRP2 (Cysteine Rich Protein 2), PINCH2 (Particularly Interesting New Cys-His protein 2), chromatin structure remodeling complex subunit R5, and a RING finger protein (Supplementary Figure 6). The structural similarity of Zyxin Lim domains with various transcriptional regulators suggests that they may have roles in nucleic acid binding and regulation.
Interestingly, when we performed the DALI analysis for all Zyxin family proteins (Zyxin, LimD1, Ajuba, LPP, WTIP, FBLIM1, and TRIP6), we observed a similar structural alignment pattern, with common hits such as ISL1, LHX3, LHX4, CRP1, and Rhombotin. However, TRIP6 showed a higher degree of similarity with PINCH, Paxillin, and Lim domain BI (Table 5). TRIP6 also displayed more unique interactions than other members, as described in the protein-protein interaction section.

Zyxin family Lim domains show contrasting dynamics with large inter-domain motion of individual Lim1/2/3.
To understand the collective functional motions of Lim domains, we performed Normal Mode Analysis (NMA) and covariance analysis of the Lim domains of Zyxin family proteins using iMODS (86). This server calculates normal modes in internal coordinates (torsion angles), which is reported to be more accurate than Cartesian approximations and implicitly maintains stereochemistry (86,87). iMODS can also simulate a transition between two conformations; its output uses an affine-model-based approach to identify rigid bodies and depict their motions as simple curved arrows (86). NMA of Zyxin family proteins revealed different motions of the Lim1/2/3 domains (Figure 6). Covariance analysis reveals correlated, uncorrelated, and anti-correlated motions: the covariance matrix describes the fluctuations of C-alpha atoms around their mean positions and the coupling between pairs of residues (88), and is depicted as correlated (red), uncorrelated (white), and anti-correlated (blue) regions. Notably, the differences in Lim domain motion were observed in the ensemble motion of the Lim1/2/3 domains of each protein, whereas the motions of the individual domains strongly resembled each other (Figure 7). The Lim1 domain shows the highest correlated motion, as indicated by the red regions of the covariance matrix, whereas Lim2 motion varied among proteins. In contrast, the Lim3 domain displayed similar motions in all proteins (Figure 6).
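The covariance map described above can be sketched from an elastic network. iMODS computes modes in internal (torsional) coordinates; this minimal sketch instead swaps in a simpler Cartesian anisotropic network model (ANM) over C-alpha atoms. The covariance is the Hessian pseudo-inverse built from the non-trivial modes, C = Σ_k v_k v_kᵀ / λ_k (k ≥ 7), reduced to one value per residue pair and normalised to a correlation between -1 and 1. The cutoff, spring constant, function names, and toy helix coordinates are assumptions.

```python
import numpy as np


def anm_hessian(coords: np.ndarray, cutoff: float = 15.0, gamma: float = 1.0) -> np.ndarray:
    """3N x 3N Hessian of an anisotropic network model over C-alpha atoms."""
    n = len(coords)
    hessian = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            r2 = float(d @ d)
            if r2 > cutoff ** 2:
                continue  # no spring beyond the cutoff
            block = -gamma * np.outer(d, d) / r2  # off-diagonal 3x3 super-element
            hessian[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
            hessian[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block
            hessian[3 * i:3 * i + 3, 3 * i:3 * i + 3] -= block  # diagonal balances rows
            hessian[3 * j:3 * j + 3, 3 * j:3 * j + 3] -= block
    return hessian


def residue_correlation(coords: np.ndarray, cutoff: float = 15.0) -> np.ndarray:
    """N x N normalized cross-correlation of residue fluctuations."""
    evals, evecs = np.linalg.eigh(anm_hessian(coords, cutoff))
    # Skip the six zero-frequency rigid-body modes (3 translations, 3 rotations).
    evals, evecs = evals[6:], evecs[:, 6:]
    cov3n = (evecs / evals) @ evecs.T  # sum_k v_k v_k^T / lambda_k
    n = len(coords)
    # Trace of each 3x3 block gives <dr_i . dr_j>.
    cov = np.trace(cov3n.reshape(n, 3, n, 3), axis1=1, axis2=3)
    scale = np.sqrt(np.diag(cov))
    return cov / np.outer(scale, scale)


def toy_helix(n_res: int = 20) -> np.ndarray:
    """Hypothetical ideal-helix C-alpha trace: radius 2.3 A, rise 1.5 A, 100 deg/residue."""
    angles = np.deg2rad(100.0) * np.arange(n_res)
    return np.column_stack([2.3 * np.cos(angles), 2.3 * np.sin(angles),
                            1.5 * np.arange(n_res)])


if __name__ == "__main__":
    print(np.round(residue_correlation(toy_helix())[:4, :4], 2))
```

Positive entries correspond to the red (correlated) regions of a covariance map and negative entries to the blue (anti-correlated) ones.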
Next, we used iMODS for Elastic Network Model (ENM) analysis of Zyxin family Lim domains to identify pairs of atoms connected by springs. The results are shown as a grayscale graph in which each dot represents one spring between a pair of atoms and the shade of each dot represents its stiffness: darker grey indicates stiffer springs. In the Lim domains of all Zyxin family proteins, the Lim1 region shows higher stiffness than the Lim2 and Lim3 regions (Figure 8). The stiff (dark grey) regions coincided with the regions showing correlated motion, suggesting that stiff stretches of residues give rise to the correlated motion in Lim domains.
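The spring map can be sketched in the same spirit. Here every pair of C-alpha atoms within a cutoff is joined by a spring whose constant decays with distance as (d0/d)²; this stiffness function, the 10 Å cutoff, the 3.8 Å reference distance, the function names, and the toy coordinates are assumptions, not the iMODS defaults. Summing each row gives a per-residue stiffness that can be compared between regions, as done here for Lim1 versus Lim2/3.

```python
import numpy as np


def spring_stiffness_map(coords: np.ndarray, cutoff: float = 10.0, d0: float = 3.8) -> np.ndarray:
    """N x N spring constants k_ij = (d0 / d_ij)**2 within `cutoff`, 0 otherwise."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # no self-springs: d0 / inf = 0
    stiffness = (d0 / dist) ** 2
    stiffness[dist > cutoff] = 0.0
    return stiffness


def per_residue_stiffness(coords: np.ndarray, cutoff: float = 10.0) -> np.ndarray:
    """Total spring constant attached to each residue (row sums of the map)."""
    return spring_stiffness_map(coords, cutoff).sum(axis=1)


def toy_chain() -> np.ndarray:
    """Hypothetical 20-residue trace: a compact 10-residue helix, then an extended tail."""
    angles = np.deg2rad(100.0) * np.arange(10)
    helix = np.column_stack([2.3 * np.cos(angles), 2.3 * np.sin(angles),
                             1.5 * np.arange(10)])
    # Tail runs away from the helix axis along -x, 3.8 A between consecutive residues.
    tail = helix[-1] + np.outer(3.8 * np.arange(1, 11), [-1.0, 0.0, 0.0])
    return np.vstack([helix, tail])
```

In the toy chain the compact helix has many short springs per residue while the extended tail has only its chain neighbours, so the helical half comes out stiffer, the same kind of contrast read from the dark regions of the spring graph.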

Mapping of protein-protein interactions of the Zyxin family proteins could provide insights into their crucial role in cancer progression.
The Zyxin family proteins are functionally diverse, with important roles ranging from the cytoskeleton to the transcriptional machinery, which are possible only through protein-protein interactions. The proline-rich region and the Lim domains are reported to act as docking sites for protein-protein interactions (28,36,89). Zyxin was first identified as a focal adhesion component in chicken embryo fibroblasts (90). It is present in varying amounts in different cell types, such as smooth muscle, cardiac, brain, and liver cells, and its first identified interaction partner was α-actinin (90,91). As mentioned above, Zyxin contains an N-terminal PRR, an NES, and three C-terminal Lim domains (Figures 1 & 2).
Zyxin is present in both the nucleus and the cytoplasm and influences numerous signaling pathways. Furthermore, cellular levels of Zyxin are increased in various malignancies (40,42,43,92). We therefore explored the protein interactome of the Zyxin protein family using available databases such as Bioplex (93), Genevestigator®, GeneMANIA (94), MINT (95), STITCH (96), Signor 2.0 (97), and STRING (98). First, we used the Genevestigator database, which provides information on expression levels in various cancers. As shown in Figure 9, where the x-axis represents the expression level, Zyxin family proteins are upregulated in almost all cancer types examined, including breast, colon, kidney, liver, and prostate cancers. Mutation frequency is another important determinant, because mutations can alter protein function and thereby contribute to pathology and disease (99). We therefore performed a mutation frequency analysis using the PhosphoSitePlus® database. The resulting lollipop plots for each Zyxin family protein across cancer types are shown in Figure 10. These plots suggest that all members of the Zyxin family have high mutational frequencies in different cancers, although the frequency varies with tissue type, for example in endometrial, stomach, lung, kidney, breast, and ovarian cancers (Figure 10), consistent with reports that mutation frequency is influenced by tissue type (100).
Using Bioplex analysis (93), we identified different cytoskeletal and gene regulatory proteins that interact with Zyxin family proteins. Bioplex reports protein-protein interactions and their association probabilities, using the HEK293T cell line as a model. Figure 11 presents interacting proteins as prey (green circles) and baits (grey squares), with arrows indicating the direction of interaction. The Bioplex analysis suggests that Zyxin family proteins interact with each other. For example, Ajuba interacts with Zyxin, TRIP6, LIMD1, and LPP; LIMD1 interacts with TRIP6 and Ajuba; and WTIP interacts with TRIP6.
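The within-family Bioplex interactions listed above can be represented as a small graph, for example to rank family members by their number of within-family partners. This sketch encodes only the pairs named in the text and treats them as undirected, so the bait/prey direction shown in Figure 11 is deliberately dropped; the helper names are hypothetical.

```python
from collections import defaultdict

# Within-family pairs named in the text (Bioplex). Treated as undirected here:
# the bait/prey direction from Figure 11 is not encoded (simplification).
REPORTED_PAIRS = [
    ("Ajuba", "Zyxin"), ("Ajuba", "TRIP6"), ("Ajuba", "LIMD1"), ("Ajuba", "LPP"),
    ("LIMD1", "TRIP6"), ("LIMD1", "Ajuba"), ("WTIP", "TRIP6"),
]


def build_graph(pairs: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Undirected adjacency sets; repeated pairs collapse into one edge."""
    graph: dict[str, set[str]] = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def degree_ranking(graph: dict[str, set[str]]) -> list[tuple[str, int]]:
    """(protein, number of partners), highest degree first, ties by name."""
    return sorted(((node, len(partners)) for node, partners in graph.items()),
                  key=lambda item: (-item[1], item[0]))
```

With these pairs, Ajuba has the most within-family partners (four), followed by TRIP6 (three).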
Next, we performed physical association, co-expression, and pathway analyses for the Zyxin family proteins (Zyxin, LimD1, Ajuba, TRIP6, FBLIM1, WTIP, and LPP) using GeneMANIA (94), which indicated that each Zyxin family protein is a critical player that could alter signaling pathways (Figure 12). This pathway analysis linked Zyxin with NOLC1 (Nucleolar and coiled-body phosphoprotein 1), which facilitates ribosomal processing and modification (101). It also suggests that Zyxin, FBLIM1, and LPP interact with VASP, indicating similar or compensatory roles, whereby the absence of one protein could be compensated by another member of the same family. Pathway analysis with OmniPath (102) showed a role for Zyxin in the AKT and IL7R pathways (Supplementary Figure 7). Both of these pathways are linked with leukaemia (103) and T-cell development (104). This analysis also suggests that Zyxin indirectly regulates the MAPK, GSK3B, and CDK1 pathways. Using Signor, we found that Akt phosphorylates Zyxin on Ser142 and regulates apoptosis (92) (Supplementary Figure 7). The STITCH database provides information on the interacting partners of a protein of interest, which can improve understanding of its molecular/cellular functions (96), whereas the STRING database provides broad coverage of protein associations derived from functional genome-wide studies (98). The STITCH/STRING analysis showed that Zyxin interacts with BCAR1 (Breast cancer anti-estrogen resistance protein 1), which coordinates tyrosine kinase-based signaling (105,106). Using the PHAROS server, we analyzed transcription factors that harbour Lim domains. Lim domains occur in several transcription factors with clear and obligate roles in the expression of developmental genes.
These transcription factors are LHX and their isoforms, which we have also identified in our structural alignment section.

Discussion
Zyxin family proteins are ubiquitously present in the cytoplasm and nucleoplasm (Figure 1). Despite belonging to the same family and having similar structures, they differ substantially in sequence, length, amino acid composition, motif arrangement, and biochemical, structural, and functional properties. Owing to their multifaceted roles in cancer development and progression, Zyxin family members could become attractive targets for cancer therapy. Before exploring avenues of drug discovery, it is important to understand the basic structure and function of a protein. Prior knowledge derived from computational work greatly aids experimental design, saving time and increasing experimental efficiency. This work is an attempt towards a basic understanding of the Zyxin family proteins, especially their Lim domains. Here, we have used various bioinformatics tools to explore the similarity and diversity among these proteins and to understand the plausible roles they could play, which could be further exploited for drug development. The Zyxin family proteins are functionally diverse, playing important roles ranging from the cytoskeleton to the transcriptional machinery, which is possible only through protein-protein interactions. The proline-rich region (PRR) and the Lim domains have been reported to act as docking sites for protein-protein interactions (28,36,89). Zyxin was first identified as a focal adhesion component in chicken embryo fibroblasts (90). It is present at varying levels in different cell types, such as smooth muscle, cardiac, brain, and liver cells, and its first identified interaction partner was α-actinin (90,91). As mentioned above, Zyxin contains an N-terminal PRR, an NES, and three C-terminal Lim domains (Figures 1 & 2). Using the PhosphoSitePlus ® webserver, we mapped the possible PTM sites on the proteins under study (Supplementary Figure 3). Because it is extensively phosphorylated, Zyxin is also regarded as a phosphoprotein.
Despite their varied amino acid composition and length, the Zyxin family members share a common architecture, with the PRR at the N-terminus followed by three Lim domains at the C-terminus (Figure 2A). The Lim domains adopt a Zn-finger arrangement in which each Zn2+ ion is held in place by four conserved coordinating residues, predominantly cysteines. Each Lim domain has two such zinc-binding clusters (Figure 2B). The multiple sequence alignment of the Lim domains suggested that the hydrophobic residues present in the zinc fingers are not conserved (Supplementary Figure 1). The subsequent hydrophobicity analysis showed that the Lim domains differ in hydrophobicity. For example, the Lim1 and Lim2 domains of Zyxin, TRIP6, LimD1, and LPP are more hydrophobic than Lim3, whereas in FBLIM1 and WTIP, Lim2 is more hydrophobic than Lim1 and Lim3 (Figure 3). The same grouping is reflected in the pI values: Zyxin, TRIP6, LimD1, and LPP have pI values below 7.5, whereas FBLIM1 and WTIP have pI values above 7.5 (Table 1). These findings are consistent with the bimodal distribution of acidic and basic proteins, which in turn correlates with subcellular localization and cellular niches (57). The pI of a protein is a key determinant of its distribution in a cell. Phosphorylation influences the pI of a protein as well as its shuttling into the nucleus through masking and unmasking of the NES (68,109). Additionally, phosphorylation activates proteins and enzymes, enabling them to perform various cellular functions. Furthermore, altered phosphorylation of proteins is one of the hallmark features of cancer (110). The PRR is a preferred docking site for multiple protein interactions (108); thus, any change in PTMs might modulate protein-protein interactions.
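To illustrate how the pI values discussed above depend on sequence composition and phosphorylation, the minimal Python sketch below finds, by bisection, the pH at which the Henderson-Hasselbalch net charge of a sequence is zero. The single-value (EMBOSS-style) pKa set and the two phosphate pKa values are assumptions; ProtParam uses the position-dependent Bjellqvist scale, so this sketch will not reproduce the numbers in Table 1. It also ignores the loss of the tyrosine side-chain pKa upon phosphotyrosine formation. `net_charge` and `isoelectric_point` are hypothetical names.

```python
# Assumed EMBOSS-style pKa values (one value per group); ProtParam instead uses
# the position-dependent Bjellqvist scale, so absolute pI values will differ.
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
# Assumed pKa values for the two ionizable protons of each phosphate group.
PKA_PHOSPHATE = (1.2, 6.5)


def net_charge(seq, ph, n_phospho=0):
    """Henderson-Hasselbalch net charge of a sequence at a given pH."""
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    # each phosphorylated residue contributes up to two negative charges
    phosphate = n_phospho * sum(1 / (1 + 10 ** (pka - ph)) for pka in PKA_PHOSPHATE)
    return positive - negative - phosphate


def isoelectric_point(seq, n_phospho=0, lo=0.0, hi=14.0, tol=1e-4):
    """Bisect for the pH at which the net charge crosses zero."""
    # net charge decreases monotonically with pH, so there is one sign change
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid, n_phospho) > 0:
            lo = mid  # still positively charged: the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2
```

Comparing `isoelectric_point(seq)` with `isoelectric_point(seq, n_phospho=3)` for a sequence containing basic residues shows the downward pI shift produced by added phosphate groups, the direction expected from the discussion above.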
Because the PRR is a favoured region for post-translational modifications (PTMs), it will be interesting to explore the mechanisms underpinning its higher PTM load compared with the C-terminal Lim domains. Within the Zyxin protein family, Zyxin exhibits the highest level of phosphorylation and is therefore regarded as a phosphoprotein. Zyxin must be phosphorylated to shuttle from the cytosol to the nucleus, which in turn modulates the apoptotic response (37,40,41,107). Moreover, a recent report demonstrated that Zyxin is phosphorylated during mitosis and promotes colon cancer tumorigenesis in a mitotic-phosphorylation-dependent manner (42). We therefore investigated the phosphorylation sites of the Zyxin family proteins using the PhosphoSitePlus ® webserver (Supplementary Figure 3). Our analysis revealed that the majority of phosphorylation sites are located in the PRR of all the member proteins (Table 2). The analysis also showed that phosphorylation shifts the pI, which might potentiate additional activities, such as subcellular compartmentalization and changes in interacting partners, that provide functional plasticity to these proteins.
Zyxin family proteins have been reported to localize to both the cytoplasm and the nucleus. Furthermore, some Zyxin family proteins are capable of shuttling between the nucleus and cytoplasm, a feature governed by the presence of an NES. The alignment of the NES sequences of the Zyxin family proteins revealed a consensus motif of LxxLxL/LxxxLxL, with some promiscuity in Zyxin/LPP (LxxxLxxL/LxxLxxL) (Figure 4). This motif is present in all Zyxin family proteins irrespective of their reported function and localization, which supports the hypothesis that the Zyxin family proteins evolved from a single ancestor and have retained the NES sequence (Supplementary Figure 2). Homology modeling of the Zyxin family members produced structures in which the protein folds onto itself, forming a hairpin-like fold with the N- and C-termini adjacent to each other on the same side (Figures 5 and 6). These models were rigorously validated using PROCHECK/QMEAN and subjected to structural alignment and dynamics studies to identify biologically relevant functional motions (Supplementary Figures 4 and 5). Our main objective was to generate 3D models of the Zyxin family proteins that could be used in dynamics studies to understand their evolutionary relationships and functions. Subsequently, we performed structural alignment against Lim domains that are known to interact with proteins as well as nucleic acids (18,26,28,36,49,51,111,112). Typically, Lim1 mainly interacts with proteins (36), whereas Lim2/3 primarily have nucleic acid-binding properties (18,27,37,47,111,113). However, Lim domains can also function as a protein-binding interface (36,89,112-115). High-resolution structures of some Lim domains have been determined by nuclear magnetic resonance and X-ray crystallography, showing that the Lim domain zinc finger consists of two antiparallel β-hairpins with a short α-helix (78-80,82,116-119).
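As a rough illustration of the NES consensus reported above, the Python sketch below scans a sequence for the leucine-based patterns LxxLxL and LxxxLxL and for the more permissive Zyxin/LPP variants LxxxLxxL and LxxLxxL, using zero-width lookahead so that overlapping candidates are all reported. This is a crude filter under stated assumptions, not a replacement for NetNES, which scores candidates with an HMM and a neural network; NES motifs in general also tolerate other hydrophobic residues at the leucine positions. `scan_nes` is a hypothetical name.

```python
import re

# Consensus NES patterns from the alignment (x = any residue). The last two are
# the more permissive variants observed in Zyxin/LPP.
NES_PATTERNS = {
    "LxxLxL": "L..L.L",
    "LxxxLxL": "L...L.L",
    "LxxxLxxL": "L...L..L",
    "LxxLxxL": "L..L..L",
}


def scan_nes(seq):
    """Return (pattern name, 1-based start, matched segment) for every hit."""
    hits = []
    for name, regex in NES_PATTERNS.items():
        # the lookahead is zero-width, so overlapping candidate motifs are all found
        for match in re.finditer(f"(?=({regex}))", seq.upper()):
            hits.append((name, match.start() + 1, match.group(1)))
    return sorted(hits, key=lambda hit: hit[1])
```

For a toy sequence, `scan_nes("MSLKELRLQ")` returns `[("LxxLxL", 3, "LKELRL")]`; hits from such a scan would still need scoring by a dedicated predictor before being treated as candidate NESs.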
The conserved cysteine/histidine/aspartate residues coordinate the zinc ion, which helps to maintain the secondary and tertiary structure of the Lim domains (28) (as depicted in Figure 2). NMA of the 3D models of the Zyxin family proteins suggested that all the proteins show a pinching motion. To examine this motion in finer detail, we performed a covariance analysis, which showed that Lim1 has the most correlated motion across all the members, whereas Lim2 and Lim3 showed varied motion across the members (Figure 7). The covariance matrices showed that TRIP6 has the highest correlated motion, whereas Ajuba and WTIP have relatively lower correlated motion. We also observed that Ajuba and WTIP show similar motion, as do LPP, LIMD1, and FBLIM1, whereas Zyxin and TRIP6 each show a motion distinct from the rest of the Zyxin family (Figure 7). Correlated motions are important for allosteric regulation, catalysis, ligand binding, protein folding, and the mechanics of motor proteins (120-123). The correlated motion of the Lim1 region may indicate a higher propensity to bind its partners. In contrast, the Lim3 region of each protein shows more anti-correlated motion. These differences in the covariance matrices of the Lim domains reflect differences in the internal structural dynamics of the Zyxin family proteins, and these dynamic behaviours could confer diverse functions on each Zyxin family member as well as on each Lim region. Similarly, protein stiffness was investigated using Elastic Network Modeling (ENM). The stiffness results were consistent with the covariance analysis: the stretches that showed the highest correlated motion also showed a high degree of stiffness (Figure 8).
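A residue cross-correlation map of the kind shown in Figure 7 can be derived from an elastic network. iMODS performs NMA in internal coordinates; the Python sketch below instead swaps in the simpler Gaussian network model (GNM) on C-alpha coordinates, with an assumed 7 Å contact cutoff, and normalizes the pseudo-inverse of the Kirchhoff matrix into correlations between -1 (anti-correlated) and +1 (fully correlated). It only shows how such a map arises and is not the procedure behind the published matrices; `kirchhoff` and `gnm_correlation` are hypothetical names.

```python
import numpy as np


def kirchhoff(coords, cutoff=7.0, gamma=1.0):
    """GNM Kirchhoff matrix: -gamma for residue pairs within the cutoff, degree on the diagonal."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    contact = (dist <= cutoff) & ~np.eye(len(coords), dtype=bool)
    k = -gamma * contact.astype(float)
    np.fill_diagonal(k, -k.sum(axis=1))  # diagonal = gamma * number of contacts
    return k


def gnm_correlation(coords, cutoff=7.0):
    """Normalized cross-correlation C_ij = cov_ij / sqrt(cov_ii * cov_jj)."""
    vals, vecs = np.linalg.eigh(kirchhoff(coords, cutoff))
    keep = vals > 1e-8 * vals.max()  # drop the zero (rigid-body) mode(s)
    # covariance is proportional to the pseudo-inverse of the Kirchhoff matrix
    cov = (vecs[:, keep] / vals[keep]) @ vecs[:, keep].T
    d = np.sqrt(np.diag(cov))
    return cov / np.outer(d, d)
```

Applied to an (N, 3) array of C-alpha coordinates from a Lim domain model, `gnm_correlation` returns an N×N matrix that can be plotted as a heat map; blocks of positive values would correspond to regions that move together, and negative values to anti-correlated regions such as those described for Lim3.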
Several studies have linked Zyxin and other Zyxin family proteins to cancer development and metastasis. For example, Zyxin, a shuttling protein, is reported to be differentially expressed in melanocytes and melanoma cells and to modulate cell spreading, proliferation, and differentiation (41). Furthermore, Zyxin was found to be a target of β-amyloid peptide and has been implicated in Alzheimer's disease (124). Zyxin regulates cellular growth through Hippo-Yorkie/Yki signaling (42), a mechanism conserved in mammalian cancer cells. Studies have reported that an altered phosphorylation pattern of Zyxin is essential in colon cancer (42,125). In human breast cancer cells, Zyxin promotes tumorigenesis through the Hippo-Yes-associated protein (YAP) signaling pathway, which is involved in cancer progression and plays a crucial role in other human malignancies as well (42). Although Zyxin was initially regarded as a focal adhesion protein (28,90), subsequent studies established that it has multifunctional properties and acts as a signal transducer (28,36,37). As described earlier, each Lim domain of Zyxin has variable non-conserved regions. The presence of Zyxin in both the nucleus and the cytosol suggests an intriguing role in transcription or gene regulation (28,126). WTIP has a crucial role in cell proliferation, and its downregulation is associated with poor prognosis and survival in non-small cell lung cancer patients (127). Interestingly, LIMD1, Ajuba, and WTIP were identified as novel components of mammalian processing bodies (P-bodies) and implicated in novel mechanisms of miRNA-mediated gene silencing (47). FBLIM1 promotes cell migration and invasion in human glioma by altering the PLC-γ/STAT3 signaling pathway and could be used as a molecular marker for early diagnosis in glioma patients (128).
LPP is required for TGF-β-induced cell migration/adhesion dynamics and regulates invadopodia formation in cooperation with the ShcA adapter protein (129), and loss of LPP/Etv5/MMP-15 may serve as a prognostic marker in lung adenocarcinoma (130). Bearing this in mind, we analyzed the expression of the Zyxin family proteins across cancers to gain insight into their relationship with cancer, and observed that Zyxin family proteins are upregulated in a variety of cancers (Figure 9). Given that Zyxin is involved in the upregulation of various oncogenic pathways, the increased levels of these proteins in cancer cells further highlight the importance of investigating Zyxin family proteins in the carcinogenesis, progression, and detection of cancer. Next, we investigated the mutational frequency of the Zyxin family proteins in various cancers, which revealed that Zyxin, LPP, LIMD1, TRIP6, and FBLIM1 display high mutational frequencies in stomach cancer (Figure 10). Similarly, Zyxin, LIMD1, and TRIP6 mutations are linked with endometrial cancer. In lung cancer, LPP displayed the highest mutational frequency, whereas in bladder, kidney, and head/neck cancers, TRIP6, WTIP, and Ajuba, respectively, showed the highest mutational frequency. These elevated mutation frequencies again highlight the role of Zyxin family proteins in various cancers. Together, these analyses indicate that the Zyxin family proteins are frequently mutated and upregulated in many cancers, but neither analysis identifies the proteins with which Zyxin family proteins interact in cancer cells.
We studied the interactions of the Zyxin protein family using Bioplex and observed interactions with many different proteins. These relationships are represented as bait and prey, depending on the direction of the interaction. Interestingly, the Zyxin family proteins also interact amongst themselves, with Ajuba as the focal point, interacting with TRIP6, LIMD1, Zyxin, and LPP. FBLIM1 shows no such intra-family interactions, and WTIP interacts only with TRIP6 (Figure 11 & Figure 12). Zyxin interacts in both bait and prey modes with VASP, FHL3, and TANC2 (Figure 11A). LimD1 and TRIP6 act more closely to regulate the interaction network amongst themselves (Figure 11). TRIP6 has the most complex interactome of the Zyxin family members (Supplementary Figure 8). In contrast, WTIP displayed the smallest interaction network, consisting of only two proteins: TRIP6 and PPP2R3A, which is reported to negatively control cell growth and division; mutated forms of PPP2R3A have been reported in multiple cancers (131,132). An intriguing observation was made for FBLIM1, which predominantly localizes to focal adhesion sites and was shown to interact with isoforms of HIST1H3. Because the predominant reported function of FBLIM1 is the interaction with and maintenance of the cytoskeleton, it would be worth investigating why the Bioplex server reports a high degree of interaction between this cytoplasmic protein and a nuclear protein. Furthermore, we found that known interacting partners of Zyxin, such as the ENAH/VASP proteins and other Lim domain-containing proteins, also interact with the other members of the Zyxin family. Such analysis could provide insights into the various roles in which Zyxin family members are involved.
For example, Zyxin-Ajuba interaction studies could further explore how Ajuba negatively regulates the Hippo signaling pathway, positively regulates miRNA-mediated gene silencing, and is required for mitotic commitment (46,47,108).
Our protein-protein interaction (PPI) analysis could provide new insights into the functional diversity of the Zyxin protein family. We used a set of databases to extract the interaction network of each member of the Zyxin family, which helps in understanding the impact of Zyxin family proteins at the level of the whole cellular proteome. The interactions of Zyxin with NOLC1 (Nucleolar and coiled-body phosphoprotein 1) and BCAR1 (Breast cancer anti-estrogen resistance protein 1) suggest novel roles for Zyxin in ribosomal processing and in tyrosine kinase-based signaling, respectively (Supplementary Figure 7). GeneMANIA identifies functionally similar genes in genomics/proteomics datasets and uses them to predict the functions, interactions, co-localization, and co-expression of the query gene with the proteins in the database. Zyxin, LimD1, and Ajuba showed more physical interactions than WTIP, LPP, and FBLIM1. The same was not true for co-localization/co-expression, where Zyxin, FBLIM1, Ajuba, and TRIP6 showed higher degrees of similarity. Notably, a similar degree of co-expression and co-localization was observed for all the Zyxin family members except LPP, which showed higher degrees of co-expression, both direct and indirect, with many genes, particularly TAGLN, CNN1, and SMTN (Figure 12). Of these, TAGLN and CNN1 are reported as putative markers for bladder cancer, whereas SMTN is reported to interact with Breast Cancer Metastasis Suppressor 1 (BRMS1), a protein known to suppress metastasis in multiple carcinomas (133,134). Several Zyxin family proteins, such as Zyxin, WTIP, and LPP, are directly implicated in cancer prognosis and act as biomarkers for certain tumorigenic conditions (40,127,133).
The present in-silico analysis of the Zyxin family proteins indicates that these proteins can function in a variety of ways, from protein-protein to protein-nucleic acid interactions. This highlights the potential of targeting the Zyxin family proteins in the development of better therapeutic interventions for different disease conditions, particularly cancers.

Sequence, amino acid composition, phylogeny, and basic biochemical analysis.
All Zyxin family sequences (Zyxin, LimD1, Ajuba, LPP, WTIP, FBLIM1, TRIP6) were retrieved from the UniProt database as outlined in Table 1 (135). Amino acid sequence alignments were performed using MUltiple Sequence Comparison by Log-Expectation (MUSCLE), which also enables the identification of critical amino acid residues (136). PSI-BLAST (Position-Specific Iterative Basic Local Alignment Search Tool) was performed to find distantly related proteins. For the phylogenetic analysis, the Zyxin family sequences were aligned with ClustalW (137), the output was imported into Jalview (138), and a tree was generated by neighbour joining using BLOSUM62 (139). Nuclear export signal (NES) prediction was performed using NetNES (56), which combines scores from a Hidden Markov Model (HMM) and an Artificial Neural Network (ANN). To determine the consensus sequence, the conserved leucine-rich NES sequences predicted by NetNES (56) were aligned with MUSCLE (140), and a sequence logo was generated using WebLogo (141). ProtParam was used to calculate amino acid composition and basic biochemical characteristics (142), and ProtScale was used to perform hydrophobicity analysis (142). Furthermore, PhosphoSitePlus v6.5.9.1 was used to determine the number of phosphorylation sites as well as the isoelectric point in the different phosphorylation states (69).
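ProtScale computes a sliding-window profile over a chosen amino acid scale, and ProtParam reports the GRAVY score as the mean Kyte-Doolittle hydropathy of a sequence. The minimal Python sketch below assumes the Kyte-Doolittle scale with a window of 9 and uniform weights across the window; the scale and window actually used for Figure 3 are not stated here, so this is an illustration rather than the exact ProtScale configuration. `hydropathy_profile` and `gravy` are hypothetical names.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Uniform sliding-window mean; entry i is centred on residue i + window // 2 (0-based)."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value over the whole sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return sum(scores) / len(scores)
```

Comparing `gravy()` or the profile maximum across the Lim1, Lim2, and Lim3 segments of each protein gives the kind of per-domain ranking described for Figure 3; positive values indicate hydrophobic stretches.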

Structure modeling, validation and alignment of Zyxin family Lim domains
The amino acid sequences of the Zyxin family Lim domains were retrieved from UniProtKB (135) using the UniProt IDs detailed in Table 1 and used to build homology models with the Robetta server (143). The Robetta server performs comparative modeling using available structures identified with BLAST, PSI-BLAST, and the 3D-Jury package, or employs de novo modeling using the Rosetta fragment insertion method (144). Next, all the Lim domain models of the Zyxin family proteins were validated in the SWISS-MODEL workspace (145,146), which encompasses the Anolea, DFire, QMEAN, Gromos, DSSP, Promotif, and ProCheck packages. Structural alignment of the Zyxin family Lim domains was performed with the Dali server (73,75).

Normal mode analysis (NMA), Elastic network modeling and Covariance analysis.
NMA and covariance analyses were performed using the iMODS server with default settings (86). Given a structure uploaded in PDB format or a PDB ID, iMODS calculates the lowest-frequency normal modes, which represent biologically relevant collective motions, in internal coordinates for a single structure or a trajectory, using a faster NMA implementation based on the iterative Krylov-subspace method (147). iMODS also provides motion representations with a vector field, deformation analysis, eigenvalues, and covariance maps, as well as an affine-model-based arrow representation of the dynamic regions (86). Furthermore, the iMODS server reports how correlated the motions of the residues are and how flexible each residue is under normal mode analysis.
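The lowest-frequency modes that iMODS reports can be illustrated with a Cartesian anisotropic network model (ANM), which is a simpler stand-in for the internal-coordinate NMA that iMODS actually uses. The Python sketch below builds the ANM Hessian from C-alpha coordinates with an assumed 15 Å cutoff and a uniform spring constant, discards the six zero-frequency rigid-body modes (assuming a single rigid, connected network), and sums squared mode components weighted by 1/eigenvalue to give a per-residue mobility comparable in spirit to the colouring in Figure 6. All function names are hypothetical.

```python
import numpy as np


def anm_hessian(coords, cutoff=15.0, gamma=1.0):
    """3N x 3N ANM Hessian from C-alpha coordinates (uniform springs within the cutoff)."""
    n = len(coords)
    hess = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            r2 = float(d @ d)
            if r2 > cutoff ** 2:
                continue
            block = -gamma * np.outer(d, d) / r2  # off-diagonal 3x3 super-element
            hess[3*i:3*i+3, 3*j:3*j+3] = block
            hess[3*j:3*j+3, 3*i:3*i+3] = block
            hess[3*i:3*i+3, 3*i:3*i+3] -= block  # diagonal = -sum of off-diagonal blocks
            hess[3*j:3*j+3, 3*j:3*j+3] -= block
    return hess


def lowest_modes(coords, n_modes=10, cutoff=15.0):
    """Return the n_modes lowest non-trivial eigenvalues and eigenvectors."""
    vals, vecs = np.linalg.eigh(anm_hessian(coords, cutoff))
    # eigh sorts ascending; the first six ~zero modes are rigid-body translations/rotations
    return vals[6:6 + n_modes], vecs[:, 6:6 + n_modes]


def residue_mobility(vals, vecs):
    """Per-residue mean-square fluctuation over the given modes, each weighted by 1/eigenvalue."""
    weighted = vecs ** 2 / vals  # divide each mode column by its eigenvalue
    return weighted.reshape(-1, 3, vecs.shape[1]).sum(axis=(1, 2))
```

Low-index modes from `lowest_modes` describe the large collective motions (such as the clamping motion noted for the Lim domain models), while `residue_mobility` ranks residues from rigid to mobile, analogous to the blue-to-red colouring.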

Protein-protein interactions of Zyxin family Lim domains.
To explore the interactome of the Zyxin family Lim domains, we used Bioplex explorer (Bioplex 3.0) (93). Protein-protein interactions define the involvement of a protein in a particular pathway, which helps to map the pathogenic network. We used the Pharos database to extract the role of Lim domains in cancer and autoimmune diseases (148). The Pharos interface collects information on the query protein from other databases, particularly in a disease context, and is helpful for browsing relevant information about a target in a systematic manner (148). To obtain information about the signaling pathways influenced by the Zyxin family Lim domains, we used OmniPath (102) and SIGNOR 2.0 (97). We also used GeneMANIA (94), which provides insights into protein-protein, protein-DNA, and genetic interactions, pathways, gene and protein expression data, protein domains, and phenotypic screening profiles. We explored experimentally determined and predicted protein-protein interactions using the STRING database (98), and analyzed chemical and protein interactions using STITCH (96). We also used the Molecular INTeraction database (MINT) to identify interactions mediated by the Zyxin family Lim domains (95); MINT focuses on experimentally determined protein-protein interactions mined from the available literature. Finally, we used GENEVESTIGATOR, which collects gene expression data from different samples (tissue, disease, treatment, or genetic background) and provides graphics and visualization tools, to analyze transcriptomic expression data from repositories (149).

Figure 5:
Structures of all the Lim domains of the Zyxin family proteins were generated by comparative/homology modeling in the Robetta server. Robetta generates structures using templates identified by BLAST and PSI-BLAST; regions of the protein that could not be modeled from these templates are modeled de novo using the Rosetta fragment insertion method. In each model, the N-terminus is depicted in blue and the C-terminus in red. All models were validated using PROCHECK and the SWISS-MODEL workspace; the structures with the highest Z-score are depicted here.
Figure 6:
Normal mode analysis of Zyxin family Lim domain proteins. The NMA for each protein model was performed using the iMODS server under default settings. The motion of each domain is depicted by arrows pointing in the direction of motion. Additionally, the models are coloured by the NMA mobility of each structure on a colour spectrum running from blue (lowest mobility) through green and yellow to red (highest mobility). Interestingly, in all proteins the distal ends move towards each other, producing a motion that resembles clamping.
The bait protein is shown as a green circle whereas the prey protein is shown as grey squares. Clearly, TRIP6 has the highest number of interacting partners and WTIP has the fewest. It is important to point out that certain Zyxin family members were observed to interact with each other.