Comparative Studies of Vertebrate Beta Integrin Genes and Proteins: Ancient Genes in Vertebrate Evolution

Integrins are heterodimeric membrane receptor proteins, composed of α- and β-subunits, which serve various cell adhesion roles in tissue repair, hemostasis, immune response, embryogenesis and metastasis. At least 18 α- (ITA or ITGA) and 8 β-integrin subunits (ITB or ITGB) are encoded on mammalian genomes. Comparative ITB amino acid sequences and protein structures and ITB gene locations were examined using data from several vertebrate genome projects. Vertebrate ITB genes usually contained 13–16 coding exons and encoded protein subunits of ∼800 amino acids, whereas vertebrate ITB4 genes contained 36–39 coding exons and encoded larger proteins of ∼1800 amino acids. The ITB sequences exhibited several conserved domains, including the signal peptide, extracellular β-integrin, β-tail and integrin β-cytoplasmic domains. Sequence alignments of the integrin β-cytoplasmic domains revealed highly conserved regions that may perform essential functions and have been maintained during vertebrate evolution. With the exception of the human ITB8 sequence, the ITB sequences shared a predicted 19-residue α-helix in this region. Potential sites for regulating human ITB gene expression were identified, including CpG islands, transcription factor binding sites and microRNA binding sites within the 3′-UTR of human ITB genes. Phylogenetic analyses of vertebrate β-integrin genes were consistent with four major groups (1: ITB1, ITB2, ITB7; 2: ITB3, ITB5, ITB6; 3: ITB4; and 4: ITB8) and with a common evolutionary origin from an ancestral gene, prior to the appearance of fish during vertebrate evolution. The phylogenetic analyses indicated that ITB4, the only β subunit of α6β4, the sole integrin receptor of hemidesmosomes, is the most likely primordial form of the vertebrate β integrin subunit genes.


Introduction
Cell surface integrin receptors regulate cell-cell and cell-extracellular matrix (ECM) interactions and are involved in mediating all known basic cellular processes (proliferation, migration, differentiation and death). A wide range of integrin receptors precisely regulate these processes during development and throughout later life [1][2][3][4][5][6][7]. Disturbed integrin function leads to suboptimal organogenesis in rodent models [5][6][7][8] and to disease states in human populations [9][10].
An integrin receptor is a heterodimer consisting of an α and a β subunit, each containing extracellular, transmembrane and cytosolic domains. The extracellular domains of the receptor subunits bind ECM proteins (such as fibronectin, laminin and collagen), and the cytosolic domains of β subunits interact with kinases (focal adhesion kinase and Src kinase), adaptor molecules (such as talin and kindlin) and the cytoskeleton (actin and microtubules) [5][11][12]. These interactions enable integrin heterodimers to carry out 'outside-in' and 'inside-out' signaling across the cell membrane [12][13][14].
While the roles of integrins in the development and maintenance of cellular processes are areas of intensive investigation [15][16][17], the evolution of the different integrin subunits encoded within vertebrate genomes remains to be fully elucidated. This knowledge is necessary for understanding integrin receptors and the evolution of the cellular functions that these versatile receptors coordinate. The evolution of integrin genes dates back to the transition from unicellular to multicellular organisms [18][19]. An integrin-mediated adhesion system has been reported in the single-celled Amastigomonas (Phylum Apusozoa), possibly serving attachment to the basal lamina, a step towards a sedentary life and multicellularity [19]. In vertebrates, a subphylum that includes ~53,000 species, integrin-encoding genes have been identified as early as the fishes (Actinopterygians) that evolved about 450 million years ago (Mya) [20]. Here we report the gene structures and amino acid sequences of vertebrate β-integrin genes (ITB) and proteins (ITB), respectively, as well as their phylogenetic and evolutionary relationships. Potential regulatory sites for several human ITB genes, predicted secondary structures of signal peptides and cytoplasmic domains, and tissue-specific expression of mammalian ITB genes are also discussed in terms of their homology and evolution. Table 1 summarizes the locations and predicted structures of vertebrate ITB genes, based upon BLAT interrogations of several vertebrate genomes with the University of California Santa Cruz (UCSC) Genome Browser [45] using the reported sequences for human ITB1 [21][22][23][24], ITB2 [25][26][27], ITB3 [28][29][30][31], ITB4 [32][33][34], ITB5 [35][36][37], ITB6 [38][39][40], ITB7 [41][42] and ITB8 [43][44].
The predicted vertebrate ITB genes predominantly contained 13-16 coding exons. The exceptions were the vertebrate ITB4 genes, which contained 36 (opossum ITB4) to 39 coding exons and encoded larger subunits (~1,800 amino acids) than the other ITB subunits (~800 amino acids) (Table 1). In each genome examined, the ITB genes were located separately on vertebrate chromosomes, in contrast to other gene families that are clustered on a single chromosome (e.g., the alcohol dehydrogenase (ADH) gene family) [46] or on a small number of chromosomes (e.g., the lactate dehydrogenase (LDH) gene family) [47].

Vertebrate ITB Signal Peptides and Domain Structures
Application of the SignalP 3.0 server predicted signal peptides of typical length for the ITB1 (20 aa), ITB2 (22 aa), ITB3 (26 aa), ITB4 (27 aa), ITB5 (24 aa), ITB6 (21 aa) and ITB7 (19 aa) subunits, but a long signal sequence (42 aa) for ITB8, the smallest ITB subunit. Although the server predicted a distinct cleavage site for the signal peptides of all human ITB forms, a recent study has shown that the signal peptide of the ruminant β2 integrin subunit, which contains a cleavage-inhibiting glutamine (Q), was not processed [48]. Little is known about signal peptide processing in integrins; however, the human ITB1 and ITB2 subunits also contain a cleavage-inhibiting Q at the predicted cleavage sites (data not shown). Domain annotation of the ITB signal sequences predicted a central helical domain flanked by anterior and posterior coiled motifs in the ITB1, ITB2, ITB3, ITB5, ITB6 and ITB7 subunits. The ITB8 signal peptide, however, consisted of two central helical motifs separated by a coiled motif, with two additional coiled motifs at its N- and C-terminal ends. The predicted signal peptide sequences of the different ITB subunits showed little sequence similarity (data not shown), which is not uncommon for signal peptides [49]. The lack of identity among the primary structures of these signal sequences, together with the similarity of their secondary structures (a central hydrophobic core with coiled motifs at the ends), suggests that these secondary structures are indispensable for inserting the N-terminal ends of the β subunits into the cell membrane. The reason for the very long signal sequence and the two hydrophobic motifs of ITB8 is unclear, although an additional hydrophobic motif may enhance the processing and translocation of ITB8 into the lipid bilayer [50][51].
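The central hydrophobic core described above can be located with a sliding-window hydropathy profile. The minimal sketch below uses the Kyte-Doolittle hydropathy scale; the 7-residue window, the 1.6 threshold and the toy sequence are illustrative assumptions, and this is not how SignalP 3.0 works (SignalP relies on trained neural network and hidden Markov model predictors rather than a single hydropathy cutoff).

```python
# Kyte-Doolittle hydropathy values per amino acid (positive = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=7):
    """Mean KD score of each full window; entry i covers seq[i:i + window]."""
    if window > len(seq):
        raise ValueError("window is longer than the sequence")
    return [
        sum(KD[aa] for aa in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def hydrophobic_cores(seq, window=7, threshold=1.6):
    """Merge windows scoring above `threshold` into 0-based (start, end) spans, end exclusive.

    Window size and threshold are illustrative assumptions, not SignalP parameters.
    """
    spans = []
    for i, score in enumerate(hydropathy_profile(seq, window)):
        if score > threshold:
            start, end = i, i + window
            if spans and start <= spans[-1][1]:
                spans[-1] = (spans[-1][0], end)  # overlapping window: extend current core
            else:
                spans.append((start, end))
    return spans


# Hypothetical toy "signal peptide": charged N-region, leucine core, polar C-region.
toy = "MKKRS" + "L" * 10 + "GSDEN"
print(hydrophobic_cores(toy))
```

The core reported for the toy sequence spans the leucine run plus the flanking residues that the window smears in, which is why a short window is used here; real signal peptide predictors also model the charged N-region and the cleavage-site region explicitly.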
Figure 1 illustrates the predicted domain structures for ITB2 and ITB4, with ITB2 representing the domain structures of ITB1, ITB3, ITB5, ITB6, ITB7 and ITB8 [52]. These include the N-terminal signal peptide described above (residues 1-22 for ITB2); an extracellular integrin β region (pfam00362; residues 32-447) containing a potential cell attachment site (residues 397-399) and a region of cysteine-rich tandem repeats (residues 414-617); an integrin β tail domain (pfam07965; residues 622-700); a transmembrane helical region predicted by TMHMM (residues 701-723), which anchors ITB2 to the cell membrane; and a cytoplasmic region (residues 724-768). Calx-β domains are shown in green; FN3 (fibronectin type III) domains, cytokine receptor motifs and interdomain contacts are marked with red triangles. Note the lack of FN3 binding domains and cytokine receptor motifs in the β4 subunit, which interacts only with laminin-332.
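The ITB2 residue ranges listed above can be held as a simple interval annotation, which makes overlapping annotations explicit (for example, the cysteine-rich repeats at 414-617 overlap the pfam00362 region at 32-447). In this minimal sketch the short domain labels are names chosen here, and 768 is used as the sequence length only because it is the last residue annotated in the text.

```python
from typing import NamedTuple


class Domain(NamedTuple):
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


# Residue ranges for human ITB2 as given in the Figure 1 description.
ITB2_DOMAINS = [
    Domain("signal peptide", 1, 22),
    Domain("integrin beta (pfam00362)", 32, 447),
    Domain("cell attachment site", 397, 399),
    Domain("cysteine-rich tandem repeats", 414, 617),
    Domain("integrin beta tail (pfam07965)", 622, 700),
    Domain("transmembrane helix", 701, 723),
    Domain("cytoplasmic region", 724, 768),
]


def domains_at(residue, domains=ITB2_DOMAINS):
    """Names of all annotated regions covering a 1-based residue position."""
    return [d.name for d in domains if d.start <= residue <= d.end]


def unannotated(length, domains=ITB2_DOMAINS):
    """1-based (start, end) stretches not covered by any annotated region."""
    covered = set()
    for d in domains:
        covered.update(range(d.start, d.end + 1))
    gaps, start = [], None
    for pos in range(1, length + 1):
        if pos not in covered and start is None:
            start = pos                      # a gap opens here
        elif pos in covered and start is not None:
            gaps.append((start, pos - 1))    # the gap closes before pos
            start = None
    if start is not None:
        gaps.append((start, length))
    return gaps


print(domains_at(398))
print(unannotated(768))
```

Keeping the annotation as a list of intervals, rather than one label per residue, preserves nested features such as the cell attachment site inside the pfam00362 region.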

Alignments of Vertebrate ITB Cytosolic Isoform Sequences
The cytoplasmic domains of β subunits interact with several intracellular proteins. Many of these interactions cause conformational changes in the extracellular domain that alter the receptor's affinity for the ECM. Conversely, interactions of the extracellular domains with the ECM may change the conformation of the cytosolic domain, allowing it to interact with non-receptor tyrosine kinases and the actin cytoskeleton. The cytoplasmic domain of integrin β subunits therefore plays crucial roles in both 'outside-in' and 'inside-out' signaling [12][13][14]. Figure 2 shows alignments of vertebrate ITB1 cytosolic domain sequences, color coded by amino acid residue properties. Apart from the product of a second, duplicated ITB1.1 gene (designated ITB1B) observed in zebrafish (Danio rerio), the ITB1 cytosolic domain sequence was identical in all vertebrates examined, suggesting that this highly conserved region performs essential functions and has been maintained by selection. The cytosolic domain sequences of the other ITB proteins (alignments not shown) showed lower levels of amino acid sequence identity than the highly conserved ITB1 sequence: ITB2 (37% identity), ITB3 (77%, excluding the zebrafish gene duplicate product ITB3B), ITB5 (50%), ITB6 (74%), ITB7 (35%) and ITB8 (45%).
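The identity percentages above, and the '*', ':' and '.' symbols in the Figure 2 legend, rest on a column-by-column comparison of aligned sequences. The sketch below assumes pre-aligned, equal-length sequences and uses the four property classes from the Figure 2 legend to decide which columns count as 'similar'. This is a simplification: alignment programs such as Clustal use finer residue groups, and the percentages reported above were not necessarily computed this way.

```python
# Property classes from the Figure 2 legend (simplified; not Clustal's own groups).
CLASSES = {
    **dict.fromkeys("RK", "basic"),
    **dict.fromkeys("DE", "acidic"),
    **dict.fromkeys("GYQSTNCH", "hydrophilic"),
    **dict.fromkeys("MAFILWPV", "hydrophobic"),
}


def conservation_line(aligned):
    """'*' identical column, ':' same property class, '.' mixed classes, ' ' if any gap."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    symbols = []
    for column in zip(*aligned):
        if "-" in column:
            symbols.append(" ")
        elif len(set(column)) == 1:
            symbols.append("*")
        elif len({CLASSES[aa] for aa in column}) == 1:
            symbols.append(":")
        else:
            symbols.append(".")
    return "".join(symbols)


def percent_identity(aligned):
    """Percentage of alignment columns that are identical in every sequence."""
    line = conservation_line(aligned)
    return 100.0 * line.count("*") / len(line)


# Hypothetical aligned fragments, for illustration only.
fragments = ["KLLMII", "KLLVII", "KLLMIE"]
print(conservation_line(fragments))
print(round(percent_identity(fragments), 1))
```

Gapped columns are scored as non-identical here, which is one of several conventions; identity values depend on that choice and on the alignment itself.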

Figure 2. Amino acid alignments for vertebrate ITB1 cytosolic domain sequences. ITB1 sequences examined included Hu, human; Rh, rhesus; Ma, marmoset; Mo, mouse; Ra, rat; Gp, guinea pig; Ho, horse; Co, cow; Pi, pig; Op, opossum; Ch, chicken; Fr, frog (Xenopus tropicalis); Zf, zebrafish (see Table 1 for details); note that two ITB1-like genes were observed in zebrafish (designated ITB1A and ITB1B). * identical residues; : similar alternate residues; . dissimilar alternate residues. The α-helix is shaded yellow and β-sheet shaded grey. Amino acids are colored as basic (R, K), acidic (D, E), neutral hydrophilic (G, Y, Q, S, T, N, C, H) and hydrophobic (M, A, F, I, L, W, P, V). The Cyto-1, Cyto-2 (NPXY) and Cyto-3 (NXXY) domains are indicated by dotted lines (see text for details).

Figure 3A shows amino acid sequence alignments for the six major human ITB1 isoforms, designated ITB1a-ITB1f [57]. Residues 1-26, which contain the 19-residue α-helix, were identical in all isoforms, whereas the C-terminal regions differed in length and sequence and contained one or two predicted β-sheet regions. Recent studies [58] have shown that ITB1a is expressed in fetal muscle but is replaced by ITB1d during postnatal development. The C-terminal region is exposed at the cytoplasmic face of the plasma membrane, where it binds actin filaments. ITB1d is expressed only in striated muscle and binds both cytoskeletal and extracellular matrix proteins with higher affinity than ITB1a, providing a stronger link between the cytoskeleton and the extracellular matrix to withstand mechanical tension during muscle contraction. ITB1a and ITB1b are similar with respect to α/β association and fibronectin binding but differ in their subcellular localization: ITB1a localizes to focal adhesions whereas ITB1b does not, and ITB1b exhibits distinct properties [22].
Human ITB1 isoforms are differentially expressed in tissues and exhibit distinct binding properties. Human ITB1a is widely expressed and is usually coexpressed with other isoforms that have a more restricted distribution. ITB1b is expressed in skin, liver, skeletal muscle, cardiac muscle, placenta and umbilical vein endothelial cells, and in neuroblastoma, lymphoma, hepatoma and astrocytoma cells. ITB1c is expressed in muscle, kidney, liver, placenta, cervical epithelium, umbilical vein endothelial cells, fibroblasts, embryonic kidney cells, platelets and several blood cell lines, whereas ITB1d is expressed specifically in striated (skeletal and cardiac) muscle.

Figure 3. See Table 1 for sources of the beta integrin cytosolic domain sequences. * identical residues; : similar alternate residues; . dissimilar alternate residues. The α-helix is shaded yellow and β-sheet shaded grey. Amino acids are colored as basic (R, K), acidic (D, E), neutral hydrophilic (G, Y, Q, S, T, N, C, H) and hydrophobic (M, A, F, I, L, W, P, V). The Cyto-1, Cyto-2 (NPXY) and Cyto-3 (NXXY) domains are indicated by dotted lines (see text for details).
The cytoplasmic tail of the ITB4 subunit is exceptionally long (1072 residues) [33], whereas those of the other ITB subunits are much shorter (Figure 3). Point mutation analyses of the cytoplasmic sequences of these β integrin subunits have revealed three clusters of amino acids in the β cytoplasmic tail that regulate the interaction of integrins with the cytoskeleton, the localization of receptors at adhesion complexes, and inside-out signaling [12][13][59][60][61][62]. These three clusters (signalins) are commonly known as cyto-1, cyto-2 and cyto-3: cyto-1 lies next to the transmembrane domain, whereas cyto-2 (NPXY motif) and cyto-3 (NXXY motif) are located in the proximal and distal regions of the tail, respectively (Figure 2) [63]. Alignments of ITB1 subunits from different species (Figure 2) and of spliced ITB1 variants (Figure 3A) show that the cyto-1 residues are highly conserved, suggesting that their specific function has been maintained throughout vertebrate evolution. Recent studies have shown that an interaction between a conserved arginine residue in the α-tail and an aspartate residue in the β-tail, together with the hydrophobic residues immediately N-terminal to them, forms a 'clasp' between the α and β subunits that plays an important role in 'inside-out' signalling [64][65].
By contrast, the cyto-3 sequence varied among the spliced ITB1 variants and among the different ITB subunits (Figure 3). The functional differences among spliced ITB1 variants (Figure 3A) and among the different ITB subunits (Figure 3B) may therefore derive from differences in their cyto-2 and cyto-3 sequences. Moreover, each β subunit has a distinct affinity for intracellular proteins, which has been shown to be dictated by the 'X' residues and the neighboring amino acids of these motifs [66]. For instance, the binding of ICAP-1α, a 200-amino acid protein, to cyto-3 is influenced by the proximal residues Val787 and Val790 [66][67][68]. The NPXY and NXXY motifs, which tend to form β-turns, act as canonical recognition sequences for intracellular proteins with phosphotyrosine-binding (PTB) domains [66]. These include the interactions of the β1A tail with the PTB domains of talin, EPS8 and Dab1; the β2 tail with Dok-1 and talin; the β3 tail with Numb, Dab1, EPS8, tensin, Dok-1 and talin; the β5 tail with Numb, Dab1, Dab2, EPS8, tensin, Dok-1 and talin; and the β7 tail with tensin, Dok-1 and talin [67].
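Because every NPXY motif is also an N-X-X-Y motif, scanning a cytoplasmic tail for the cyto-2 and cyto-3 signatures has to handle overlapping matches. The sketch below uses regular-expression lookaheads so that overlapping hits are all found; the toy tail sequence is hypothetical, and reporting each NPXY hit only under NPXY (not again under NXXY) is a presentation choice made here.

```python
import re

# Zero-width lookaheads let finditer report overlapping motifs.
NPXY = re.compile(r"(?=(NP.Y))")
NXXY = re.compile(r"(?=(N..Y))")


def find_motifs(tail):
    """Return {'NPXY': [...], 'NXXY': [...]} as (1-based position, motif) tuples.

    NPXY hits are excluded from the NXXY list to avoid double counting.
    """
    npxy = [(m.start() + 1, m.group(1)) for m in NPXY.finditer(tail)]
    npxy_starts = {pos for pos, _ in npxy}
    nxxy = [
        (m.start() + 1, m.group(1))
        for m in NXXY.finditer(tail)
        if m.start() + 1 not in npxy_starts
    ]
    return {"NPXY": npxy, "NXXY": nxxy}


# Hypothetical cytoplasmic tail fragment (not a real ITB sequence).
toy_tail = "HDRREWDTGENPIYKSATTFTNVTYKG"
print(find_motifs(toy_tail))
```

A pattern match only flags candidate motifs; as noted above, binding specificity also depends on the 'X' residues and the surrounding sequence.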
Recent studies have reported that the cytosolic proteins kindlin-1, -2 and -3 are essential for integrin activation [68][69]. Immunoprecipitation assays with β integrin tails show that kindlin isoforms bind the membrane-proximal NPXY and membrane-distal NXXY motifs, as well as neighboring residues (the NPXY linker region), of the β integrin subunit [69]. Several other cytosolic proteins (including filamin, melusin and myosin) also bind both conserved and non-conserved regions of β cytoplasmic tails [70]. Differences in the residues within and around these motifs in vertebrate β integrin subunits may therefore change the affinities of these cytosolic proteins for the β integrin tail. Phosphorylation of the tyrosine residue in the distal NXXY motif of the β3 subunit disrupts recognition by kindlin-2 and the co-activation of αIIbβ3 integrin by talin [71][72]. The phosphorylation state of these tyrosines may thus also determine which proteins bind the cytosolic tail.
In the β3 integrin tail, unphosphorylated Y747 confers a 3-fold binding preference for talin over the PTB domain of Dok1, whereas phosphorylation of Y747 increases the affinity for Dok1 400-fold and decreases that for talin 2-fold [73]. A recent study showed that phosphorylation of tyrosine 759 inhibits the binding of kindlin-2 to the C-terminus of the β3 chain [71]. Thus, the expression patterns of the different β subunits (Figure 5) and of their cytosolic binding partners, as well as the phosphorylation state of the tyrosines, may determine the functional output of integrin receptors.
Figure 4 shows the predicted mRNA structures of the major isoform of each human ITB transcript [57]. The transcripts were 3.0-9.2 kb in length and exhibited distinct exonic structures, including extended 3'-untranslated regions (UTR), especially for the ITB3a, ITB6a and ITB8a transcripts. The number of introns varied widely among the ITB genes examined: ITB4 contained the largest number (39), followed by ITB1 (16), ITB2 and ITB7 (15), ITB3, ITB5 and ITB6 (14) and ITB8 (13). The human ITB gene sequences contained several predicted transcription factor binding sites (TFBS), microRNA binding sites in the 3'-untranslated region, and CpG islands, including CpG158, CpG92, CpG91, CpG152 and CpG133 in the 5'-untranslated regions of human ITB1, ITB3, ITB4, ITB5 and ITB8, respectively (Table 2). These CpG islands within the ITB gene promoters may contribute substantially to maintaining the high expression levels of these genes (1.4-6.1 times the average for human genes) [57], as CpG islands do in the promoters of housekeeping genes expressed in most tissues [74]. Large numbers of TFBS were observed for most of the human ITB genes examined, including 51, 56 and 105 sites for ITB4, ITB6 and ITB8, respectively.
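CpG islands such as those listed above are commonly defined by the Gardiner-Garden and Frommer criteria: a region of at least 200 bp with GC content above 50% and an observed/expected CpG ratio above 0.6, where observed/expected = (CpG count × length) / (C count × G count). The sketch below evaluates one candidate region against those thresholds; genome-wide tools additionally scan and merge windows, which is omitted here. The test sequences are synthetic, and the assumption that names such as CpG158 count CpG dinucleotides (corresponding to `cpg` below) follows the UCSC Genome Browser convention and is not stated in the text.

```python
def cpg_stats(seq):
    """Length, CpG count, GC fraction and observed/expected CpG ratio of a DNA sequence."""
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc = (c + g) / n if n else 0.0
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return {"length": n, "cpg": cpg, "gc": gc, "obs_exp": obs_exp}


def is_cpg_island(seq, min_length=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the Gardiner-Garden and Frommer thresholds to one candidate region."""
    st = cpg_stats(seq)
    return (
        st["length"] >= min_length
        and st["gc"] > min_gc
        and st["obs_exp"] > min_obs_exp
    )


print(is_cpg_island("CG" * 150))      # CpG-dense
print(is_cpg_island("CCAGGA" * 40))   # GC-rich but CpG-depleted
```

The second example shows why GC content alone is not enough: a GC-rich region with no CpG dinucleotides fails the observed/expected criterion.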
Transcription factors of particular significance for human ITB gene promoters include HoxD3, which binds directly to the ITB1 and ITB3 promoters and helps regulate the expression of integrins α5β1 and αVβ3 during angiogenesis [75]; PPARα (peroxisome proliferator-activated receptor-α), which regulates gene expression in vascular cells and inhibits TGF (transforming growth factor)-β-induced ITB5 transcription [76]; and HoxA10, which regulates the ITB3 gene in human endometrial cells and during myeloid differentiation [77][78]. Moreover, the genes encoding the integrin subunits β7, β3, β6 and β8 map to 12q13.13, 17q21.32, 2q23-q31 and 7p15-p21, respectively, close to the HOXC, HOXB, HOXD and HOXA genes, suggesting that these genes may have diverged together during vertebrate evolution [79]. Several microRNA (miRNA) binding sites were also identified within the 3'-UTRs of human ITB mRNAs (Table 2). These miRNA species are phylogenetically conserved and regulate mRNA and protein expression during embryonic development [80][81]. For example, miR-183 inhibits tumor invasiveness and participates in the development and function of neurosensory organs by targeting ITB1 mRNA [82], whereas ITB3 expression is apparently regulated by let-7a in malignant melanoma [83]. Over-expression of miR-124 attenuates endogenous ITB1 expression in oral squamous cell carcinomas [84], while miR-93 promotes tumor growth and angiogenesis by decreasing ITB8 expression [85]. The number of miRNAs targeting the 3'-UTRs of human ITB transcripts varies widely among the genes examined: 4 for ITB1, 44 for ITB2, 5 for ITB3, 1 for ITB4, 53 for ITB5, 11 for ITB6, 30 for ITB7 and 2 for ITB8.
The lack of redundancy among this wide range of miRNA species regulating human integrin subunit levels suggests that the 3'-untranslated regions of these genes followed divergent evolutionary paths, allowing the expression of each ITB subunit to be regulated independently in different cells. The targeting of ITB4 by a single miRNA further suggests that this subunit is less extensively regulated at the post-transcriptional level than the other human ITB genes.

Human ITB genes: Introns, Isoforms and Predicted Regulatory Regions
Brendle and coworkers [86] have also examined single nucleotide polymorphisms (SNPs) in predicted miRNA sites of several ITA and ITB genes and their potential association with breast cancer risk (BCR), and reported a potential BCR marker in one of the ITB4 miRNA binding sites. A likely mechanism for miRNA translational regulation has recently been reported [87]. MicroRNAs are transcribed as long primary miRNAs (pri-miRNAs) in the nucleus and processed in the cytoplasm into mature miRNAs of 19-22 nucleotides, which anneal to the 3'-UTRs of target mRNAs to promote their degradation or translational repression [88]. Moreover, miRNAs show considerable flexibility: a single miRNA can target hundreds of genes, while an individual 3'-UTR region may be targeted by several distinct miRNAs [89][90]. The miRNA binding sites within the 3'-UTRs of human ITB genes are therefore likely to play a major role in regulating the translation of these genes in vertebrate tissues. Figure 5 presents 'heat maps' of comparative Itb gene expression in various mouse tissues, obtained from GNF Expression Atlas data using GNF1M chips (http://genome.ucsc.edu; http://biogps.gnf.org) [91]. These data indicated broad, high-level tissue expression of mouse Itb7, including during early embryonic development. The very high expression of Itb2 and Itb7 in bone marrow, spleen and lymphocytes is consistent with their involvement in forming integrin receptors in blood cells [92]. Itb4 expression was highest in epidermal tissues, consistent with its presence in the hemidesmosomes of these epithelial cells [93][94]. Notably, ITB4 pairs only with the α6 subunit, forming a laminin-binding receptor that provides stable adhesion of epithelial cells to the basement membrane [95][96].
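As a sketch of how a miRNA binding site in a 3'-UTR can be recognized, the code below finds canonical 7mer-m8 seed matches, that is, perfect Watson-Crick complements of miRNA nucleotides 2-8. It assumes that only seed pairing matters, which is a simplification of real target-prediction tools (these also weigh site type, context and conservation). let-7a, mentioned earlier in connection with ITB3, is used as the example miRNA; the UTR fragment is hypothetical.

```python
def reverse_complement_rna(seq):
    """Reverse complement of an RNA (or DNA, T read as U) sequence, returned as RNA."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper().replace("T", "U")))


def seed_sites(mirna, utr):
    """1-based start positions in `utr` of 7mer-m8 matches to miRNA nucleotides 2-8."""
    site = reverse_complement_rna(mirna[1:8])  # nucleotides 2-8 of the mature miRNA
    target = utr.upper().replace("T", "U")
    return [
        i + 1
        for i in range(len(target) - len(site) + 1)
        if target.startswith(site, i)
    ]


LET7A = "UGAGGUAGUAGGUUGUAUAGUU"          # mature let-7a, 5'->3'
toy_utr = "AAACUACCUCAAAGGCUACCUCUU"       # hypothetical 3'-UTR fragment
print(reverse_complement_rna(LET7A[1:8]))   # the site sequence searched for
print(seed_sites(LET7A, toy_utr))
```

Because a single 7-nucleotide site recurs by chance in long UTRs, seed matching alone over-predicts targets, which is one reason the site counts in Table 2 are best treated as predictions.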
Other comparisons of mouse Itb tissue expression indicated significant differences, including higher Itb3 expression in bone and lower Itb8 expression in most tissues examined. Overall, mouse Itb expression levels were up to 3.7 times the average level of gene expression [57], supporting key roles for these membrane receptors in cell adhesion during tissue repair, hemostasis, immune responses, embryogenesis and metastasis [92]. Similar tissue distribution profiles were observed for human ITB genes, with overall expression levels ranging from 0.8 to 6.1 times the average level of human gene expression (Table 2).

Evolution of Vertebrate ITB Genes and Proteins
A phylogenetic tree (Figure 6) was calculated by progressive alignment of 61 vertebrate ITB1-ITB8 amino acid sequences and was rooted with the Caenorhabditis elegans (nematode) ITB-like sequence (see Table 1). The phylogram clustered the ITB sequences into groups consistent with their evolutionary relatedness, with a distinct group for each of vertebrate ITB1-ITB8, separate from the nematode ITB-like sequence. These groups were significantly different from each other (bootstrap values >90) and showed closer relatedness within the following groupings: group 1, ITB1-ITB2-ITB7; group 2, ITB3-ITB5-ITB6; group 3, vertebrate ITB4 with the C. elegans ITB-like sequence (PAT3); and group 4, ITB8, the most distinct group in terms of its relatedness to the other ITB gene families. These results suggest that the ITB genes and proteins are ancient, and that a common ancestor of the ITB genes may have predated the appearance of fish >500 million years ago [96].
Among the ITB genes examined, ITB4 related most closely to the C. elegans (nematode) PAT3 sequence, indicating that it may represent the primordial vertebrate β integrin gene and the first to appear in the vertebrate ancestor. ITB4 differs from the other ITB subunits: it is unusually long (1778 residues) and contains long amino-terminal (683 aa) and cytosolic (1072 aa) domains [33]. The extracellular domains of the β4 subunit show low identity (~35%) with those of other β integrin subunits, and the ITB4 transmembrane domain is poorly conserved and exceptionally long [92][97][98].
Figure 6. Phylogenetic tree of vertebrate beta integrin cytosolic domain amino acid sequences. The tree is labeled with the ITB name and the animal name and is 'rooted' with the Caenorhabditis elegans (nematode) ITB-like sequence (see Table 1). Note the 8 major clusters corresponding to the ITB1, ITB2, ITB3, ITB4, ITB5, ITB6, ITB7 and ITB8 gene families. A genetic distance scale is shown (% amino acid substitutions). The number of times a clade (sequences common to a node or branch) occurred in the bootstrap replicates is shown; only highly significant values of 90 or more are shown, with 100 bootstrap replicates performed in each case.
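Figure 6 reports distances as % amino acid substitutions and clade support from 100 bootstrap replicates. The sketch below illustrates both ideas on a hypothetical three-sequence alignment: a p-distance (fraction of differing, gap-free aligned sites; ×100 gives % substitutions) and resampling of alignment columns with replacement. As a deliberate simplification, support is scored as how often the expected partner is the nearest neighbour in each replicate rather than by rebuilding a full tree, so this is not the procedure used to build Figure 6.

```python
import random


def p_distance(a, b):
    """Fraction of aligned, gap-free sites at which two sequences differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def nearest_neighbour(aln, name):
    """Sequence closest to `name` by p-distance (ties broken by dict order)."""
    return min(
        (n for n in aln if n != name),
        key=lambda n: p_distance(aln[name], aln[n]),
    )


def bootstrap_support(aln, name, partner, replicates=100, seed=1):
    """Replicates (columns resampled with replacement) in which `partner` is
    the nearest neighbour of `name`: a crude stand-in for clade support."""
    rng = random.Random(seed)
    length = len(next(iter(aln.values())))
    hits = 0
    for _ in range(replicates):
        cols = [rng.randrange(length) for _ in range(length)]
        resampled = {n: "".join(s[c] for c in cols) for n, s in aln.items()}
        hits += nearest_neighbour(resampled, name) == partner
    return hits


# Hypothetical, gap-free toy alignment (names and sequences are illustrative).
toy_aln = {
    "ITB_x": "KLLMIIHDRREF",
    "ITB_y": "KLLVIIHDRREF",
    "ITB_z": "WDTGENPIYKSA",
}
print(round(100 * p_distance(toy_aln["ITB_x"], toy_aln["ITB_y"]), 1))  # % substitutions
print(bootstrap_support(toy_aln, "ITB_x", "ITB_y"))
```

The toy alignment has no gaps; with gapped alignments a resampled pair can end up with no comparable sites, which `p_distance` reports as an error rather than silently scoring.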

Vertebrate ITB Integrins: Functional and Comparative Aspects
Different integrin β receptor proteins are known to interact with at least 22 different ligands and matrix proteins [14][92][99][100][101][102], which are summarized in Table 3. Of these subunits, ITB1 interacts with the largest number of ligands (12), followed by ITB2 and ITB3 (7 each), ITB5, ITB6 and ITB7 (3 each), ITB8 (2) and ITB4 (1). ITB1 also pairs with the largest number of α subunits (14), followed by ITB2 (4), ITB3 and ITB7 (2 each), and ITB4, ITB5, ITB6 and ITB8 (a single α subunit each). The ITB4 subunit pairs only with the ITA6 subunit, forming α6β4, the sole integrin receptor of hemidesmosomes, the structural components required for the attachment of cells to the basal lamina [103][104][105]. The genes and proteins of hemidesmosomes [107] date back to metazoans/holozoans, suggesting that the attachment of unicellular life forms to the basal lamina via hemidesmosomes may have initiated the formation of multicellular organisms, together with the evolution of other cell-cell junction components (tight, adherens, desmosomal and gap junctions). Mammalian Type I hemidesmosomes are found in the epithelial cells of skin, mouth and esophagus, whereas Type II hemidesmosomes are found in intestinal epithelial cells [93]; this distribution is consistent with the high expression of ITB4 in epithelial cells seen in the tissue-specific expression profiles of the ITB subunits (Figure 5). While the ITB4 subunit of the α6β4 integrin receptor primarily forms stable adhesions of epithelial cells with laminin-332, recent studies have suggested an additional role in the migration of keratinocytes and cancer cells [106][108]. Prior to migration, keratinocytes lose their hemidesmosome-mediated stable adhesion, migrate over collagen and then secrete a provisional matrix of laminin-332 that supports their motility [109]. Cancer cells also require laminin-332 to migrate [110].
Proteolytic cleavage of laminin-332 has been reported to trigger cell motility via the α6β4 receptor [111]. Other evidence suggests that migration on laminin-332 is instead mediated by the α3β1 integrin, and that the α6β4 integrin actually has trans-dominant inhibitory effects on α3β1-mediated migration [112]. Overall, these reports indicate that the ancestral role of integrins in forming stable epithelial adhesions via hemidesmosomes might have evolved to support cell migration through the introduction of additional integrin receptors with specialized functions. In this regard, the evolution of the ITB1 subunit from the primordial ITB4 may have played a significant role in cell migration. The transmigration of blood cells across endothelial layers, a highly specialized integrin-mediated function, may be associated with the evolution of the αvβ3 receptor and of the receptors formed by the association of the ITB2 subunit with the αL, αM, αX or αD subunits [113][114][115][116][117][118][119].
The extracellular domains of both the α and β subunits of a receptor interact with a wide spectrum of ECM molecules (Table 3) to perform various cellular functions. This suggests that these receptors may have evolved along with ECM molecules, performing diverse functions depending on the presence or absence of specific ligands. The ITB1 subunit may be the most promiscuous of the vertebrate β subunits, as it pairs with the largest number of α subunits, and these α/β1 heterodimers also interact with a large number of ligands. This is consistent with the observation that ITB1-like subunits had already diverged in early metazoans (corals and sponges) [120]. Clues to the evolution of the different vertebrate integrin receptors may therefore lie in their evolving interactions with different ECM molecules. However, without comprehensive information on which domains or motifs of ECM molecules interact with specific domains of the different integrin receptors, further conclusions cannot be drawn. Nevertheless, clues to the evolutionary proximity among β subunits might be found in their ability to pair with common α subunits, since such β subunits are likely to preserve the domains that determine their association with similar α subunits, or vice versa [121]. On this basis, and from the overlapping subunit compositions of functional integrin receptors (Figure 7), we predict that ITB1, which shares the sole α partner of ITB4 (α6), is the closest relative of the ancestral ITB4. ITB1 evolved to pair with the largest number of α subunits (Table 3), including α4, which it shares with ITB7, and αv, which it shares with ITB3, ITB5, ITB6 and ITB8. The cluster containing ITB3, ITB5, ITB6 and ITB8, and the cluster consisting of ITB7 and ITB2, may therefore have been derived directly from ITB1.
The origin of ITB5, ITB6 and ITB8 from ITB3 (rather than from the versatile ITB1) is less likely, because ITB3 is the most specialized member of this cluster and is expressed both in blood cells (platelets) and in other cell types such as placental trophoblasts and cancer cells [122][123][124]. In contrast, ITB5, ITB6 and ITB8 are not expressed in blood cells. The ITB2 and ITB7 subunits, which occur only in integrins of the hematopoietic and immune systems [125], are likely to be the most specialized ITB subunits, particularly ITB2, which shares no α subunit with any other ITB subunit. The αLβ2 integrin mediates the migration of T cells across the endothelium (invasion or transmigration), and α4β7 expressed on memory T cells directs their trafficking to sites of inflammation. An analysis of α subunit sharing by the ITB subunits suggests that the evolution of ITB1 led to the emergence of two groups of ITB subunits, one consisting of ITB3, ITB5, ITB6 and ITB8 and the other of ITB7 and ITB2. This conclusion from the subunit-sharing concept (Figure 7) closely matches our phylogenetic analysis, which places ITB1-ITB7-ITB2 in one cluster and ITB3-ITB5-ITB6 in another. A previous phylogenetic study of ITBs [16] likewise supported ITB1-ITB7-ITB2 as one cluster, but ITB3-ITB5-ITB8 as the other. The two phylogenetic analyses therefore differ by one subunit (ITB6 or ITB8) in their second cluster, but both identify ITB4 as either an outlier or an ancestral integrin subunit. ITB8 is a distinct member of the ITBs in our study, whereas in the previous study it diverged from ITB6 early in evolution. The subunit-pairing concept (Figure 7), however, groups ITB8 together with the cluster 2 members of both studies (ITB3-ITB5-ITB6-ITB8), which is consistent with a previous report [15].
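The subunit-sharing argument can be expressed as a small graph in which two β subunits are linked when they share an α partner. The sketch below encodes only the α partners named in this section, so the lists are deliberately partial (ITB1, for example, pairs with 14 α subunits according to Table 3); the dictionary and function names are illustrative choices made here.

```python
from itertools import combinations

# Alpha partners named in the text. PARTIAL lists: only the subunits discussed
# here are included, not the full pairings summarized in Table 3.
ALPHA_PARTNERS = {
    "ITB1": {"α4", "α6", "αv"},
    "ITB2": {"αL", "αM", "αX", "αD"},
    "ITB3": {"αIIb", "αv"},
    "ITB4": {"α6"},
    "ITB5": {"αv"},
    "ITB6": {"αv"},
    "ITB7": {"α4"},
    "ITB8": {"αv"},
}


def sharing_edges(partners):
    """Map each pair of beta subunits to the alpha subunits they share.

    Pairs that share no alpha subunit are omitted.
    """
    edges = {}
    for b1, b2 in combinations(sorted(partners), 2):
        shared = partners[b1] & partners[b2]
        if shared:
            edges[(b1, b2)] = shared
    return edges


def neighbours(beta, edges):
    """Beta subunits that share at least one alpha subunit with `beta`."""
    return sorted(b for pair in edges if beta in pair for b in pair if b != beta)


edges = sharing_edges(ALPHA_PARTNERS)
for beta in ("ITB1", "ITB2", "ITB4"):
    print(beta, neighbours(beta, edges))
```

In this graph ITB1 connects to every other subunit except ITB2, ITB4 connects only to ITB1, and ITB2 is isolated, mirroring the argument above. The graph records shared partners only; it is not a phylogeny and says nothing about the direction of descent.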
This concept shows two lines of evolution diverging from β1: one towards blood cell integrins consisting of β2 or β7 subunits, and the other towards a cluster of β5, β6, β8 and β3 subunits that are primarily expressed in tissues other than blood, with the exception of αIIbβ3, which is also expressed in blood platelets (see text).

Vertebrate ITB Gene and Protein Identification
BLAST (Basic Local Alignment Search Tool) studies were undertaken using web tools from the National Center for Biotechnology Information (NCBI) (http://blast.ncbi.nlm.nih.gov/Blast.cgi) [127]. Protein BLAST analyses used the previously described human and mouse ITB amino acid sequences (Table 1). Non-redundant protein sequence databases were examined using the blastp algorithm for several vertebrate genomes, including human (Homo sapiens) [128]; horse (Equus caballus) [129]; mouse (Mus musculus) [130]; opossum (Monodelphis domestica) [131]; chicken (Gallus gallus) [132]; frog (Xenopus tropicalis) (http://genome.jgi-psf.org/Xentr3/Xentr3.home.html); and zebrafish (Danio rerio) (http://www.sanger.ac.uk/Projects/D_rerio/); and for the nematode (Caenorhabditis elegans) (http://genome.ucsc.edu/). This procedure produced multiple BLAST 'hits' for each protein database; these were individually examined and retained in FASTA format, and a record was kept of the sequences for predicted mRNAs and encoded ITB-like proteins. These records were derived from annotated genomic sequences using the GNOMON gene prediction method and comprised predicted sequences with high similarity scores to human ITBs. Predicted ITB-like protein sequences were obtained in each case and subjected to analyses of predicted protein and gene structures.
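The step of examining multiple BLAST hits per database and keeping the best candidates can be sketched as a filter over tabular BLAST output. This sketch assumes the hits were exported in the 12-column tabular format (as produced by BLAST+ `-outfmt 6`); the e-value and identity cut-offs and the names `best_hits` and `Hit` are hypothetical and do not reproduce the criteria actually applied in this study, where hits were examined individually.

```python
from dataclasses import dataclass

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")


@dataclass
class Hit:
    query: str
    subject: str
    pident: float
    length: int
    evalue: float
    bitscore: float


def parse_outfmt6(lines):
    """Yield one Hit per non-empty, non-comment tabular line."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        row = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        yield Hit(row["qseqid"], row["sseqid"], float(row["pident"]),
                  int(row["length"]), float(row["evalue"]), float(row["bitscore"]))


def best_hits(lines, max_evalue=1e-10, min_pident=30.0):
    """Keep, for each subject sequence, the highest-scoring hit passing both cut-offs."""
    best = {}
    for hit in parse_outfmt6(lines):
        if hit.evalue > max_evalue or hit.pident < min_pident:
            continue
        current = best.get(hit.subject)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.subject] = hit
    return best
```

Keeping the best query per subject records which human or mouse ITB query each candidate sequence resembles most, which is a convenient starting point before manual inspection of the retained FASTA records.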
BLAT analyses were subsequently undertaken for each of the predicted ITB amino acid sequences using the UCSC Genome Browser (http://genome.ucsc.edu/cgi-bin/hgBlat) [45] with default settings to obtain the predicted location of each mammalian ITB gene, including predicted exon boundaries and gene sizes. Gene and protein structures of human and mouse isoforms (splice variants) were examined using the AceView website (http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/index.html?human) [57].
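Deriving exon boundaries and gene sizes from a BLAT alignment can be sketched as follows, assuming each hit was saved as a PSL record (21 tab-separated fields). Aligned blocks separated by short gaps are merged into one exon; the 50 nt `min_intron` threshold, the class `BlatHit` and the function `parse_psl_line` are hypothetical illustrations, and merged blocks only approximate true exons.

```python
from dataclasses import dataclass, field


@dataclass
class BlatHit:
    query: str
    target: str
    strand: str
    t_start: int  # PSL tStart, always reported on the + strand
    t_end: int
    exons: list = field(default_factory=list)  # merged blocks as (start, end)

    @property
    def gene_span(self):
        """Genomic span of the alignment in nucleotides."""
        return self.t_end - self.t_start

    @property
    def introns(self):
        """Lengths of the gaps between consecutive merged exons."""
        return [nxt[0] - cur[1] for cur, nxt in zip(self.exons, self.exons[1:])]


def parse_psl_line(line, protein_query=True, min_intron=50):
    """Parse one PSL record into approximate exons.

    For minus-strand hits the PSL tStarts refer to the reverse-complemented
    target, so exon coordinates are in that frame; intron lengths stay valid.
    """
    f = line.rstrip("\n").split("\t")
    if len(f) < 21:
        raise ValueError("a PSL record has 21 tab-separated fields")
    # For protein queries, PSL blockSizes are in amino acids; convert to nucleotides.
    scale = 3 if protein_query else 1
    sizes = [int(x) * scale for x in f[18].rstrip(",").split(",")]
    starts = [int(x) for x in f[20].rstrip(",").split(",")]
    exons = []
    for start, size in zip(starts, sizes):
        end = start + size
        # Gaps shorter than min_intron are treated as alignment gaps within one exon.
        if exons and start - exons[-1][1] < min_intron:
            exons[-1] = (exons[-1][0], end)
        else:
            exons.append((start, end))
    return BlatHit(f[9], f[13], f[8], int(f[15]), int(f[16]), exons)
```

The number of merged blocks gives an approximate coding exon count, and `gene_span` approximates the gene size covered by the coding region.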

Prediction of Signal Peptide Sequences and Secondary Structures of Human ITB Proteins
FASTA sequence of different human  integrin amino acid sequences were subjected to SignalP 3.0 Server (http://www.cbs.dtu.dk/services/SignalP) [133] to determine the number of amino-acids and the predicted secondary structures in the N-terminal end of the ITGB isoform involved in the formation of the signal peptide. The secondary structures of each signal peptide were determined using a SWISS-MODEL workspace (http://swissmodel.expasy.org) [134].

Comparative Human Beta Integrin (ITB) Expression
The UCSC Genome Browser (http://genome.ucsc.edu) [45] was used to examine GNF Expression Atlas 2 data obtained with various expression chips for human ITB genes (http://biogps.gnf.org) [91]. Gene array expression 'heat maps' were examined for comparative gene expression levels among human and mouse tissues, showing high (red), intermediate (black) and low (green) expression levels.
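The three heat map colour classes can be sketched as a binning of each tissue's expression relative to that gene's median across tissues. This assumes the heat map colours reflect a log2 ratio to the median; the ±1 (two-fold) threshold, the floor for very low signals and the function name `heatmap_classes` are hypothetical.

```python
import math
from statistics import median


def heatmap_classes(expression, threshold=1.0, floor=1.0):
    """Label tissues by log2(expression / median expression across tissues).

    expression: {tissue: signal}. Signals below `floor` are raised to it so the
    logarithm stays defined.
    """
    values = {tissue: max(v, floor) for tissue, v in expression.items()}
    mid = median(values.values())
    labels = {}
    for tissue, v in values.items():
        ratio = math.log2(v / mid)
        if ratio >= threshold:
            labels[tissue] = "high (red)"
        elif ratio <= -threshold:
            labels[tissue] = "low (green)"
        else:
            labels[tissue] = "intermediate (black)"
    return labels
```

Normalizing to each gene's own median makes tissue patterns comparable between ITB genes whose absolute array signals differ greatly.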

Comparative CpG Islands, Transcription Factor Binding Sites (TFBS) and microRNA Sequences of Human Beta Integrin Genes (ITB)
The UCSC Human Genome Browser (http://genome.ucsc.edu) [45] was used, together with the TargetScan website (http://www.targetscan.org), to examine the comparative locations, numbers and sequences of CpG islands and transcription factor binding sites (TFBS) of human ITB genes, and of microRNA sites located in the 3'-untranslated regions (UTRs) of these genes.
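Two of these analyses rest on simple sequence rules that can be sketched directly. CpG islands are commonly defined by sliding-window criteria in the style of Gardiner-Garden and Frommer (GC fraction of at least 0.5 and an observed/expected CpG ratio of at least 0.6 over 200 bp windows, assumed here to be close to those behind the UCSC track), and canonical microRNA sites are matches in the 3'-UTR to the reverse complement of miRNA seed nucleotides 2-7, extended by a match to nucleotide 8 and/or an A opposite nucleotide 1, as in TargetScan's 8mer, 7mer-m8 and 7mer-A1 site types. The function names and parameter defaults below are hypothetical, and this is not TargetScan's scoring or the UCSC track algorithm.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = seq.count("CG") * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def cpg_islands(seq, window=200, min_gc=0.5, min_obs_exp=0.6, step=1):
    """Merge all passing windows into (start, end) intervals, 0-based, end-exclusive."""
    islands = []
    for start in range(0, len(seq) - window + 1, step):
        gc, oe = cpg_stats(seq[start:start + window])
        if gc >= min_gc and oe >= min_obs_exp:
            end = start + window
            if islands and start <= islands[-1][1]:
                islands[-1] = (islands[-1][0], max(islands[-1][1], end))
            else:
                islands.append((start, end))
    return islands


def revcomp(seq):
    """Reverse complement of a DNA sequence (U is read as T)."""
    return seq.upper().replace("U", "T")[::-1].translate(str.maketrans("ACGT", "TGCA"))


def seed_sites(utr, mirna):
    """Return (position of the 6-nt seed match, site type) for each match in the UTR."""
    utr = utr.upper().replace("U", "T")
    core = revcomp(mirna[1:7])   # pairs with miRNA nucleotides 2-7
    m8 = revcomp(mirna[7])       # base complementary to miRNA nucleotide 8
    kinds = {(True, True): "8mer", (True, False): "7mer-m8",
             (False, True): "7mer-A1", (False, False): "6mer"}
    sites = []
    j = utr.find(core)
    while j != -1:
        has_m8 = j > 0 and utr[j - 1] == m8
        has_a1 = j + 6 < len(utr) and utr[j + 6] == "A"
        sites.append((j, kinds[(has_m8, has_a1)]))
        j = utr.find(core, j + 1)
    return sites
```

For example, with the let-7a-5p sequence (UGAGGUAGUAGGUUGUAUAGUU) the 8mer site in the mRNA is CTACCTCA. Reported positions refer to the start of the 6-nt seed match, so an 8mer or 7mer-m8 site begins one base earlier.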

Phylogeny Studies and Sequence Divergence
Alignments of vertebrate ITB-like and nematode (Caenorhabditis elegans) PAT3 protein sequences were assembled using BioEdit v.5.0.1 with default settings [137]. Ambiguously aligned regions were excluded prior to phylogenetic analysis, yielding alignments of 370 residues for comparisons of vertebrate ITB sequences with the nematode PAT3 (β-integrin homolog) sequence (Table 1). Evolutionary distances were calculated using the Kimura option [138] in TREECON [139]. Phylogenetic trees were constructed from evolutionary distances using the neighbor-joining method [140] and rooted with the nematode PAT3 sequence. Tree topology was reexamined by bootstrap resampling (100 replicates), and only highly significant bootstrap values (≥90) are shown [141].
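The distance and tree-building steps can be sketched as follows. This assumes that TREECON's Kimura option for protein alignments corresponds to Kimura's empirical correction d = -ln(1 - p - 0.2p^2), where p is the proportion of differing sites among gap-free aligned positions, and implements the standard neighbor-joining recursion. The function names are hypothetical; this is an illustration, not TREECON itself.

```python
import math
from itertools import combinations


def kimura_protein_distance(seq1, seq2):
    """Kimura-corrected distance between two aligned protein sequences (gaps skipped)."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no gap-free aligned positions")
    p = sum(a != b for a, b in pairs) / len(pairs)
    arg = 1.0 - p - 0.2 * p * p
    if arg <= 0:
        raise ValueError("sequences too divergent for the Kimura correction")
    return -math.log(arg)


def neighbor_joining(names, dist):
    """Neighbor joining on {(a, b): distance}.

    Returns (joins, final_edge): joins lists (i, j, branch_i, branch_j, new_node)
    in order; final_edge is (u, v, length) for the last two nodes.
    """
    d = {frozenset(pair): value for pair, value in dist.items()}
    nodes = list(names)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Choose the pair minimizing the Q criterion.
        best = None
        for i, j in combinations(nodes, 2):
            q = (n - 2) * d[frozenset((i, j))] - r[i] - r[j]
            if best is None or q < best[0]:
                best = (q, i, j)
        _, i, j = best
        dij = d[frozenset((i, j))]
        branch_i = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))
        branch_j = dij - branch_i
        u = f"node{len(joins) + 1}"
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = (d[frozenset((i, k))] + d[frozenset((j, k))] - dij) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
        joins.append((i, j, branch_i, branch_j, u))
    a, b = nodes
    return joins, (a, b, d[frozenset((a, b))])
```

Neighbor joining yields an unrooted tree; rooting with the nematode PAT3 sequence places the root on the PAT3 branch. Bootstrap support would come from repeating both steps on alignments whose columns are resampled with replacement, which is not shown here.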

Conclusions
Bioinformatic analyses of the integrin genes and proteins in vertebrates revealed a high degree of diversity in their chromosomal locations, alternative splicing, transcriptional and post-transcriptional regulation, and tissue-specific expression. The results suggest that, within vertebrates, these genes and protein structures evolved along divergent paths while retaining common functions that became successively specialized for cell adhesion, migration and transmigration. Our phylogenetic analysis revealed for the first time that ITB4 (encoding the β4 integrin) is the most likely ancestral form of the integrin β-like genes. This subunit has inherited the ancestral β-integrin role of forming simple adhesions (hemidesmosomes) in vertebrate cells, similar to those of unicellular organisms, and is also involved in the migration of transformed (cancer) cells [7]. The subunit sharing analysis of ITB subunits reveals that the β2 and β7 subunits, which are expressed only in cells of the hematopoietic and immune systems, are possibly the most specialized forms of integrins.