Genes 2014, 5(1), 33-50; doi:10.3390/genes5010033
Published: 23 January 2014
Abstract: The centromere is the chromosomal locus essential for chromosome inheritance and genome stability. Human centromeres are located at repetitive alpha satellite DNA arrays that compose approximately 5% of the genome. Contiguous alpha satellite DNA sequence is absent from the assembled reference genome, limiting current understanding of centromere organization and function. Here, we review the progress in centromere genomics spanning the discovery of the sequence to its molecular characterization and the work done during the Human Genome Project era to elucidate alpha satellite structure and sequence variation. We discuss exciting recent advances in alpha satellite sequence assembly that have provided important insight into the abundance and complex organization of this sequence on human chromosomes. In light of these new findings, we offer perspectives for future studies of human centromere assembly and function.
1. Introduction
The centromere is the chromosomal locus that controls chromosome segregation during cell division. Visually, the centromere appears on metaphase chromosomes as a primary constriction, at least in metazoans that have excellent cytology. This is also the site of assembly of the kinetochore, the multi-protein structure that coordinates the attachment of chromosomes to microtubules and their movement along them. The proteins associated with centromeres are conserved among species, consistent with the functional significance of the locus. A surprising feature of centromeres is that the DNA sequences present at the locus are dissimilar, not only among organisms but often within the same organism. This contrast between divergent DNA and shared protein components suggests an epigenetic basis for centromere assembly. Such centromere proteins (CENPs) include CENP-A, CENP-C, and CENP-T, which are important for structural and functional aspects of the centromere and kinetochore. CENP-A is of particular significance since it is a histone H3 variant that contributes to specialized chromatin at centromeres. The Holliday Junction Recognition Protein HJURP and its fungal homolog Scm3 are chaperones that direct the loading of CENP-A into chromatin primed by the Mis18 complex and ensure propagation of epigenetically marked centromeric nucleosomes (reviewed in [1,2]).
Despite the lack of sequence identity, many centromeres are located in regions of repetitive DNA or satellites. In humans, repetitive alpha satellite DNA defines the centromere. The sequence basis of centromere identity is widely debated, since variant centromeres have been identified in humans and other organisms. These unusual centromeres include neocentromeres, new centromeres that are formed on unique or non-centromeric DNA sequences [3,4]. Dicentric human chromosomes, those chromosomes that are formed by fusion or translocation, have two regions of centromeric DNA, but often only one is the site of kinetochore formation. In these instances, the alpha satellite DNA appears to be neither necessary nor sufficient for centromere function. Nevertheless, other evidence exists that supports the importance of DNA sequence in centromere formation in humans, particularly de novo centromere assembly. In this review, we will discuss advances in our understanding of human centromeric DNA, from the discovery of human centromeric sequences through integration of physical and genetic maps of centromeres during the Human Genome Project era to the first centromeric genome assemblies that are only now emerging.
2. Alpha Satellite DNA: Discovery, Organization, and Variation
Human centromeres, and in fact most primate centromeres, are composed of alpha satellite DNA. This sequence is thought to be important for centromere function since it is present at the primary constriction of all human chromosomes. It comprises up to 5% of the genome. Alpha satellite is based on a 171 bp monomer arranged in a tandem, head-to-tail fashion. Individual monomers share 50%–70% sequence identity. An integral number of monomers gives rise to a higher order repeat (HOR) unit that is itself repeated in a largely uninterrupted fashion, so that within a given centromeric locus the alpha satellite array can span from 250 to 5,000 kb. Such re-iteration of the HOR gives rise to a homogenized alpha satellite array in which the HORs differ in sequence by only a few percent (Figure 1), even though the constituent monomers show much less sequence similarity [6,7]. Some monomers within the HORs contain a 17 bp sequence called the CENP-B box, a motif that is recognized by the DNA-binding centromere protein CENP-B. Outside of the higher order arrays, monomers are randomly arranged and span the region between the homogeneous array and the chromosome arm.
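The arithmetic behind these figures can be made concrete with a short sketch. It assumes an idealized array built only from complete, uninterrupted HOR copies of identical length; the 16-monomer HOR used in the example is an illustrative value, and the function names are hypothetical.

```python
MONOMER_BP = 171  # alpha satellite monomer length given in the text


def hor_length_bp(monomers_per_hor: int) -> int:
    """Length of one higher order repeat (HOR) unit in base pairs."""
    return monomers_per_hor * MONOMER_BP


def hor_copies(array_kb: float, monomers_per_hor: int) -> float:
    """Approximate number of HOR copies in an array of the given size.

    Assumes an idealized array of uninterrupted, full-length HOR units.
    """
    return array_kb * 1000 / hor_length_bp(monomers_per_hor)


if __name__ == "__main__":
    n = 16  # illustrative HOR size (an assumption, not a general value)
    print(f"{n}-monomer HOR: {hor_length_bp(n)} bp")
    for array_kb in (250, 5000):  # array size range quoted in the text
        print(f"{array_kb} kb array: ~{hor_copies(array_kb, n):.0f} HOR copies")
```

Under these assumptions a 16-monomer HOR is 2,736 bp long, so arrays of 250 to 5,000 kb would hold roughly 90 to 1,800 copies of it.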
Variation within the alpha satellite is common and complex. Each chromosome type is defined by an alpha satellite array in which the multimers of the HOR contain a particular number of tandem monomers [7,10,11]. The homogeneity of HORs of the same monomer number makes the alpha satellite array chromosome-specific and distinguishable from related sequences at other centromeres. Certain chromosomes share greater homology of HORs based on monomer subtypes and organization, allowing them to be classified into one of three suprachromosomal families. Diverged monomeric alpha satellite falls into two additional suprachromosomal families. On a given chromosome type, the number of times the HOR is re-iterated is heterogeneous, spanning hundreds to thousands of copies. Consequently, total array size on a given chromosome varies between homologs and among individuals (Figure 2) [14,15,16,17,18]. Although array sizes can be as small as a few hundred kilobases or as large as five megabases [16,19,20], the range appears to be less extensive on a particular chromosome type. Array size polymorphisms are largely stable through meiosis, since they can be efficiently tracked through multigenerational families. These polymorphisms make alpha satellite a useful centromeric marker for tracking inheritance of individual chromosomes.
Additional alpha satellite variation exists at the level of the HOR. On a given chromosome, the primary HOR unit can exhibit size polymorphisms that are most likely the result of deletions caused by unequal exchange [22,23]. Human chromosome 17 is a good example of HOR polymorphism within the D17Z1 alpha satellite array. The predominant HOR on D17Z1 is a 16-monomer (16-mer) [18,22]. However, less prevalent 15-mer and 14-mer HORs are present on many D17Z1 arrays, as well as 13-mers and 12-mers [22,24]. Within this group, the 13-mer is the most abundant after the 16-mer. These size polymorphisms create centromeric haplotypes, with the 16-/15-/14-mer comprising a haplotype found on 65% of chromosome 17s and the additional 13-mer present on 35% of chromosome 17s [22,25]. A recent study that evaluated centromere assembly on multiple chromosome 17s suggested that HOR variants might have different functional capacities. This possibility, however, remains to be formally tested in an independent functional assay.
3. Functional Studies that Have Defined Genomic Centromeres
The strongest evidence for implicating alpha satellite DNA in human centromere function came from two lines of chromosome engineering experiments that took “bottom up” and “top down” approaches (Figure 3). In the “top down” strategy, telomere-mediated chromosomal truncation was used to modify the X chromosome or Y chromosome, some of which had been transferred to either rodent somatic cell hybrids or DT40 chicken cells (Figure 3B) [27,28]. Because DT40 cells are proficient in homologous recombination, targeted seeding of the telomere truncation constructs accelerated the deletion process. Multiple rounds of telomere truncation generated a series of deleted chromosomes, each containing less X or Y chromosome material (Figure 3B). The stability of the minichromosomes was monitored and those that maintained the least amount of the original chromosome but were still mitotically stable were concluded to contain the minimal sequence(s) necessary for centromere function. In both truncated X and Y chromosomes, minichromosomes containing alpha satellite DNA arrays DXZ1 and DYZ3, respectively, equated with the most stable linear minichromosomes. These studies strongly implicated alpha satellite DNA as the sequence that corresponds to centromere function and chromosome stability.
However, it could be argued that in the top-down studies the centromere, once established on any sequence, stays at that sequence and does not shift with truncation. At the same time, pioneering experiments were being developed by two groups to take a “bottom up” approach to define the sequences required for centromere function. In these studies, alpha satellite sequences were introduced into linear yeast artificial chromosome (YAC) or circular bacterial artificial chromosome (BAC) vectors (Figure 3A). Hunt Willard’s group created first generation artificial chromosomes from synthetic alpha satellite arrays. One higher order repeat from D17Z1 (chromosome 17) or DYZ3 (chromosome Y) was amplified through successive rounds of directional cloning to yield a 1 Mb array that was inserted into a BAC vector. Introduction of these artificial chromosome assembly constructs by liposome-mediated transfection into the HT1080 cell line yielded clones that contained a microchromosome or human artificial chromosome (HAC). The HACs recruited centromere proteins and were stable in mitosis for at least 6 months. Careful analysis of the HACs showed that the D17Z1 HACs were completely derived from the input construct. However, the DYZ3-derived HAC had acquired additional alpha satellite sequences from host chromosomes. The functional significance of the inability of DYZ3 to form a functional HAC containing only Y centromere sequences was not fully appreciated at the time. Subsequent studies shed light on the correlation between DYZ3 sequence and its competence for de novo centromere assembly (see below).
At the same time that the Willard group was creating HACs from synthetic alpha satellite DNA, Howard Cooke’s and Hiroshi Masumoto’s groups were collaborating to clone large alpha satellite arrays from human chromosome 21 into linear YAC vectors. In their studies, the higher order array α21-I and the unordered monomeric array α21-II were introduced into HT1080 cells and compared for de novo centromere competency. Only YACs containing the α21-I HOR array were capable of forming mitotically stable HACs that properly assembled centromere proteins. These innovative studies complemented those of the Willard group, and contributed important structure-function information that implicated HOR alpha satellite as a preferred substrate for de novo centromere assembly. In the time that has elapsed since these groundbreaking experiments, additional studies have established HACs as models for testing the genomic (and epigenetic) requirements for de novo centromere assembly and function. Circular BAC and PAC vectors, rather than linear YACs, are the most useful assembly vectors and are associated with high rates of HAC formation [31,32]. Not all alpha satellite arrays translate to HAC formation. Y chromosome alpha satellite DNA (DYZ3) lacks CENP-B boxes and is unable to efficiently form de novo centromeres on HACs [29,32]. Furthermore, arrays containing mutated CENP-B boxes cannot form de novo HACs. Thus, CENP-B binding sites appear to be required for de novo centromere assembly on HACs. This has been a perplexing finding, given that the Y chromosome clearly assembles a functional centromere and recruits essential centromere proteins. These findings hint at key differences between de novo and established centromere function that are not well understood.
Initial studies that tested the ability of alpha satellite to nucleate functional centromeres introduced cosmids containing human alpha satellite DNA from chromosome 17 into African green monkey (AGM) cells. These experiments did not result in supernumerary chromosomes or HACs, but instead in integration of the alpha satellite construct into AGM chromosomes (Figure 3A). Indeed, up to 60% of HAC constructs introduced into human cells integrate into the genome rather than forming an independent chromosome. While some might point out that this argues against the case for sequence-dependent centromere assembly, another interpretation is that de novo chromosome assembly and de novo centromere formation are two different processes. Indeed, some integrated alpha satellite arrays recruit centromere proteins [34,35], although they may not retain some or all of the proteins long-term. At the very least, both integrated and free-lying HAC studies suggest that alpha satellite provides sequence information for some aspects of centromere function.
Contemporary studies are now using centromere-based chromosome engineering to create a new generation of HACs that contain alpha satellite in addition to tetracycline operator (tetO) sequences. The tetO sequences are bound with high affinity by the tet repressor (tetR), which can be fused to different proteins in order to manipulate the chromatin or protein composition of the HAC [37,38]. In this way, centromere assembly on the alpha satellite can be enhanced or inhibited, the long-term stability of the HAC can be monitored by tethering tetR fluorescent protein fusions, and expression of genes included on the HAC can be tested.
4. Centromere Regions in the Human Genome Project Era
As the understanding of the relationship between alpha satellite DNA and centromere function emerged at the end of the 20th century, it led to a call for the identification and mapping of functional centromere sequences. However, the nature of alpha satellite, with its megabase-scale regions of higher-order repetitive structure, made it highly refractory to sequencing and assembly. As the Human Genome Project (HGP) rapidly increased the sequence information available for testing human genome function, gains were largely not seen at the pericentromeres and centromeres of most human chromosomes. A 1998 plan for the project that outlined the HGP’s goal for a 2001 working draft and a 2003 final draft acknowledged that “the small proportion of highly repeated sequence represented by the centromeres and other constitutive heterochromatic regions of the genome” might not be included in the final reference assembly. A contemporary perspective on the plan warned of the possibility that potentially important duplications and tandem repeats would be “swept under the carpet”. There was a repeated call for at least some centromeric regions to be characterized in order to confirm that their structure was as homogeneous as originally claimed. But again, due to the computational complexity required to accurately assemble such highly repetitive regions, few labs attempted to close these sequence gaps [44,45,46]. A decade later, multi-megabase-sized gaps remain at the centromeres of most chromosome assemblies. This problem is not exclusive to the human genome, since centromere and pericentromere sequence gaps in other organisms such as mouse and Drosophila remain unclosed [47,48,49,50]. Only in the past year have advances in sequencing technologies and innovative computational efforts focused on elucidating alpha satellite structure helped to make a full understanding of the genome and some of its most critical elements a real possibility [51,52].
5. Linking Physical and Genetic Maps of Human Centromeres
By the late 1990s and early 2000s, several groups had pushed forward the centromere field by producing integrated physical and genetic maps of centromere regions including chromosomes X, 5, and 12 [53,54,55]. These studies used pulsed-field gel electrophoresis to estimate physical alpha satellite array sizes and either radiation hybrid or linkage analyses to estimate genetic distance across the centromere. In addition to confirming the repression of recombination across centromeres, the integrated maps that resulted allowed for the anchoring of alpha satellite regions to existing genomic maps, and sometimes identified unique pericentric sequences that had not been represented in the human genome drafts.
Of the sequence assemblies around the centromere that do exist, the pericentric regions are the best characterized. Within these regions, a high proportion of segmental duplications have accumulated [44,56]. Many pericentric duplications corresponding to unmapped regions of the genome were identified using monochromosomal somatic cell hybrids and PCR or FISH with known pericentric sequences and genomic BACs that recognized paralogous sequences across the genome. Genome-wide analysis of the January 2001 draft assembly further revealed pericentromeric and subtelomeric enrichment for duplicated sequences, and showed that such sequences were frequently present in unmapped or misassembled segments. The discordance between FISH and BLAST results in these analyses was much higher than the genome-wide rate reported in the same year. Together, these studies demonstrated the importance of elucidating highly duplicated pericentric regions in order to accurately understand the Human Genome Project’s results. More recent progress was made in assembling “inaccessible” regions by using linkage disequilibrium analysis of genetically distinct (admixed) genomes to map almost 20 Mb of sequence near centromeres. As the number of admixed genomes available for analysis increases, this powerful technique is expected to reduce the gaps in the current reference assembly.
6. Correlating the Genetic and Functional Centromere
In 2001, a major breakthrough in reaching beyond the boundaries of alpha satellite occurred when chromosome X short arm (Xp) genomic clones were mapped into the homogeneous higher order DXZ1 array. This tour-de-force used combined in silico and high-stringency BAC clone screening to demonstrate that even in higher order alpha satellite, enough sequence variation existed to assemble a contig extending almost half a megabase from the satellite boundary towards the centromere core that is the location of the functional centromere (Figure 4). This study revealed that heterogeneity of alpha satellite DNA increased with distance from the DXZ1 core, and it permitted the definition of transitions between the higher order alpha satellite and flanking regions. Monomers of alpha satellite DNA that are not ordered into multi-monomer repeat units are located directly outside of the homogeneous HOR domain. These monomers exhibit enough sequence variation that they can be more easily assembled and in fact represent most of the alpha satellite that exists in the human reference assembly [60,61]. The monomeric alpha satellite regions show greater sequence dissimilarity and more interspersed elements, such as L1 sequences, as they approach the chromosome arms. Currently, HSAX and HSA8 are the only human chromosomes represented in the genome assembly with contiguous sequence from higher-order alpha satellite to both arms [62,63].
Subsequent to these findings, several groups began analyzing alpha satellite at increasing sequence depth, discovering new alpha satellite polymorphisms and repeat organization. Building on the work of previous decades, targeted sequencing of several well-characterized arrays was performed. The high copy number of alpha satellite HORs on each chromosome permitted analysis of intra-homolog SNPs in addition to inter-individual variation, which was paired with restriction digestion for haplotype analysis. These studies revisited the molecular basis for variation within alpha satellite by pinpointing where unequal exchange occurred to produce array homogenization.
7. The Computational Challenge of Alpha Satellite Genome Assemblies
The bottleneck in generating alpha satellite assemblies has undoubtedly been the sophistication of the assembly tools that are required to order distinguishable monomeric sequences within highly homogeneous arrays. Several groups have developed in silico tools for analyzing higher order alpha satellite sequence available in genome assemblies [65,66]. These computational approaches are most effective when combined with experimental approaches that map clones by FISH to verify their location in or near the higher order array. Indeed, such dry/wet approaches were used to map the region spanning the Xp centromere-arm junction and to characterize the centromere of human chromosome 17 [60,61,67]. In the latter instance, a novel higher-order array (D17Z1-B) was discovered on chromosome 17, emphasizing the power of this integrative approach. Another novel HOR array, localized by BLAST to HSA22 and verified by FISH to hybridize to HSA14 and 22, was found by “rescuing” unassembled alpha satellite sequence information from whole genome sequencing (WGS) repositories. These studies revealed that, while challenging to assemble, repetitive satellite regions, particularly in the centromere, hold a wealth of complex genomic structure and potentially functional information.
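To give a flavor of what such in silico analysis involves, the following minimal sketch infers the HOR period of an array from monomer-to-monomer identity: monomers one full HOR apart should be nearly identical, whereas neighbors within an HOR are far more diverged. This is a toy illustration and not the method of the cited tools; the mutation model, the divergence rates, the tolerance, and all function names are assumptions, and real monomers would first need to be aligned rather than compared position by position.

```python
import random


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mean_identity_at_offset(monomers, k):
    """Mean identity between each monomer and the one k positions downstream."""
    scores = [identity(a, b) for a, b in zip(monomers, monomers[k:])]
    return sum(scores) / len(scores)


def infer_hor_period(monomers, max_period, tolerance=0.02):
    """Smallest offset whose mean identity is within `tolerance` of the best.

    Taking the smallest such offset avoids reporting a multiple of the period.
    """
    scores = {k: mean_identity_at_offset(monomers, k)
              for k in range(1, max_period + 1)}
    best = max(scores.values())
    return min(k for k, s in scores.items() if s >= best - tolerance)


def mutate(seq, rate, rng):
    """Substitute each base with probability `rate` (toy mutation model)."""
    return "".join(
        rng.choice([b for b in "ACGT" if b != c]) if rng.random() < rate else c
        for c in seq
    )


def make_toy_array(period=4, copies=10, seed=0):
    """Synthetic array: `period` diverged monomers, repeated with ~1% noise."""
    rng = random.Random(seed)
    ancestor = "".join(rng.choice("ACGT") for _ in range(171))
    hor = [mutate(ancestor, 0.2, rng) for _ in range(period)]
    return [mutate(m, 0.01, rng) for _ in range(copies) for m in hor]


if __name__ == "__main__":
    array = make_toy_array(period=4)
    print("inferred HOR period:", infer_hor_period(array, max_period=10))
```

The same homogeneity that makes the period easy to detect is what makes ordering individual HOR copies within an array so difficult: copies differing by only a few substitutions offer very little signal for placing them relative to one another.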
8. Assembling Centromeres in the Present Day
Previous studies utilized traditional sequencing technologies whose reads can each contain several 171 bp monomers. Next generation short-read sequencing technology has enabled the recent increase in whole-genome sequencing and in the amount of human sequence information available overall. Nevertheless, short reads present a particular challenge for assembling alpha satellite sequence. It appears that the obstacle of aligning short-read alpha satellite sequences can be overcome, making it possible to use functional information gleaned from chromatin immunoprecipitation with centromeric protein antibodies followed by Illumina sequencing of the captured DNA. This ChIP-sequencing (ChIP-seq) approach utilized the reference assembly as well as the HuRef genome, first by aligning the HuRef alpha satellite reads to the reference assembly. After this alignment, the reference alpha satellite was broken into sliding windows, and the alignment was checked back onto the HuRef reads to determine the “mappability” of each window. This mappability information was then used for alignment of the short Illumina reads generated by ChIP-seq. It should be noted, however, that this study did not have the means to extend beyond the edges of the reference assembly into the homogeneous centromere cores (see Future Perspectives). Another major discovery from the assembly annotation of this work was that many more chromosomes than previously thought contain two or more higher order alpha satellite arrays [51,61,69,70,71]. This finding has raised the complexity of centromeres to a new level and introduced the possibility that the location of centromere assembly may be quite variable in humans. This is indeed the case for human chromosome 17, on which the centromere can be assembled at either of the two higher order repeat arrays. This new information suggests that in addition to alpha satellite haplotypes, there may be a number of functional centromeric genotypes. How a functional genotype might affect long-term chromosome stability is an open question.
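The idea of scoring sliding windows for mappability can be illustrated with a deliberately simplified sketch. Here the mappability of a window is taken to be the reciprocal of the number of exact occurrences of that window in the reference itself, so that a unique window scores 1.0 and a window from a tandem repeat scores less. This exact-match definition, the function name, and the toy sequence are assumptions made for illustration; the study described above derived mappability from read alignments rather than from exact string counts.

```python
from collections import Counter


def window_mappability(reference: str, window: int, step: int = 1):
    """Score sliding windows by how uniquely they occur in the reference.

    Returns (start, score) pairs, where score = 1 / (number of exact
    occurrences of the window sequence anywhere in the reference).
    """
    last_start = len(reference) - window
    # Count every window of this length, regardless of the reporting step.
    counts = Counter(reference[i:i + window] for i in range(last_start + 1))
    return [
        (start, 1 / counts[reference[start:start + window]])
        for start in range(0, last_start + 1, step)
    ]


if __name__ == "__main__":
    # Toy reference: three tandem copies of "ACGT" followed by unique sequence.
    toy_reference = "ACGTACGTACGT" + "GGCATT"
    for start, score in window_mappability(toy_reference, window=4):
        print(start, toy_reference[start:start + 4], round(score, 2))
```

Windows inside the tandem repeat receive low scores, while windows that overlap the unique flank score 1.0. Short reads aligning to low-scoring windows cannot be placed confidently, which is why such information is needed before interpreting ChIP-seq signal over alpha satellite.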
9. Future Perspectives
It is now 2014, so what can we expect from the centromere field in the next decade? Building on the foundation laid during the Human Genome Project era, we consider the following to be among the most exciting areas of centromere research.
9.1. Centromere Assemblies
Clearly, the most significant frontier that remains to be explored in centromere biology is complete genomic centromere assemblies. With the advances of the past two years alone, in which long-template sequencing and advanced computational approaches have sampled, annotated, and assembled centromere sequences in multiple genomes, centromere reference sequences are a real possibility. Just recently, ordering of monomer sequences from whole-genome shotgun reads has produced the first linear characterization of centromeric assemblies for alpha satellite arrays from chromosomes X and Y. The increasing read lengths available on multiple platforms offer the potential to contain several multi-kilobase HORs in one read. In fact, long PacBio reads have already accelerated the discovery and mapping of centromeric tandem repeats in a variety of species. These third generation sequencing techniques should enable longer alpha satellite sequence assemblies and better understanding of centromere structure and neighboring variant HORs. Completion of even a few centromere assemblies will undoubtedly be important, but given the amount of variation in alpha satellite organization and size, the ultimate goal would be to produce centromere assemblies for each individual. These personalized maps would be useful for defining the spectrum of sequences that correlate with functional competency. In addition, they will allow identification of other features, such as genes or non-coding elements, that are present within current centromere/pericentromere gaps. These sequences may require centromeric locations for proper function, similar to heterochromatic genes in Drosophila [48,73]. Indeed, a human muscle disorder has been mapped to the gene KCNJ18 that is located in an assembly gap on 17p11.2. It is possible that other genes or elements within centromere regions may be associated with diseases for which the molecular basis remains undefined.
9.2. Centromeric Variation and Functional Capacity
The ability to confidently assemble centromeric contigs should permit identification of the full range of variability in alpha satellite, including sequence and size variants. Such variation will shed light on the molecular mechanisms that regulate alpha satellite homogenization, as well as on the effects of fundamental processes such as DNA replication and DNA repair on alpha satellite stability. Ultimately, characterization of alpha satellite variation would reveal the range of sequences that are capable of supporting centromere function. HAC studies have taught us that not all alpha satellite sequences have the capacity to support de novo centromere assembly [29,32]. The reasons for this have been largely unexplored, and mostly attributed to the presence or absence of CENP-B boxes in alpha satellite [33,75]. One would expect that, just as a complex human disease is often associated with various SNPs, many types of sequence variation would be associated with diminished centromere function. Complete, personalized centromeric assemblies linked to functional centromere status would expedite experiments to compare efficiencies of various sequence variants in de novo centromere assembly and/or centromere maintenance.
9.3. Maps of Functional Centromeric Domains
The consensus in the centromere field is that centromere identity is specified by epigenetic mechanisms. However, without detailed genomic information, this theory remains open to challenge. Centromere proteins, such as the histone H3-like protein CENP-A, are assembled onto alpha satellite DNA to create a specialized type of nucleosome within unique chromatin that distinguishes the centromere from the rest of the genome [76,77]. CENP-A and other proteins create a complicated network of protein sub-complexes that link the chromatin to the structural kinetochore that interacts with spindle microtubules. However, chromatin that contains CENP-A nucleosomes is only assembled on a portion of alpha satellite DNA [79,80]. How and why CENP-A is recruited to only a subset of alpha satellite HORs and/or monomers is unclear. Recent studies have revealed that CENP-A nucleosomes on the human X chromosome are positioned at monomers that do not contain CENP-B boxes. One could speculate that the distribution of CENP-B boxes within an alpha satellite array, and sequence variation that interrupts the CENP-B box motif or makes the motif non-functional (not bound by CENP-B), might impact CENP-A chromatin assembly and centromere function. Complete centromeric assemblies of many human chromosomes will be important for addressing this possibility experimentally.
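Once assembled sequence is available, mapping the distribution of CENP-B boxes along an array is computationally simple. The sketch below scans both strands of a sequence for the 17 bp motif. The pattern used, NTTCGNNNNANNCGGGN, reflects the essential positions of the CENP-B box as commonly reported in the wider literature; it is not given in this review and should be treated as an assumption, as should the function names and the toy sequences.

```python
import re

# Essential positions of the 17 bp CENP-B box as commonly reported elsewhere
# (NTTCGNNNNANNCGGGN). This pattern is an assumption, not taken from the text.
# The lookahead lets overlapping candidate sites be reported as well.
CENPB_BOX = re.compile(r"(?=([ACGT]TTCG[ACGT]{4}A[ACGT]{2}CGGG[ACGT]))")
BOX_LEN = 17
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_cenpb_boxes(seq: str):
    """Return sorted (start, strand) pairs for candidate CENP-B boxes.

    `start` is the 0-based position of the leftmost base of the site on the
    forward strand, whichever strand the motif was found on.
    """
    seq = seq.upper()
    hits = [(m.start(), "+") for m in CENPB_BOX.finditer(seq)]
    rc = reverse_complement(seq)
    hits += [(len(seq) - m.start() - BOX_LEN, "-")
             for m in CENPB_BOX.finditer(rc)]
    return sorted(hits)


if __name__ == "__main__":
    intact = "AAAAA" + "CTTCGTTGGAAACGGGA" + "AAAAA"
    mutated = "AAAAA" + "CTTCGTTGGAAACAGGA" + "AAAAA"  # CGGG -> CAGG
    print("intact :", find_cenpb_boxes(intact))
    print("mutated:", find_cenpb_boxes(mutated))
```

A single substitution at an essential position removes the match, mirroring the kind of motif-interrupting variation discussed above. Whether a matching site is actually bound by CENP-B in vivo is a separate question that sequence scanning cannot answer.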
Since the discovery of alpha satellite DNA in the late 1970s, the field has moved from identification of centromeric sequences at every human centromere to a basic molecular understanding of the organization and structure of alpha satellite monomers into homogeneous higher order repetitive arrays (Figure 5). The Human Genome Project was essential in providing a rough and limited reference assembly for centromeres of three chromosomes (X, 8, 17). These fundamental studies of alpha satellite DNA paved the way for pioneering functional assays in which the sequence was tested in de novo centromere assembly in human artificial chromosome assays. HACs have been the gold standard for testing centromere assembly, but are now being used to explore chromosome stability and gene expression. The next challenge will be to complete genomic assemblies for all human centromeres in multiple individuals and populations and to develop the next generation of functional assays to test the role of alpha satellite variation in centromere function, chromosome stability, and disease association.
We apologize to our colleagues whose work on alpha satellite DNA and human centromeres could not be cited due to space constraints. Research in the Sullivan lab is supported in part by R01 GM098500 (NIH) and Gene Discovery and Translational Research Grant #1-FY13-517 (March of Dimes Foundation).
Wrote the paper: MEAM, BAS.
Conflicts of interest
The authors declare no conflict of interest.
References
1. Panchenko, T.; Black, B.E. The epigenetic basis for centromere identity. Prog. Mol. Subcell. Biol. 2009, 48, 1–32, doi:10.1007/978-3-642-00182-6_1.
2. Valente, L.P.; Silva, M.C.; Jansen, L.E. Temporal control of epigenetic centromere specification. Chromosome Res. 2012, 20, 481–492, doi:10.1007/s10577-012-9291-2.
3. Choo, K.H. Domain organization at the centromere and neocentromere. Dev. Cell 2001, 1, 165–177, doi:10.1016/S1534-5807(01)00028-4.
4. Warburton, P.E. Chromosomal dynamics of human neocentromere formation. Chromosome Res. 2004, 12, 617–626, doi:10.1023/B:CHRO.0000036585.44138.4b.
5. Manuelidis, L.; Wu, J.C. Homology between human and simian repeated DNA. Nature 1978, 276, 92–94, doi:10.1038/276092a0.
6. Waye, J.S.; Willard, H.F. Nucleotide sequence heterogeneity of alpha satellite repetitive DNA: A survey of alphoid sequences from different human chromosomes. Nucleic Acids Res. 1987, 15, 7549–7569, doi:10.1093/nar/15.18.7549.
7. Willard, H.F. Chromosome-specific organization of human alpha satellite DNA. Am. J. Hum. Genet. 1985, 37, 524–532.
8. Muro, Y.; Masumoto, H.; Yoda, K.; Nozaki, N.; Ohashi, M.; Okazaki, T. Centromere protein B assembles human centromeric alpha-satellite DNA at the 17-bp sequence, CENP-B box. J. Cell Biol. 1992, 116, 585–596, doi:10.1083/jcb.116.3.585.
9. Schueler, M.G.; Sullivan, B.A. Structural and functional dynamics of human centromeric chromatin. Annu. Rev. Genomics Hum. Genet. 2006, 7, 301–313, doi:10.1146/annurev.genom.7.080505.115613.
10. Choo, K.H.; Vissel, B.; Nagy, A.; Earle, E.; Kalitsis, P. A survey of the genomic distribution of alpha satellite DNA on all the human chromosomes, and derivation of a new consensus sequence. Nucleic Acids Res. 1991, 19, 1179–1182, doi:10.1093/nar/19.6.1179.
11. Vissel, B.; Choo, K.H. Human alpha satellite DNA - Consensus sequence and conserved regions. Nucleic Acids Res. 1987, 15, 6751–6752, doi:10.1093/nar/15.16.6751.
12. Alexandrov, I.A.; Mitkevich, S.P.; Yurov, Y.B. The phylogeny of human chromosome specific alpha satellites. Chromosoma 1988, 96, 443–453, doi:10.1007/BF00303039.
13. Alexandrov, I.; Kazakov, A.; Tumeneva, I.; Shepelev, V.; Yurov, Y. Alpha-satellite DNA of primates: Old and new families. Chromosoma 2001, 110, 253–266, doi:10.1007/s004120100146.
14. Devilee, P.; Kievits, T.; Waye, J.S.; Pearson, P.L.; Willard, H.F. Chromosome-specific alpha satellite DNA: Isolation and mapping of a polymorphic alphoid repeat from human chromosome 10. Genomics 1988, 3, 1–7, doi:10.1016/0888-7543(88)90151-6.
15. Mahtani, M.M.; Willard, H.F. Pulsed-field gel analysis of alpha-satellite DNA at the human X chromosome centromere: High-frequency polymorphisms and array size estimate. Genomics 1990, 7, 607–613, doi:10.1016/0888-7543(90)90206-A.
16. Greig, G.M.; Parikh, S.; George, J.; Powers, V.E.; Willard, H.F. Molecular cytogenetics of alpha satellite DNA from chromosome 12: Fluorescence in situ hybridization and description of DNA and array length polymorphisms. Cytogenet. Cell Genet. 1991, 56, 144–148, doi:10.1159/000133071.
17. Wevrick, R.; Willard, H.F. Long-range organization of tandem arrays of alpha satellite DNA at the centromeres of human chromosomes: High-frequency array-length polymorphism and meiotic stability. Proc. Natl. Acad. Sci. USA 1989, 86, 9394–9398, doi:10.1073/pnas.86.23.9394.
18. Willard, H.F.; Waye, J.S.; Skolnick, M.H.; Schwartz, C.E.; Powers, V.E.; England, S.B. Detection of restriction fragment length polymorphisms at the centromeres of human chromosomes by using chromosome-specific alpha satellite DNA probes: Implications for development of centromere-based genetic linkage maps. Proc. Natl. Acad. Sci. USA 1986, 83, 5611–5615, doi:10.1073/pnas.83.15.5611.
19. Abruzzo, M.A.; Griffin, D.K.; Millie, E.A.; Sheean, L.A.; Hassold, T.J. The effect of Y-chromosome alpha-satellite array length on the rate of sex chromosome disomy in human sperm. Hum. Genet. 1996, 97, 819–823, doi:10.1007/BF02346196.
20. Oakey, R.; Tyler-Smith, C. Y chromosome DNA haplotyping suggests that most European and Asian men are descended from one of two males. Genomics 1990, 7, 325–330, doi:10.1016/0888-7543(90)90165-Q.
21. Willard, H.F. Evolution of alpha satellite. Curr. Opin. Genet. Dev. 1991, 1, 509–514, doi:10.1016/S0959-437X(05)80200-X.
22. Waye, J.S.; Willard, H.F. Molecular analysis of a deletion polymorphism in alpha satellite of human chromosome 17: Evidence for homologous unequal crossing-over and subsequent fixation. Nucleic Acids Res. 1986, 14, 6915–6927, doi:10.1093/nar/14.17.6915.
- Waye, J.S.; Willard, H.F. Structure, organization, and sequence of alpha satellite DNA from human chromosome 17: Evidence for evolution by unequal crossing-over and an ancestral pentamer repeat shared with the human X chromosome. Mol. Cell. Biol. 1986, 6, 3156–3165.
- Willard, H.F.; Greig, G.M.; Powers, V.E.; Waye, J.S. Molecular organization and haplotype analysis of centromeric DNA from human chromosome 17: Implications for linkage in neurofibromatosis. Genomics 1987, 1, 368–373, doi:10.1016/0888-7543(87)90041-3.
- Warburton, P.E.; Willard, H.F. Interhomologue sequence variation of alpha satellite DNA from human chromosome 17: Evidence for concerted evolution along haplotypic lineages. J. Mol. Evol. 1995, 41, 1006–1015.
- Maloney, K.A.; Sullivan, L.L.; Matheny, J.E.; Strome, E.D.; Merrett, S.L.; Ferris, A.; Sullivan, B.A. Functional epialleles at an endogenous human centromere. Proc. Natl. Acad. Sci. USA 2012, 109, 13704–13709, doi:10.1073/pnas.1203126109.
- Brown, K.E.; Barnett, M.A.; Burgtorf, C.; Shaw, P.; Buckle, V.J.; Brown, W.R. Dissecting the centromere of the human Y chromosome with cloned telomeric DNA. Hum. Mol. Genet. 1994, 3, 1227–1237, doi:10.1093/hmg/3.8.1227.
- Farr, C.J.; Bayne, R.A.; Kipling, D.; Mills, W.; Critcher, R.; Cooke, H.J. Generation of a human X-derived minichromosome using telomere-associated chromosome fragmentation. EMBO J. 1995, 14, 5444–5454.
- Harrington, J.J.; van Bokkelen, G.; Mays, R.W.; Gustashaw, K.; Willard, H.F. Formation of de novo centromeres and construction of first-generation human artificial microchromosomes. Nat. Genet. 1997, 15, 345–355, doi:10.1038/ng0497-345.
- Ikeno, M.; Grimes, B.R.; Okazaki, T.; Nakano, M.; Saitoh, K.; Hoshino, H.; McGill, N.I.; Cooke, H.; Masumoto, H. Construction of YAC-based mammalian artificial chromosomes. Nat. Biotechnol. 1998, 16, 431–439, doi:10.1038/nbt0598-431.
- Grimes, B.R.; Babcock, J.; Rudd, M.K.; Chadwick, B.; Willard, H.F. Assembly and characterization of heterochromatin and euchromatin on human artificial chromosomes. Genome Biol. 2004, 5, R89, doi:10.1186/gb-2004-5-11-r89.
- Grimes, B.R.; Rhoades, A.A.; Willard, H.F. Alpha-satellite DNA and vector composition influence rates of human artificial chromosome formation. Mol. Ther. 2002, 5, 798–805, doi:10.1006/mthe.2002.0612.
- Ohzeki, J.; Nakano, M.; Okada, T.; Masumoto, H. CENP-B box is required for de novo centromere chromatin assembly on human alphoid DNA. J. Cell Biol. 2002, 159, 765–775, doi:10.1083/jcb.200207112.
- Haaf, T.; Warburton, P.E.; Willard, H.F. Integration of human alpha-satellite DNA into simian chromosomes: Centromere protein binding and disruption of normal chromosome segregation. Cell 1992, 70, 681–696, doi:10.1016/0092-8674(92)90436-G.
- Nakashima, H.; Nakano, M.; Ohnishi, R.; Hiraoka, Y.; Kaneda, Y.; Sugino, A.; Masumoto, H. Assembly of additional heterochromatin distinct from centromere-kinetochore chromatin is required for de novo formation of human artificial chromosome. J. Cell Sci. 2005, 118, 5885–5898, doi:10.1242/jcs.02702.
- Nakano, M.; Cardinale, S.; Noskov, V.N.; Gassmann, R.; Vagnarelli, P.; Kandels-Lewis, S.; Larionov, V.; Earnshaw, W.C.; Masumoto, H. Inactivation of a human kinetochore by specific targeting of chromatin modifiers. Dev. Cell 2008, 14, 507–522, doi:10.1016/j.devcel.2008.02.001.
- Bergmann, J.H.; Rodriguez, M.G.; Martins, N.M.; Kimura, H.; Kelly, D.A.; Masumoto, H.; Larionov, V.; Jansen, L.E.; Earnshaw, W.C. Epigenetic engineering shows H3K4me2 is required for HJURP targeting and CENP-A assembly on a synthetic human kinetochore. EMBO J. 2011, 30, 328–340, doi:10.1038/emboj.2010.329.
- Cardinale, S.; Bergmann, J.H.; Kelly, D.; Nakano, M.; Valdivia, M.M.; Kimura, H.; Masumoto, H.; Larionov, V.; Earnshaw, W.C. Hierarchical inactivation of a synthetic human kinetochore by a chromatin modifier. Mol. Biol. Cell 2009, 20, 4194–4204, doi:10.1091/mbc.E09-06-0489.
- Kononenko, A.V.; Lee, N.C.; Earnshaw, W.C.; Kouprina, N.; Larionov, V. Re-engineering an alphoid(tetO)-HAC-based vector to enable high-throughput analyses of gene function. Nucleic Acids Res. 2013, 41, e107, doi:10.1093/nar/gkt205.
- Murphy, T.D.; Karpen, G.H. Centromeres take flight: Alpha satellite and the quest for the human centromere. Cell 1998, 93, 317–320, doi:10.1016/S0092-8674(00)81158-7.
- Henikoff, S. Near the edge of a chromosome’s “black hole”. Trends Genet. 2002, 18, 165–167, doi:10.1016/S0168-9525(01)02622-1.
- Collins, F.S.; Patrinos, A.; Jordan, E.; Chakravarti, A.; Gesteland, R.; Walters, L. New goals for the U.S. Human Genome Project: 1998–2003. Science 1998, 282, 682–689, doi:10.1126/science.282.5389.682.
- Eichler, E.E. Repetitive conundrums of centromere structure and function. Hum. Mol. Genet. 1999, 8, 151–155, doi:10.1093/hmg/8.2.151.
- Horvath, J.E.; Bailey, J.A.; Locke, D.P.; Eichler, E.E. Lessons from the human genome: Transitions between euchromatin and heterochromatin. Hum. Mol. Genet. 2001, 10, 2215–2223, doi:10.1093/hmg/10.20.2215.
- Horvath, J.E.; Viggiano, L.; Loftus, B.J.; Adams, M.D.; Archidiacono, N.; Rocchi, M.; Eichler, E.E. Molecular structure and evolution of an alpha satellite/non-alpha satellite junction at 16p11. Hum. Mol. Genet. 2000, 9, 113–123, doi:10.1093/hmg/9.1.113.
- She, X.; Horvath, J.E.; Jiang, Z.; Liu, G.; Furey, T.S.; Christ, L.; Clark, R.; Graves, T.; Gulden, C.L.; Alkan, C.; et al. The structure and evolution of centromeric transition regions within the human genome. Nature 2004, 430, 857–864, doi:10.1038/nature02806.
- Hoskins, R.A.; Carlson, J.W.; Kennedy, C.; Acevedo, D.; Evans-Holm, M.; Frise, E.; Wan, K.H.; Park, S.; Mendez-Lago, M.; Rossi, F.; et al. Sequence finishing and mapping of Drosophila melanogaster heterochromatin. Science 2007, 316, 1625–1628, doi:10.1126/science.1139816.
- Smith, C.D.; Shu, S.; Mungall, C.J.; Karpen, G.H. The Release 5.1 annotation of Drosophila melanogaster heterochromatin. Science 2007, 316, 1586–1591, doi:10.1126/science.1139815.
- Kalitsis, P.; Griffiths, B.; Choo, K.H. Mouse telocentric sequences reveal a high rate of homogenization and possible role in Robertsonian translocation. Proc. Natl. Acad. Sci. USA 2006, 103, 8786–8791, doi:10.1073/pnas.0600250103.
- Mouse Genome Sequencing Consortium; Waterston, R.H.; Lindblad-Toh, K.; Birney, E.; Rogers, J.; Abril, J.F.; Agarwal, P.; Agarwala, R.; Ainscough, R.; Alexandersson, M.; et al. Initial sequencing and comparative analysis of the mouse genome. Nature 2002, 420, 520–562, doi:10.1038/nature01262.
- Hayden, K.E.; Strome, E.D.; Merrett, S.E.; Lee, H.R.; Rudd, M.K.; Willard, H.F. Sequences associated with centromere competency in the human genome. Mol. Cell. Biol. 2012, 33, 763–772.
- Melters, D.P.; Bradnam, K.R.; Young, H.A.; Telis, N.; May, M.R.; Ruby, J.G.; Sebra, R.; Peluso, P.; Eid, J.; Rank, D.; et al. Comparative analysis of tandem repeats from hundreds of species reveals unique insights into centromere evolution. Genome Biol. 2013, 14, R10, doi:10.1186/gb-2013-14-1-r10.
- Mahtani, M.M.; Willard, H.F. Physical and genetic mapping of the human X chromosome centromere: Repression of recombination. Genome Res. 1998, 8, 100–110.
- Puechberty, J.; Laurent, A.M.; Gimenez, S.; Billault, A.; Brun-Laurent, M.E.; Calenda, A.; Marçais, B.; Prades, C.; Ioannou, P.; Yurov, Y.; et al. Genetic and physical analyses of the centromeric and pericentromeric regions of human chromosome 5: Recombination across 5cen. Genomics 1999, 56, 274–287, doi:10.1006/geno.1999.5742.
- Vermeesch, J.R.; Duhamel, H.; Raeymaekers, P.; van Zand, K.; Verhasselt, P.; Fryns, J.P.; Marynen, P. A physical map of the chromosome 12 centromere. Cytogenet. Genome Res. 2003, 103, 63–73, doi:10.1159/000076291.
- Horvath, J.E.; Schwartz, S.; Eichler, E.E. The mosaic structure of human pericentromeric DNA: A strategy for characterizing complex regions of the human genome. Genome Res. 2000, 10, 839–852, doi:10.1101/gr.10.6.839.
- Bailey, J.A.; Yavor, A.M.; Massa, H.F.; Trask, B.J.; Eichler, E.E. Segmental duplications: Organization and impact within the current human genome project assembly. Genome Res. 2001, 11, 1005–1017, doi:10.1101/gr.GR-1871R.
- Cheung, V.G.; Nowak, N.; Jang, W.; Kirsch, I.R.; Zhao, S.; Chen, X.N.; Furey, T.S.; Kim, U.J.; Kuo, W.L.; Olivier, M.; et al. Integration of cytogenetic landmarks into the draft sequence of the human genome. Nature 2001, 409, 953–958, doi:10.1038/35057192.
- Genovese, G.; Handsaker, R.E.; Li, H.; Kenny, E.E.; McCarroll, S.A. Mapping the human reference genome’s missing sequence by three-way admixture in Latino genomes. Am. J. Hum. Genet. 2013, 93, 411–421, doi:10.1016/j.ajhg.2013.07.002.
- Schueler, M.G.; Higgins, A.W.; Rudd, M.K.; Gustashaw, K.; Willard, H.F. Genomic and genetic definition of a functional human centromere. Science 2001, 294, 109–115, doi:10.1126/science.1065042.
- Rudd, M.K.; Willard, H.F. Analysis of the centromeric regions of the human genome assembly. Trends Genet. 2004, 20, 529–533, doi:10.1016/j.tig.2004.08.008.
- Nusbaum, C.; Mikkelsen, T.S.; Zody, M.C.; Asakawa, S.; Taudien, S.; Garber, M.; Kodira, C.D.; Schueler, M.G.; Shimizu, A.; Whittaker, C.A.; et al. DNA sequence and analysis of human chromosome 8. Nature 2006, 439, 331–335, doi:10.1038/nature04406.
- Ross, M.T.; Grafham, D.V.; Coffey, A.J.; Scherer, S.; McLay, K.; Muzny, D.; Platzer, M.; Howell, G.R.; Burrows, C.; Bird, C.P.; et al. The DNA sequence of the human X chromosome. Nature 2005, 434, 325–337, doi:10.1038/nature03440.
- Roizes, G. Human centromeric alphoid domains are periodically homogenized so that they vary substantially between homologues. Mechanism and implications for centromere functioning. Nucleic Acids Res. 2006, 34, 1912–1924, doi:10.1093/nar/gkl137.
- Paar, V.; Pavin, N.; Rosandic, M.; Gluncic, M.; Basar, I.; Pezer, R.; Zinic, S.D. ColorHOR—Novel graphical algorithm for fast scan of alpha satellite higher-order repeats and HOR annotation for GenBank sequence of human genome. Bioinformatics 2005, 21, 846–852, doi:10.1093/bioinformatics/bti072.
- Rosandic, M.; Paar, V.; Gluncic, M.; Basar, I.; Pavin, N. Key-string algorithm—Novel approach to computational analysis of repetitive sequences in human centromeric DNA. Croat. Med. J. 2003, 44, 386–406.
- Rudd, M.K.; Schueler, M.G.; Willard, H.F. Sequence organization and functional annotation of human centromeres. Cold Spring Harb. Symp. Quant. Biol. 2003, 68, 141–149, doi:10.1101/sqb.2003.68.141.
- Alkan, C.; Ventura, M.; Archidiacono, N.; Rocchi, M.; Sahinalp, S.C.; Eichler, E.E. Organization and evolution of primate centromeric DNA from whole-genome shotgun sequence data. PLoS Comput. Biol. 2007, 3, 1807–1818.
- Alexandrov, I.A.; Mashkova, T.D.; Akopian, T.A.; Medvedev, L.I.; Kisselev, L.L.; Mitkevich, S.P.; Yurov, Y.B. Chromosome-specific alpha satellites: Two distinct families on human chromosome 18. Genomics 1991, 11, 15–23, doi:10.1016/0888-7543(91)90097-X.
- Choo, K.H.; Earle, E.; Vissel, B.; Filby, R.G. Identification of two distinct subfamilies of alpha satellite DNA that are highly specific for human chromosome 15. Genomics 1990, 7, 143–151, doi:10.1016/0888-7543(90)90534-2.
- Wevrick, R.; Willard, H.F. Physical map of the centromeric region of human chromosome 7: Relationship between two distinct alpha satellite arrays. Nucleic Acids Res. 1991, 19, 2295–2301, doi:10.1093/nar/19.9.2295.
- Miga, K.H.; Newton, Y.; Jain, M.; Altemose, N.; Willard, H.F.; Kent, W.J. Centromere reference models for human chromosomes X and Y satellite arrays. arXiv 2013, arXiv:1307.0035v3 [q-bio.GN].
- Schulze, S.; Sinclair, D.A.; Silva, E.; Fitzpatrick, K.A.; Singh, M.; Lloyd, V.K.; Morin, K.A.; Kim, J.; Holm, D.G.; Kennison, J.A.; et al. Essential genes in proximal 3L heterochromatin of Drosophila melanogaster. Mol. Gen. Genet. 2001, 264, 782–789, doi:10.1007/s004380000367.
- Ryan, D.P.; da Silva, M.R.; Soong, T.W.; Fontaine, B.; Donaldson, M.R.; Kung, A.W.; Jongjaroenprasert, W.; Liang, M.C.; Khoo, D.H.; Cheah, J.S.; et al. Mutations in potassium channel Kir2.6 cause susceptibility to thyrotoxic hypokalemic periodic paralysis. Cell 2010, 140, 88–98, doi:10.1016/j.cell.2009.12.024.
- Masumoto, H.; Nakano, M.; Ohzeki, J. The role of CENP-B and alpha-satellite DNA: De novo assembly and epigenetic maintenance of human centromeres. Chromosome Res. 2004, 12, 543–556, doi:10.1023/B:CHRO.0000036593.72788.99.
- Blower, M.D.; Sullivan, B.A.; Karpen, G.H. Conserved organization of centromeric chromatin in flies and humans. Dev. Cell 2002, 2, 319–330, doi:10.1016/S1534-5807(02)00135-1.
- Sullivan, B.A.; Karpen, G.H. Centromeric chromatin exhibits a histone modification pattern that is distinct from both euchromatin and heterochromatin. Nat. Struct. Mol. Biol. 2004, 11, 1076–1083, doi:10.1038/nsmb845.
- Hori, T.; Fukagawa, T. Establishment of the vertebrate kinetochores. Chromosome Res. 2012, 20, 547–561, doi:10.1007/s10577-012-9289-9.
- Spence, J.M.; Critcher, R.; Ebersole, T.A.; Valdivia, M.M.; Earnshaw, W.C.; Fukagawa, T.; Farr, C.J. Co-localization of centromere activity, proteins and topoisomerase II within a subdomain of the major human X alpha-satellite array. EMBO J. 2002, 21, 5269–5280, doi:10.1093/emboj/cdf511.
- Sullivan, L.L.; Boivin, C.D.; Mravinac, B.; Song, I.Y.; Sullivan, B.A. Genomic size of CENP-A domain is proportional to total alpha satellite array size at human centromeres and expands in cancer cells. Chromosome Res. 2011, 19, 457–470, doi:10.1007/s10577-011-9208-5.
- Hasson, D.; Panchenko, T.; Salimian, K.J.; Salman, M.U.; Sekulic, N.; Alonso, A.; Warburton, P.E.; Black, B.E. The octamer is the major form of CENP-A nucleosomes at human centromeres. Nat. Struct. Mol. Biol. 2013, 20, 687–695, doi:10.1038/nsmb.2562.
© 2014 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).