Targeted capture and massively parallel sequencing in pediatric cardiomyopathy: development of novel diagnostics

Pediatric cardiomyopathy is a genetically heterogeneous disease associated with significant morbidity. Although identification of the underlying etiology is important for management, therapy, and screening of at-risk family members, molecular diagnosis is difficult due to the large number of causative genes, the high rate of private mutations, and cost. In this study, we aimed to define the genetic basis of pediatric cardiomyopathy and test a novel diagnostic tool using a custom targeted microarray coupled to massively parallel sequencing. Three patients with cardiomyopathy were screened using a custom NimbleGen sequence capture array containing 110 genes and providing 99.9% coverage of the exons of interest. Sensitivity and specificity were over 99% as determined by comparison to long-range polymerase chain reaction (PCR)-based massively parallel sequencing, Sanger sequencing of missense variants, and single nucleotide polymorphism genotyping using the Illumina Infinium Omni1 array. Overall, 99.73% of the targeted regions were captured and sequenced at over 10x coverage, allowing reliable mutation calling in all patients. Analysis identified a total of 165 non-synonymous coding single nucleotide polymorphisms (cSNPs), of which 89 were unique and 14 were novel. On average, each patient had 4 cSNPs predicted to be pathogenic. In conclusion, we report a cardiomyopathy sequencing array that allows simultaneous assessment of 110 genes. Comparison of targeted sequence capture versus PCR-based enrichment methods demonstrates that the former is more sensitive and efficient. Array-based sequence capture technology followed by massively parallel sequencing is promising as a robust and comprehensive tool for genetic screening of cardiomyopathy.
These results provide important information about genetic architecture and indicate that improved annotation of variants and interpretation of clinical significance, particularly in cases with multiple rare variants, are important for clinical practice.

Cardiogenetics 2012; volume 2:e7

Correspondence: Stephanie Ware, Cincinnati Children's Hospital Medical Center, 240 Albert Sabin Way, MLC 7020, Cincinnati, OH 45229, USA. Tel. +513.636.9427 Fax: +513.636-5958. E-mail: stephanie.ware@cchmc.org


Introduction
Pediatric cardiomyopathies are clinically heterogeneous disorders of heart muscle that are responsible for significant morbidity and mortality. The primary cause of the majority of cases of pediatric cardiomyopathy is genetic. To date, more than 100 genes have been implicated in cardiomyopathy, but comprehensive genetic diagnosis has been problematic because of the large number of genes and the private nature of mutations.
In the last two decades, polymerase chain reaction (PCR)-based enrichment of genomic DNA has been the dominant technique for targeted enrichment of specific genomic regions. Massively parallel (next-generation) sequencing has recently emerged as a powerful tool for sequencing utilizing PCR-based genomic enrichment. However, the high-throughput benefits of massively parallel sequencing remain limited by the relatively labor-intensive nature of PCR-based enrichment. As a result, novel non-PCR-based enrichment procedures have emerged as promising alternatives. [1] Microarray-based sequence capture hybridization (SeqCap) uses a chip arrayed with specific probes and allows enrichment of multiple dispersed segments or long single genome segments. [2][3][4] Recently, this approach has been shown to be successful in identifying novel genes involved in inherited human monogenic (recessive or dominant) disorders [5][6][7][8] or polymorphic genetic variants associated with complex human diseases. [9,10] While these approaches have proven useful in research-based testing, it is not clear whether these platforms are suitable for the higher stringency required for clinical molecular testing. In the present study, we applied a customized array-based SeqCap enrichment strategy coupled to massively parallel sequencing, screened 110 genes in each cardiomyopathy patient in a single sequencing run, and identified mutations associated with disease. We evaluated the performance and quality of long-range PCR (LR-PCR) versus SeqCap enrichment methods using cardiomyopathy as a disease model. The results indicate that this is a promising platform for clinical diagnostics and provide important information about the complexity of the genetic architecture in this disease.

Materials and Methods

Patients and DNA samples
Three cardiomyopathy patients, CM0001 and CM0002 (Pedigree A, Figure 1) and CM0006 (Pedigree B, Figure 1), were studied. The patients underwent clinical evaluation by a pediatric geneticist and a pediatric cardiologist at Cincinnati Children's Hospital Medical Center (CCHMC). Written informed consent was obtained from the parents or study participants prior to initiation. The study design was approved by the CCHMC Institutional Review Board. Genomic DNA was extracted using a Qiagen purification kit (Puregene Blood Case kit C, QIAGEN Sciences, Maryland, USA). DNA quality and quantity were assessed with a NanoDrop spectrophotometer (Thermo Scientific).

Selection of candidate genes
One hundred and ten nuclear genes were selected for analysis with microarray-based targeted SeqCap. Of these, a smaller subset of 31 genes was chosen for PCR-based enrichment (Table 1). Genes available on commercially available panels for cardiomyopathy at the time of study initiation were included. The 110 genes and their encoded proteins are involved in sarcomere, cytoskeletal, or desmosome organization, fatty acid oxidation, carnitine transport, storage disorders, syndromic phenotypes, and mitochondrial organization. These genes have previously been reported as disease-causing in distinct human cardiomyopathy phenotypes.

Long-range PCR and DNA pooling
For PCR-based enrichment, genes of interest were divided into 84 long-range amplicons to cover all coding regions and splice junctions. The custom design aimed to minimize the number of amplicons and to generate a majority of amplicons ranging from 3 kb to 10 kb. As a result, the majority of introns shorter than 4 kb were covered in their entirety. Primer sets were designed using the Primer3 program (sequences available upon request). PCR amplifications were performed in a 30 µL reaction mixture using 20 ng DNA, 20 pmol of each primer, 400 µM dNTPs, and 3 units of LA Taq polymerase in 1x LA PCR buffer (Takara Inc., Japan). Thermocycling included an initial denaturation for 3 min at 94°C followed by 34 cycles of denaturation at 94°C for 30 s and annealing and extension at 68°C for 3, 6, or 10 min depending on amplicon length, with a final extension at 72°C for 8 min. PCR products were purified using the QIAquick PCR purification kit (Cat # 28106, QIAGEN Sciences), quantified by NanoDrop, and pooled in an equimolar fashion based on DNA concentration and amplicon size.
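The equimolar pooling step can be illustrated with a short calculation. The following is a minimal sketch and not the protocol actually used in the study: it assumes the commonly used approximation of about 660 g/mol per base pair of double-stranded DNA, and the function names, the target amount, and the example amplicons are hypothetical.

```python
# Illustrative sketch only: the names, the 660 g/mol-per-bp approximation, and
# the example values are assumptions, not taken from the study protocol.

AVG_BP_MASS = 660.0  # approximate molar mass (g/mol) of one dsDNA base pair


def fmol_per_ul(conc_ng_per_ul: float, length_bp: int) -> float:
    """Convert a mass concentration (ng/uL) into a molar concentration (fmol/uL)."""
    # ng divided by g/mol gives nmol; multiplying by 1e6 converts nmol to fmol.
    return conc_ng_per_ul / (length_bp * AVG_BP_MASS) * 1e6


def pooling_volumes(amplicons: dict, target_fmol: float) -> dict:
    """Volume (uL) of each amplicon that adds `target_fmol` molecules to the pool.

    `amplicons` maps a name to a (concentration in ng/uL, length in bp) pair.
    """
    return {
        name: target_fmol / fmol_per_ul(conc, length)
        for name, (conc, length) in amplicons.items()
    }


if __name__ == "__main__":
    # Two hypothetical amplicons at the same mass concentration: the longer
    # product needs proportionally more volume to contribute the same moles.
    example = {"amp_3kb": (66.0, 3000), "amp_10kb": (66.0, 10000)}
    for name, volume in pooling_volumes(example, target_fmol=100.0).items():
        print(f"{name}: {volume:.2f} uL")
```

Because the volume scales with amplicon length divided by concentration, pooling by mass alone would over-represent short amplicons in the sequencing library.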

Cardiomyopathy capture array
Custom SeqCap 385K human arrays were designed to our specifications and manufactured by Roche NimbleGen (Madison, WI, USA) by printing 60-mer oligonucleotide probes complementary to the genomic sequences of the coding regions and splice sites of 110 candidate genes (NCBI build 36.1, hg18) onto glass slides. A total of 1505 target regions containing 384,000 probes were tiled onto arrays after exclusion of repetitive sequences. Supplementary Table 1 shows that 99.9% of targeted nucleotides are represented on the arrays.

Library preparation
Five genomic libraries (for SeqCap arrays) and 3 LR-PCR libraries (after PCR enrichment and pooling) were prepared for massively parallel sequencing in an identical fashion according to the manufacturer's recommendations (Illumina Inc., San Diego, CA, USA). Briefly, 5 µg of DNA (from genomic DNA or 84 pooled long-range PCR products) was fragmented by nebulization for 6 min using nitrogen gas at 38 psi. Fragments were concentrated and ligated to adaptors from the Illumina Paired-End Library construction kit. After separation by 2% agarose gel electrophoresis, the 300 bp (±25 bp) fraction was excised from the gel and purified using the QIAquick gel purification kit (Cat # 28706, QIAGEN Sciences). Library construction was completed by enriching the size-selected fragments through 18 cycles of PCR. At this stage, LR-PCR libraries were ready for sequencing, whereas SeqCap libraries were enriched further after array hybridization and elution.

Enrichment of genomic libraries by SeqCap
Five massively parallel sequencing libraries from 3 patients (CM0001, CM0006, and CM0002, the last with 3 technical replicates) were generated from genomic DNA and hybridized to NimbleGen custom SeqCap microarrays for targeted enrichment. After elution from the arrays, an additional enrichment PCR was performed for these libraries. The manufacturer's recommended protocol was modified as follows: i) 1 µg (~4 pmol) of library was used in the hybridization; ii) the hybridization was supplemented with 300 pmol (75x excess) each of Illumina primers PE1.1 and PE2.1; and iii) after NaOH-based elution of captured library molecules, 18 cycles of PCR were performed.

Massively parallel sequencing
Massively parallel sequencing of the SeqCap array-enriched genomic libraries or LR-PCR libraries (Table 2) was carried out on the Illumina Genome Analyzer IIx according to the manufacturer's protocol as single-end 35- and 50-bp reads or as paired-end (PE) 72-bp reads. One lane of an Illumina flow cell was used for each SeqCap sample, whereas the 3 LR-PCR-amplified samples were run in a pool of 10 barcoded samples per lane (the 3 study samples plus 7 samples unrelated to this study, each with a different DNA barcode). All sequence reads were mapped to the reference human genome (UCSC hg18) using the Illumina Pipeline software version 1.5 featuring a gapped aligner (ELAND v2). For variant identification downstream of CASAVA, the bioinformatics and sequencing core at CCHMC uses SeqMate, a validated custom post-alignment software tool for read-sequence visualization and analysis (P Putnam, unpublished data, 2011). The tool combines the aligned reads with the reference sequence and computes a distribution of call quality at each aligned base position, taking strand bias into account. Variants are reported on the basis of a configurable formula. For the custom arrays, the following settings were used: 10x minimum coverage, quality score (Phred) greater than 25, heterozygote allelic ratios allowed from 50%/50% to 75%/25%, and a minimum of 3 distinct reads per location per base call. All variants were searched in dbSNP 135 and the 1000 Genomes Project (June 2011). Amino acid changes were identified by comparison with the UCSC RefSeq database track.
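The reporting thresholds listed above can be pictured as a simple per-position filter. The sketch below is not the SeqMate implementation, whose formula is unpublished; the `Pileup` record, the function name, and the rule used for sites outside the heterozygous window are assumptions made only for illustration.

```python
# Illustrative sketch only; this is NOT the SeqMate implementation.
from dataclasses import dataclass

MIN_DEPTH = 10             # minimum coverage (reads) at the position
MIN_PHRED = 25             # base quality must be greater than this value
MIN_VARIANT_READS = 3      # distinct reads required to support a base call
HET_MINOR_FRACTION = 0.25  # 75%/25% is the most skewed ratio still called heterozygous


@dataclass
class Pileup:
    """Hypothetical per-position summary of the aligned reads."""
    depth: int            # total reads covering the position
    variant_reads: int    # distinct reads carrying the non-reference base
    variant_phred: float  # mean Phred quality of those base calls


def call_site(p: Pileup) -> str:
    """Return 'no_call', 'ref', 'het', or 'hom_variant' for one position."""
    if p.depth < MIN_DEPTH:
        return "no_call"
    if p.variant_reads < MIN_VARIANT_READS or p.variant_phred <= MIN_PHRED:
        return "ref"
    variant_fraction = p.variant_reads / p.depth
    minor_fraction = min(variant_fraction, 1.0 - variant_fraction)
    if minor_fraction >= HET_MINOR_FRACTION:
        return "het"
    # Assumption: outside the heterozygous window, a variant-dominated site is
    # called homozygous and a reference-dominated site is treated as reference.
    return "hom_variant" if variant_fraction > 0.5 else "ref"
```

Keeping the thresholds as named constants mirrors the description of a configurable formula: changing the minimum depth or the allelic-ratio window does not require touching the calling logic.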

Sanger sequencing validation
Non-synonymous coding SNPs (cSNPs) and a splice-site mutation detected by SeqCap and PCR enrichment were validated by Sanger sequencing.

Illumina Quad-Omni array
Samples were genotyped using the Illumina Infinium Omni1 SNP array using standard methods. A total of 1045 markers intersect with the region captured by the custom arrays and 310 markers are located within the LR-PCR products (Table 3).

In silico functional assessment
The potential pathogenicity of cSNPs was assessed in silico with PolyPhen, SIFT, and PANTHER. [12][13][14] Novel cSNPs predicted as pathogenic by at least one program were considered mutations. Rare variants present in dbSNP at a frequency of less than 0.05 and predicted as pathogenic by two bioinformatic programs were also considered mutations.
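The two classification rules above can be written as a small decision function. This is a hedged sketch rather than the authors' pipeline: the function name and the way predictions are passed in are hypothetical, "two programs" is read as at least two, and a dbSNP variant with no recorded frequency is assumed not to qualify.

```python
# Illustrative sketch only: names and the handling of missing frequencies are
# assumptions, not part of the published method.
from typing import Dict, Optional

RARE_FREQUENCY_CUTOFF = 0.05  # dbSNP variants must be rarer than this


def is_candidate_mutation(
    in_dbsnp: bool,
    frequency: Optional[float],
    predictions: Dict[str, bool],
) -> bool:
    """Apply the in silico rules to one non-synonymous cSNP.

    `predictions` maps a program name (e.g. PolyPhen, SIFT, PANTHER) to True
    when that program predicts the change to be pathogenic.
    """
    n_pathogenic = sum(1 for is_path in predictions.values() if is_path)
    if not in_dbsnp:
        # Novel cSNP: one supporting program is enough.
        return n_pathogenic >= 1
    if frequency is None:
        # Assumption: a dbSNP entry without a frequency cannot be called rare.
        return False
    # Known rare variant: require support from two programs.
    return frequency < RARE_FREQUENCY_CUTOFF and n_pathogenic >= 2
```

For example, a novel variant flagged only by PolyPhen would be retained, whereas a dbSNP variant at a frequency of 0.43 would be excluded even if all three programs predicted it to be deleterious.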

Clinical findings
Two familial cases of cardiomyopathy were selected for genetic analysis. Pedigree A (Figure 1) shows 2 patients, CM0001 and CM0002, with restrictive cardiomyopathy (RCM) inherited as an autosomal dominant trait. CM0002, now 8 years of age, was diagnosed with RCM at the age of 4 years when she developed echocardiographic evidence of biatrial enlargement (left atrial z-score 4.13, Figure 1C) and diastolic dysfunction. Cardiac catheterization subsequently confirmed the diagnosis. Genetic testing for the MYBPC3, MYH7, TPM1, MYL2, MYL3, ACTC1, TNNT2, and TNNC1 genes was performed as part of the clinical diagnostic evaluation. Results showed no pathogenic mutation. CM0001 developed arrhythmia and diastolic dysfunction at the age of 16 years and was diagnosed with RCM after her first pregnancy. The oldest child died at the age of 8 years (status post transplantation for RCM).
Pedigree B (Figure 1) is consistent with an autosomal recessive inheritance pattern in a family of Amish descent. CM0006 was diagnosed with hypertrophic cardiomyopathy and probable left ventricular non-compaction (LVNC) at 5 months of age (LV diastolic septal thickness z-score 7.18; M-mode LV mass 8.17). The proband's sister died from LVNC with heart failure at 3 months. Her clinical course was complicated by tricuspid atresia. The proband showed an undulating cardiac phenotype with dilation and poor cardiac function. The patient met modified Walker criteria, a strict clinical scoring system, for mitochondrial disease. The patient had lactic acidosis. A skin biopsy sent for electron transport chain analysis showed markedly elevated citrate synthase activity, consistent with mitochondrial proliferation. Deficiencies in complexes I, III, and IV were detected. The degree of deficiency was marked, especially given that the sample was from skin fibroblasts rather than muscle. CM0006 had clinical testing for one gene known to cause autosomal recessive mitochondrial disorders, SCO2. No mutation was identified. The family declined additional clinical or research testing. The patient died at the age of 9 months.

SeqCap and molecular genetic analysis
A SeqCap-based hybridization approach was used to investigate 110 genes causing cardiomyopathy (Table 1). While mutations in sarcomeric and cytoskeletal genes have been well documented, mutations in metabolic genes are known to cause disease, but their prevalence is unknown. A custom 385K NimbleGen array provided 99.9% coverage of the exons of interest, and hybridization was followed by massively parallel sequencing. On average, 4,223,998 mappable, aligned reads were generated for the custom array (PE, 72 bp) (Supplementary Table 1). The average exon depth of coverage was 442x. Alignment and coverage depth were visualized using a novel graphical user interface (Supplementary Figure 1). Read depth is a very important parameter for making accurate SNP calls. We set 10x depth of coverage as the minimum for reliable calls. On average, 99.73% of exon bases had greater than or equal to 10x coverage, with only 897 bases of a total of 328,498 protein coding bases lacking sufficient depth for reliable calls (Supplementary Table 1).
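The reported callable fraction follows directly from the base counts given above. The snippet below is a minimal sketch with hypothetical function names: the first helper shows how the fraction of positions at or above a depth threshold could be computed from per-base depths, and the second reproduces the reported percentage from the two totals quoted in the text.

```python
# Illustrative sketch only: function names are hypothetical.
from typing import Iterable


def breadth_of_coverage(depths: Iterable[int], min_depth: int = 10) -> float:
    """Fraction of positions whose read depth is at least `min_depth`."""
    depths = list(depths)
    if not depths:
        raise ValueError("no positions supplied")
    return sum(1 for d in depths if d >= min_depth) / len(depths)


def percent_callable(total_bases: int, low_coverage_bases: int) -> float:
    """Percentage of bases with sufficient depth, from summary counts."""
    return 100.0 * (total_bases - low_coverage_bases) / total_bases


if __name__ == "__main__":
    # Totals quoted in the text: 897 of 328,498 coding bases were below 10x.
    print(round(percent_callable(328_498, 897), 2))  # 99.73
```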
The quality of the hybridization-based approach and sequencing was assessed using four approaches to determine reproducibility and false positive and false negative rates. First, three technical replicates using sample CM0002 showed high reproducibility, with base pair coverage of less than 10x occurring in 0.27%, 0.55%, and 0.36% (Supplementary Table 2). Somewhat surprisingly, increasing the overall depth of coverage did not correlate with increased comprehensive coverage at the base pair level, as the replicate with the highest overall depth of coverage (CM0002-2, 1105x exon coverage) had the lowest rate of coverage greater than 10x across all exon base pairs (99.45%). Second, we compared our results using the hybridization-based platform to a PCR enrichment approach (LR-PCR) for 31 of the 110 genes in all samples (Tables 1 and 2) and to Sanger sequencing for 9 genes for 2 samples. Comparison of the LR-PCR approach (Supplementary Table 3) with the SeqCap approach indicates that the areas with low depth of coverage (<10x) are distinct (data not shown), suggesting a platform-specific basis. Third, we performed whole genome genotyping of samples CM0001, CM0002, and CM0006 using the Illumina Infinium Omni1 array and compared base calls at SNPs shared with those identified via resequencing. The concordance rate provides an overall assessment of specificity. Since the majority of pathogenic mutations are rare cSNPs, the heterozygous loss rate (SNPs identified by Illumina genotyping but not by massively parallel sequencing approaches) provides an estimate of false negatives, whereas the heterozygous gain rate provides an estimate of false positives. Concordance rates between our variant calling algorithm and the SNP microarray position calls are indicated in Table 3. Taken together with the overall concordance, the results are consistent with over 99% sensitivity and specificity. Fourth, we performed Sanger sequencing of non-synonymous cSNPs identified by SeqCap and LR-PCR approaches and validated 98.64%, again consistent with a very low false positive rate.
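The comparison between array genotypes and sequencing calls can be sketched as a simple classification of each shared position, following the definitions of heterozygous loss, heterozygous gain, and allele switch given in the table footnotes of this article. The code is an illustration under stated assumptions, not the analysis actually run: genotypes are assumed to be unphased strings such as "A/C", and the function names are hypothetical.

```python
# Illustrative sketch only: genotype encoding and names are assumptions.
from collections import Counter
from typing import Iterable, Tuple


def classify_pair(array_gt: str, seq_gt: str) -> str:
    """Compare one array genotype with one sequencing genotype (e.g. 'A/C')."""
    array_alleles = sorted(array_gt.split("/"))
    seq_alleles = sorted(seq_gt.split("/"))
    if array_alleles == seq_alleles:
        return "concordant"
    array_is_het = array_alleles[0] != array_alleles[1]
    seq_is_het = seq_alleles[0] != seq_alleles[1]
    if array_is_het and not seq_is_het:
        return "het_loss"   # possible false negative in sequencing
    if not array_is_het and seq_is_het:
        return "het_gain"   # possible false positive in sequencing
    return "allele_switch"  # same zygosity class but different alleles


def summarize(pairs: Iterable[Tuple[str, str]]) -> Tuple[Counter, float]:
    """Count each category and return the overall concordance rate."""
    counts = Counter(classify_pair(a, s) for a, s in pairs)
    total = sum(counts.values())
    rate = counts["concordant"] / total if total else 0.0
    return counts, rate
```

Sorting the alleles before comparison makes "A/C" and "C/A" equivalent, which matters because neither platform reports phase.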

Analysis of mutations
To distinguish potentially pathogenic DNA mutations from synonymous and other variants, we focused on non-synonymous cSNPs. A total of 3813 SNPs were detected by SeqCap-mediated massively parallel sequencing, including 479 coding changes in all 3 patients (Figure 2). Of the non-synonymous cSNPs, 75 unique variants were present in dbSNP (148 in total across the 3 patients; Supplementary Table 4) and 14 were novel (17 in total across the 3 patients; Supplementary Table 5). In CM0001 and CM0002 from pedigree A, a previously described heterozygous mutation, p.A213V, was identified in the DES gene using both SeqCap and LR-PCR platforms. In CM0001, 7 additional novel cSNPs were identified, including 3 predicted to be potentially damaging (p.K435N in the XIRP1 gene; p.T1351P and p.G1833W in the CABIN1 gene; Supplementary Table 5). In CM0002, two additional cSNPs were identified.
In CM0006, a homozygous splice site mutation in the MYBPC3 gene (c.3330+2T>G) was identified on both SeqCap and LR-PCR platforms (Table 4). This mutation has previously been described in the Amish population. [15] Furthermore, two predicted pathogenic novel cSNPs, p.V134A and p.P31A, were detected in NDUFV2 and NDUFAF4, respectively, on the SeqCap platform. The NDUFAF4 p.P31A mutation was not identified in more than 200 control chromosomes in a mitochondrial clinical diagnostic laboratory (Lee-Jun Wong, personal communication, 2011). The SeqCap platform also identified a novel cSNP, p.R1346W, in the MYH6 gene in CM0006. Mutations in this gene have been reported in patients with atrial septal defects [16] and idiopathic or familial dilated cardiomyopathy. [17]
The cSNPs were assessed in silico with PolyPhen, SIFT, and PANTHER. [12][13][14] Fifty-four variants were predicted as pathogenic by at least one bioinformatic tool (Figure 3) and 27 of these were predicted as pathogenic by more than one program, increasing the likelihood of functional significance. On average, each patient has 4 cSNPs predicted to be deleterious (Table 4). Figure 4 illustrates that amino acid residues altered by missense mutations in NDUFAF4, XIRP1, PYGM, CABIN1, and MYH6 are highly conserved across species, arguing for functional significance.

Towards development of a novel cardiomyopathy SeqCap platform
Advances in genomic technologies have markedly accelerated the search for genetic causes of human disease and answered previously intractable questions regarding disease mechanisms. In this context, selected genomic regions, genes, or exons are targeted by different enrichment methods (e.g. [...]). [20] Although traditional Sanger sequencing remains the gold standard for mutation detection, it is not readily scalable for diseases with significant genetic heterogeneity such as cardiomyopathy. Our results indicate the utility of a novel custom cardiomyopathy chip for identification of rare variants and disease-causing mutations. This custom array hybridization approach was efficient, and it was less time-consuming and less costly than PCR amplification and Sanger sequencing of the same genes.
Analytical validity and clinical validity are two important metrics for genetic testing within a diagnostic laboratory. The SeqCap platform for the analysis of 110 genes in a single reaction showed excellent analytical validity, with 99.73% of base pairs called and over 99% sensitivity and specificity. There were few regions with less than adequate coverage. In clinical testing, call rates above 99.73% could potentially be achieved with double or triple probe coverage in regions known to be problematic for capture. In addition, it is likely that improvements in bioinformatic alignment programs and technical improvements in the SeqCap technology and massively parallel sequencing output will lead to improved overall technical performance in the future. While whole-exome and whole-genome sequencing are becoming increasingly affordable, one advantage of a more restricted panel of genes for clinical testing is the less problematic interpretation and improved clinical validity. Finally, the LR-PCR-based approach, with 89.1% base pair coverage, does not meet analytical requirements for a clinical test. Nevertheless, we note that this degree of coverage is within the realm of most research-based applications and, given the high specificity and sensitivity, this approach remains a viable option for research-based testing. Overall, the LR-PCR approach was much more labor-intensive than SeqCap.

Identification of mutations in cardiomyopathy patients
The custom array was used to investigate two types of cardiomyopathy in which the molecular basis is not well understood: RCM and mitochondrial cardiomyopathy. RCM is a distinct cardiomyopathy characterized by diastolic dysfunction but intact systolic function until later stages of the disease. RCM accounts for less than 5% of all cardiomyopathies in the United States and Europe [21] and the prognosis is particularly poor in children. To date, dominant mutations causing RCM have been reported in DES, ACTC1, TNNI3, TNNT2, and MYH7, but the majority of cases are considered idiopathic. [22,23] In the present study, we identified the heterozygous mutation p.A213V in DES in the mother and daughter CM0001 and CM0002 from pedigree A. This mutation has been described previously as conditionally pathogenic, causing late-onset dilated cardiomyopathy, familial RCM, and desminopathy. [24,25] DES encodes desmin, a muscle-specific cytoskeletal protein found in smooth, cardiac, and skeletal muscles. Disease-causing mutations have been reported in DES in dilated cardiomyopathy or desmin-related myopathy. [24,25]
The sample from CM0006 provided the opportunity to investigate the genetic basis of disease in a patient with a complex pedigree and a phenotype consistent with mitochondrial disease. Mitochondrial cardiomyopathy is poorly understood at the molecular level. In children, the majority of mitochondrial cardiomyopathies are autosomal recessive and caused by mutations in the nuclear genome rather than the mitochondrial genome. Because many nuclear genes important for mitochondrial function have not yet been identified, molecular diagnosis is challenging. In CM0006, two rare cSNPs predicted to be disease-causing were identified in genes encoding proteins important for complex I of the mitochondrial electron transport chain: p.V134A in NDUFV2 and p.P31A in NDUFAF4. NDUFAF4 encodes an assembly factor of mitochondrial complex I and has been previously described in complex I deficiency causing infantile mitochondrial encephalomyopathy and cardiomyopathy. [26] The functional significance of mutations in two complex I mitochondrial proteins requires further assessment.
Surprisingly, the patient was also found to be homozygous for a previously reported splice mutation in MYBPC3 (c.3330+2T>G). This mutation has been associated previously with a severe neonatal hypertrophic cardiomyopathy in the Amish community when homozygous and with hypertrophic cardiomyopathy with incomplete penetrance when heterozygous. [15,27,28] In addition, a novel cSNP in the MYH6 gene was identified that may play a role in the familial cardiomyopathy or history of congenital heart defect based on its published functions. [16,17] The finding of multiple putative disease-causing alleles in this family may explain the complex inheritance pattern noted in the pedigree. Unfortunately, family members declined further research testing, precluding the possibility of defining segregation with phenotype. Nevertheless, these results indicate the importance of screening multiple genes on a single platform, allowing examination of the potential interplay between genetic alterations in the structural apparatus of the cardiomyocyte and metabolic pathway-specific genes. Further development of these technologies will offer a unique opportunity to interrogate complex patterns of inheritance due to the involvement of more than one gene.

Novel sequencing technology identifies multiple rare variants
Phenotypic variability is frequently observed in closely related family members carrying the same cardiomyopathy-causing mutations. [29] This phenotypic variability has been attributed to environmental influences (e.g. blood pressure, diet, activity) and the individual's specific genetic background. However, recognition of modifying genetic influences has significantly increased as the identification of disease-causing genes has expanded.
Recent studies on hypertrophic cardiomyopathy have shown that individuals may have 2 distinct pathogenic mutations. [30] [...] [32][33] However, this view is controversial, with some stating that there is insufficient evidence of the functional effects of multiple mutations. [34] As demonstrated by this study, one of the challenges for clinical laboratories will be the interpretation of the multiple rare or novel cSNPs identified using massively parallel sequencing. In this small study, an average of 4 potential disease-causing cSNPs was identified in each patient. In the past, a final common pathway hypothesis was proposed to explain similar phenotypes caused by different genes (genetic heterogeneity). This hypothesis suggests that mutated proteins with similar functions or signaling pathways [35][36][37] can lead to cardiomyopathy phenotypes that are clinically indistinguishable. This study documents both the technical ability to identify multiple rare variants or mutations in a single patient and the challenges of interpretation. In the future, it will be possible to analyze whether multiple rare mutations in the same or different genes act in concert to modulate the disease phenotype.
[...] [39][40][41] Results with emerging genomic technologies indicate that approximately 400 novel protein coding variants are identified with full exome sequencing. In addition, bioinformatic programs predict that approximately 20% of common human non-synonymous SNPs damage the proteins. [12] Because of this, recognition of rare variants that significantly impact protein function in genetically and phenotypically heterogeneous diseases, such as cardiomyopathy, is challenging. Mutations identified in multiple genes in the present study provide insight into the large number of altered protein-protein interactions, potentially resulting in a combined effect on function. Although bioinformatics programs continue to improve, they do have limitations. For example, the variant p.R1745H in dystrophin is present in the population at a frequency of 43% but is nevertheless predicted to be deleterious. Use of multiple prediction programs, as in this study, can improve overall prediction but cannot prove functional significance. High-throughput sequencing technologies provide the technical basis to screen candidate genes or the complete exome in large cohorts of patients. However, the interpretation of the results of these studies will require further development of bioinformatics, mutation prediction, and functional screens. By addressing these issues with interpretation of massively parallel sequencing data, we will be able to provide genome-guided diagnosis and treatment.

Study limitations
Detection of deletions and insertions with the stringency required for clinical testing remains problematic with massively parallel sequencing. Incorporation of probes specific to previously described deletions and insertions associated with cardiomyopathy may be used to overcome this problem. In addition, further development of efficient and robust bioinformatic approaches addressing novel indel variations will aid mutation identification. The 13 cSNPs identified in this study are predicted to be pathogenic (in silico) as missense mutations (Table 4), but additional studies are required to more fully investigate their effect and combined functions.

Figure 1. (A, B) Pedigrees of families A and B; (C) Echocardiographic findings of restrictive cardiomyopathy in CM0002.
For Sanger validation of the cSNPs and the splice-site mutation detected by SeqCap and PCR enrichment, primer sets for exons of interest were designed flanking the nucleotide change using the ExonPrimer Perl script available at the UCSC Genome Browser homepage and the Primer3 program. More than 9000 bases targeted by both platforms were sequenced by Sanger sequencing, and results were analyzed with the BioEdit sequence alignment editor, version 6.0.7. Primer sequences, amplicon sizes, and PCR conditions for individual cSNPs are available upon request.

LR-PCR, long-range polymerase chain reaction; cSNPs, coding single-nucleotide polymorphisms; SeqCap, sequence capture; Het, heterozygous; Covg, coverage. *Number of nucleotide positions genotyped by the Illumina whole genome SNP array that are also present in the indicated LR-PCR or SeqCap sequencing platforms. °Het loss is defined as a heterozygous position (e.g. A/C) identified by Illumina genotyping that is identified as homozygous by massively parallel sequencing; #Het gain is defined as a homozygous position (e.g. A/A) identified by Illumina genotyping that is identified as heterozygous by massively parallel sequencing; §Allele switch is defined as discrepant SNP calls between Illumina genotyping and massively parallel sequencing (e.g. A/C vs G/T).

MAF, minor allele frequency (HapMap/CEU); N/A, not identified in 226 ethnically matched control alleles. *PolyPhen, SIFT, and PANTHER were used to assess pathogenicity. Each program that predicted a deleterious change is designated with +. See Supplementary Tables 4 and 5 for full details.

Figure 2. Summary of the total variants detected by sequence capture based massively parallel sequencing in all 3 patients.

Figure 4. Novel missense mutations identified occur in highly conserved amino acids. Partial amino acid sequence comparisons of human NDUFAF4, XIRP1, PYGM, CABIN1, and MYH6 with their orthologs. The shaded residues indicate high conservation across species. Variations at the protein level are shown above each shaded amino acid.

Conclusions
In summary, we have investigated a novel genetic diagnostic platform for pediatric cardiomyopathy using array-based sequence capture technology followed by massively parallel sequencing. Furthermore, we compared the performance of targeted array-based and long-range PCR-based genomic DNA enrichment for massively parallel sequencing and found array-based enrichment more cost- and time-efficient and more sensitive. Through this array-based sequencing, 89 different non-synonymous cSNPs and one splice site mutation were identified, including multiple predicted disease-causing variants in each patient. This study highlights the difficulty of interpreting high-throughput sequencing data with a number of predicted pathogenic variants even in a small sequence capture experiment.

[Table continued from previous page. Columns: genes targeted by capture array enrichment; total coding exons; encoded protein (AA); targeted sequence (bp) on capture custom array; NCBI GenBank accession #; chromosomal location.]
*Genes also analyzed by long-range polymerase chain reaction followed by massively parallel sequencing.

Table 2. Summary of long-range polymerase chain reaction versus sequence capture based massively parallel sequencing.
LR-PCR, long-range polymerase chain reaction; SeqCap, sequence capture; Avg, average.