Severity of Idiopathic Scoliosis Is Associated with Differential Methylation: An Epigenome-Wide Association Study of Monozygotic Twins with Idiopathic Scoliosis

Epigenetic mechanisms may contribute to idiopathic scoliosis (IS). We identified 8 monozygotic twin pairs with IS, 6 discordant (Cobb angle difference > 10°) and 2 concordant (Cobb angle difference ≤ 2°). Genome-wide methylation in blood was measured with the Infinium HumanMethylation EPIC Beadchip. We tested for differences in methylation and methylation variability between discordant twins and tested the association between methylation and curve severity in all twins. Differentially methylated region (DMR) analyses identified gene promoter regions. Methylation at cg12959265 (chr. 7 DPY19L1) was less variable in cases (false discovery rate (FDR) = 0.0791). We identified four probes (false discovery rate, FDR < 0.10); cg02477677 (chr. 17, RARA gene), cg12922161 (chr. 2 LOC150622 gene), cg08826461 (chr. 2), and cg16382077 (chr. 7) associated with curve severity. We identified 57 DMRs where hyper- or hypo-methylation was consistent across the region and 28 DMRs with a consistent association with curve severity. Among DMRs, 21 were correlated with bone methylation. Prioritization of regions based on methylation concordance in bone identified promoter regions for WNT10A (WNT signaling), NPY (regulator of bone and energy homeostasis), and others predicted to be relevant for bone formation/remodeling. These regions may aid in understanding the complex interplay between genetics, environment, and IS.


Introduction
Adolescent idiopathic scoliosis (IS) is a three-dimensional spinal deformity affecting 1-3% of otherwise normal prepubescent and adolescent individuals [1,2]. Screening programs, conservative treatment, and surgical care in the case of progressive curvatures impose significant personal, familial, financial, and societal costs across the lifetime of affected individuals. The etiology of IS remains unknown. However, it has been shown to have a strong familial component [3] with a sibling recurrence risk of 18%, and heritability estimates of approximately 87.5% [4][5][6].
One frequently studied mechanism of epigenetic regulation is DNA methylation in which a methyl group is added to the cytosine nucleotide within a DNA sequence. Typically occurring in a cytosine phosphate guanine (CpG) dinucleotide pair, methylation has the capacity to change chromatin structure and alter transcription factor binding [66]. This is a reversible event that may provide a link between genetic variation, environment, and disease [67,68]. Although tissue-specific, up to 80% of the variation in the epigenome may be due to genotype [69], therefore a challenge among epigenome wide association studies (EWAS) is determining whether the observed epigenetic phenotype association is due to environmental or genetic effects. Studying monozygotic (MZ) twins, is one way to minimize this concern. MZ twins discordant for the phenotype of interest are near perfect genetic matches, therefore their DNA methylation levels can be compared to shed light on the phenotypic expression of the disease.
Previous epigenome-wide association studies have provided evidence supporting the role of DNA methylation in numerous complex musculoskeletal diseases including osteoarthritis [70], osteoporosis [70], cerebral palsy [71], and Paget's disease of bone [72]. The role of DNA methylation in IS has not been well studied. Targeted studies have reported associations between methylation and IS near the COMP [73] and PITX1 [74] genes. Two recent studies [75,76] of ESR1 and ESR2 methylation from paravertebral muscle tissue in females with IS supports potential interrelationship between sex hormone levels, methylation, and the clinical manifestation of IS. ESR1 methylation levels from paravertebral muscle tissue on the concave side of curve were associated with curve severity [75], and furthermore, ESR2 promoter methylation levels differed between concave and convex sides of the curves [76]. Epigenome-wide discovery analyses in MZ twins are limited to studies including only one [77] and two [78] MZ twin pairs discordant for IS. There is a strong need for additional epigenome wide analyses to understand the potential role of DNA methylation in IS. Therefore, the aim of this EWAS was to identify differences in DNA methylation levels between monozygotic twin pairs discordant for IS. Within twin pairs, we also aimed to determine if differences in methylation were associated with differences in curve severity.

Study Population
Peripheral whole blood samples were obtained from 8 female monozygotic twin pairs (n = 16 individuals) diagnosed with idiopathic scoliosis (IS). Participants were identified from an existing registry, the Genetics of Idiopathic Scoliosis project (the GenesIS project), that has been described by Baschal et al. [19]. A diagnosis of IS required that subjects had no congenital deformities or other co-existing genetic disorders and a standing anteroposterior radiograph showing a curvature of at least 10 • by the Cobb method [79]. There were 6 twin discordant twin pairs (difference in primary curve Cobb angle >10 • ) and 2 concordant twin pairs in our study population. The difference in the primary curvature among the two concordant twin pairs was <1 and 2 • , respectively (Table 1). Written informed consent and assent, when appropriate, was obtained from all study participants and/or their legal guardians in accordance with protocols approved through the Johns Hopkins School of Medicine Institutional Review Board and the University of Colorado Anschutz Medical Campus Institutional Review Board.

DNA Methylation Processing
Genomic DNA was extracted from fresh whole blood using a standard phenolchloroform purification procedure [80]. DNA was further purified using the Zymo DNA Clean & Concentrator kit, followed by Nanodrop quantification. Approximately 1 µg DNA was bisulfite converted using the Zymo EZ DNA Methylation kit (Zymo Research, Irvine, CA, USA). The precipitated DNA was dispensed onto the Infinium MethylationEPIC 850 K BeadChip (Illumina). The EPIC chip provides methylation measurements across the genome. All sequencing was performed at the University of Colorado Genomics and Microarray Shared Resource. Twin pairs were consistently arranged in sequential order on the plate to minimize within and between batch effects.
The 850 K Infinium platform includes 866,836 annotated probes representing individual cytosine phosphate guanine (CpG) probe sites. Data were normalized using the SWAN normalization method implemented within the minfi R package [81]. Standard quality control checks were performed at both the sample and probe level using the minfi R package [81]. Probes or samples that failed any of the following standard filtering procedures were treated as missing data: probe was not detectable above background noise (n = 776 [<0.1%]), probes with a low bead count (>5% of samples with a beadcount <3) (n = 17,272 [2%]), cross reactive probes (n = 17,028 [2%]), dropped during initial quality control processing (n = 528 [<0.1%]). Probes located on sex chromosomes (n = 15,648 [1.8%]) and probes that included known SNPs (n = 165,678 [19.1%]) were also excluded from the analysis. Consistent with recommendations of Logue et al. [82], we filtered low variability probes known to be associated with poor reproducibility. We filtered out all probes with β range value < 0.05%, n = 145,163 (16.7%). In total, 504,743 probes met the inclusion criteria and were included in subsequent steps.
The microarray methylation measurements were performed in two batches. The ComBat function implemented in the sva R package [83] was used to adjust the SWAN normalized M values for potential batch effects. The batch adjusted M-values were used in all statistical analyses. The current Illumina annotation for the 850 K platform is on hg19. Only probes which match 100% to a single location were used for further analyses.
We used three analytical approaches to identify individual CpG sites ( Figure 1): (A) we performed a discordant differentially methylated position (DMP) analysis, testing for differences in methylation (batch adjusted M-values) between six twin pairs where the difference in the primary spinal curvature Cobb angle was >10 • , (B) Using all twins, we tested the association between within twin pair differences in methylation (∆ methylation = methylation levels in affected/more severe twin-methylation levels unaffected/less severe twin) and differences in curve severity within twin pairs (∆ curve = primary curve magnitude in affected/more severe twin-primary curve magnitude is unaffected/less severe twin), (C) We tested for differences in methylation variability between the discordant twin pairs (Differentially Variable Position [DVP] analysis). The methods workflow is outlined in Figure 1. Differentially methylated position (DMP) analyses were used to identify single probes. We used two DMP strategies (1) Discordant DMP Analysis (A), differences in methylation between cases (twin with more severe IS) relative to controls (twin with less sever IS) (2) Severity DMP Analysis (B) looked at association between difference in curve severity and methylation ∆ curve vs ∆ methylation , where ∆ methylation = methylation levels in affected/more severe twin-methylation levels unaffected/less severe twin and ∆ curve = primary curve magnitude in affected/more severe twin-primary curve magnitude is unaffected/less severe twin. We also looked at differences in variability at single probes (differentially variable position analysis, DVP) among cases compared to controls (C). Region analyses based on single probes from (A,B) were used to identify regions of consistent methylation effects within promoter regions. We only considered regions with 5 or more probes where direction of effect was consistent across all probes. DVP probes (C) were not considered in the region analysis due to challenges interpreting a 'consistent' direction of effect based on variability.
Methylation is tissue specific. Confounding due to differences in cell composition between cases and controls is a potential concern in epigenome-wide association studies [84]. Cell proportions were estimated from methylation values using the minifi [81] R package. The distribution of CD8T, CD4T, B Cells, natural killer Cells, monocytes, and neutrophils was similar in cases compared to controls (Appendix A, Table A1). Due to the small sample size and similar distribution of cell proportions across all individuals, we did not adjust for cell type in subsequent analyses.

Statistical Analysis
We used descriptive statistics to summarize the demographic and clinical characteristics of all subjects included in the study. Wilcoxon signed rank tests were used to compare the distribution of the five cell types between case and control twins (Table 1 and Appendix A, Table A1). Normalized M-values were used in all analyses. β-values and percent methylation were also reported to facilitate biological interpretation. In the discordant DMP analysis, paired t-tests were used to test for differences in M values between discordant twins (n = 6 pairs, n = 12 individuals). The individual with the more severe curvature was designated as the "case", and the corresponding twin with the less severe curvature was designated as the "control." For the DVP discordant analysis, a regularized version of Bartlett's test was used to identify differentially variable probes between discordant twin pairs. For the curve severity DMP analysis involving all twin pairs, linear regression models were used to test the association between differences in methylation (∆ methylation ) and differences in curve severity within twin pairs (∆ curve ). To account for multiple testing, false-discovery rate (FDR) adjusted p values were estimated using the algorithm described by Benjamini and Hochberg [85].
An exploratory analysis was used to identify differentially methylated regions (DMR) using the mCSEA [86] R package among promoter regions with a minimum of 5 probes. Significance was assessed based on 100,000 permutations. The DMR analysis was implemented for the discordant DMP and the curve severity DMP analyses. Based on the exploratory nature of the DMR analysis, only regions where 100% of probes were in the same direction of effect and the FDR adjusted p value was < 0.05 were considered significant.
Functional gene overrepresentation analysis was performed based on the results of the discordant DMP and curves severity DMP analyses. The top DMRs, nominal p value <0.001, were included in the DMR overrepresentation analysis. Overrepresentation analyses of the severity and discordant gene lists were conducted using PANTHER v16.0, test release 20,210,224 (PantherDB.org) [87][88][89]. Custom gene background inputs were used in accordance with gene promoter region DMRs analyzed within the Infinium Human-Methylation EPIC Beadchip platform. The Annotation Data Sets Gene Ontology (GO) Cellular Component Complete, GO Molecular Function Complete, and GO Biological Process Complete were analyzed separately, each using a Fisher's Exact Test and Bonferroni correction. We report overrepresented GO terms with a Bonferroni-adjusted p < 0.05. Significant terms were reviewed; parent terms were manually removed and REVIGO [90] was used to eliminate redundant terms.

Prioritization of Candidates: Blood Methylation Correlation
While the tissue of origin for IS has not been determined, the primary clinical manifestation is that of the bony spinal column. Ebrahimi et al. [91] previously reported on methylation levels in whole blood versus trabecular bone. We utilized these data to prioritize our methylation candidates as ones that may have a functional role in bone. We reviewed the distribution of correlation coefficients, representing the strength of association between methylation levels in blood versus bone, among all probes evaluated by Ebrahimi et al. [91] Probes were considered strongly correlated if the correlation coefficients exceeded the 75th percentile among all probes evaluated by Ebrahimi et al. [91] (ρ = 0.49). We then reviewed the correlation coefficients for probes identified as candidates in our DMP and DMR analyses. For the DMR analysis, we reported the maximum correlation coefficient among all probes in the DMR as well as the percentage of strongly, positively correlated probes across the entire region.

Demographics
The study population included 6 discordant and 2 concordant female monozygotic twin pairs with idiopathic scoliosis (IS, see Table 1). The average age among all individuals at the time of blood acquisition was 37.3 years (±22.5). The average Cobb angle of the primary curve was 39.6 • (±15.3). The average difference in age between twin pairs at the time of sample acquisition was 1.2 months (range: 0 to 6 months). The average difference in curve severity among all twins was 19 • (range: 0 to 44 • ). Among discordant twins, the average difference in curve severity was 25 • (range: 11 to 44 • ).

Discordant Curvature Analysis
In the discordant analysis, none of the individual CpG probes were significant at the FDR adjusted p = 0.10. Differentially methylated region (DMR) analyses identified 200 promoter regions (containing 5-14 CpG sites) that were significant at the FDR adjusted p value of 0.05. Among these regions, 58 DMRs included probes/sites where the direction of effect (hypermethylation or hypomethylation) was consistent across 100% of the probes (Appendix A, Table A2). The most significant DMR represented a region on chr. 14 in the promoter region for the BCL2L2-PABPN1 gene (FDR adjusted p = 0.0113, see Figure 2). Using the Panther enrichment algorithm with these 58 DMRs, we identified 1 significantly enriched gene ontology (GO) term (Appendix A, Table A3). In addition to differences in methylation levels, we looked for differences in methylation variability, which may provide valuable information about the heterogenous environmental exposures that contribute to disease etiology. In the differentially variable position (DVP) analysis, methylation variability at cg02477677 was significantly lower (FDR adjusted p value = 0.0791) in cases with a more severe curve compared to unaffected or less severely affected controls (Figure 3). The cg02477677 CpG probe is an open sea probe on chr. 7 near DPY19L1.

Curve Severity Analysis
We also tested whether the difference in methylation between cases and controls was associated with the difference in curve severity between cases and controls. We identified 4 CpG sites where the difference in methylation was significantly associated (FDR adjusted p value = 0.0753) with the difference in curve severity (Figure 4). At each of these open sea probes, increasing disparity in curve severity between cases and controls was associated with a pattern of hypomethylation. The differentially methylated region analyses identified n = 197 promoter regions (ranging from 5 to 34 CpG sites) significant at the FDR adjusted p value of 0.05. Among these, 28 regions included probes where the direction of effect (difference in curve severity was either positively or negatively associated with the difference in methylation between twin pairs) was consistent across 100% of the probes (Appendix A, Table A4). The top DMR consisted of 34 probes on chr. 20 within the promoter region for the NNAT gene (FDR adjusted p value = 0.0237, Figure 5). Using Panther, we identified 15 significantly enriched ontologies (Appendix A, Table A5). The top biological process terms included pituitary gland development (GO:0021983) and anterior/posterior pattern specification (GO:0009952).

Candidate Prioritization
Ebrahimi et al. [91] conducted an epigenome-wide analysis to measure the correlation between methylation levels in whole blood and trabecular bone. We used these correlation coefficients to prioritize the methylation candidates identified in our study. Among the four probes identified as candidates in our DMP analysis (cg02477677, cg12922161, cg08826461, and cg16382077), only one probe, cg08826461, was strongly correlated with bone tissue (ρ = 0.494, FDR adjusted p value = 0.41329). Among DMRs, we prioritized candidate regions where either one or more probes within the DMR was significantly correlated with bone (FDR adjusted p value of less than 0.10), or, greater than 50% of probes included in the region were strongly correlated with bone. We identified 13 priority regions based on the discordant DMR analysis and 8 priority candidate regions based on the severity DMR analysis (Table 2).

Discussions
We utilized an epigenome-wide association study (EWAS) in monozygotic (MZ) twins to identify individual methylation sites and regions across the genome relevant to idiopathic scoliosis (IS). We identified a single CpG site where methylation variability was different between discordant MZ twins and identified CpG sites where increasing curve severity was more often associated with hypomethylation. Differentially methylated region (DMR) analyses identified multiple regions potentially indicative of unique methylation changes within twin pairs discordant for IS as well as unique methylation patterns associated with curve severity. Integration of a peripheral blood/bone methylation dataset allowed us to prioritize regions and sites based on their potential relevance to the IS disease process in bone. Collectively, these results highlight both new and previously reported pathways related to IS curve progression including those involved in neurogenesis and body segmentation.
Differential variability in methylation represents large shifts in methylation that may reflect differential epigenetic and/or environmental effects in cases relative to controls. Differential variability analyses in disease-discordant monozygotic twins have been used to identify DNA methylation signatures associated with Type 1 Diabetes [92] and rheumatoid arthritis [93]. In our analysis, variability at cg0247767, chr. 7 near DPY19L1, was significantly different between discordant twin pairs. DPY19L1 is a transmembrane protein localized in the endoplasmic reticulum that regulates neuronal migration and extension during development [94,95]. Zebrafish with mutations within this gene demonstrate spinal axial curvatures [96]; however, the phenotype has not been studied in detail.
We also identified four individual CpG sites that were associated with curve severity (cg02477677, cg12922161, cg08826461, and cg1638077). At these sites, an increase in curve severity tended to be associated with a decrease in methylation (hypomethylation) within the twin pairs. One of the probes, cg12922161, maps to a location near LOC150622/SILC1, a non-coding RNA gene. Although the function of this non-coding RNA is not well known, it has been shown to regulate neuron outgrowth and neuroregeneration via cis-acting activation of the transcription factor SOX11 [97]. To date, select non-coding RNAs as non-protein coding regulatory transcripts within the genome have been hypothesized to functionally participate in the initiation and progression of IS [98]. The cg02477677 probe was also associated with curve severity. This probe maps to a region near RARA on chr. 17, which encodes a transcription factor for the retinoic acid receptor protein During development, RA signaling plays an essential role in embryonic body axis extension, left-right somite synchronization, and limb development [99]. It is a central mechanism underlying bilateral symmetry during development of the mouse embryo [100]. Right-left asymmetries have previously been hypothesized as a potential contributing factor to IS based on the increased prevalence of IS among individuals demonstrating vestibular and posterior basicranial morphological asymmetries in MRI cross-sectional studies [101][102][103].
Regions of differentially methylated probes (DMRs) may have more important functional implications than methylation levels at a single CpG site, particularly in promoter regions which are areas of the genome where methylation levels tend to be negatively correlated with gene expression [67,68]. Based on the discordant analysis, we identified 58 significant DMRs in known promoter regions, the most significant region included methylation sites with the promoter region for the BCL2L2-PABPN1 gene on chr. 14. BCL2L2-PABPN1 is a paralog of PABPN1, which is associated with the development of oculopharyngeal muscular dystrophy, a disease characterized by muscular weakness in eyelids, pharyngeal musculature, and limbs [104]. In the curve severity analysis, we identified 28 significant DMRs in known promoter regions, the most significant included probes with the promoter region for the NNAT gene on chr. 20. This gene is important for brain development and implicated in neurodegenerative diseases including anterior horn disease [105]. The paternal copy of the NNAT gene is exclusively expressed due to imprinting [106,107]. This is potentially relevant to IS given the sex bias of progressive curvatures (females > males), and the higher percentage of affected offspring from paternal IS cases compared to maternal IS cases (80% vs. 56%) [63].
Enrichment analyses of the top DMRs from both the discordant and curve severity results revealed both broad, non-specific ontologies and select ontologies related to neurogenesis, axon guidance and neuron differentiation, all of which can be supported by current literature from both family-based exome sequencing and genome-wide association (GWAS) studies [49,50]. Combined with results in the current study, these data suggests a potential role of neuropathological processes underlying the development and progression of IS.
The complex genetic architecture underlying IS is further complicated by the lack of a clear tissue target. Despite the major clinical manifestation and therapeutic dilemma of the spinal curvature, IS ultimately affects multiple tissue types, one of which is bone. To understand the potential functional relevance of our methylation results in osseous tissue, we reviewed overlap between our identified CpG sites and those reported by Ebrahimi et al. [91] correlating the same methylation markers in matched blood and trabecular bone samples. We identified 21 regions where the DNA methylation in our dataset was correlated with methylation in bone tissue, and therefore, could potentially be considered biologically relevant ( Table 2). Annotation of two of the regions implicated the NPY gene on chr. 7 and the WNT10A gene on chr. 2. The NPY gene encodes a neuropeptide expressed throughout the central and peripheral nervous systems [108] and is an essential regulator of bone homeostasis and metabolism [109]. NPY is also a local regulator of osteoblastic lineage and is responsive to mechanical stimuli with potential roles in fracture healing and osteoarthritis [109]. The WNT10A gene is a member of the WNT gene class and functions within the WNT10A/β-catenin signaling pathway in regulation of adult epithelial proliferation [110,111], mesenchymal stem cell regulation by stimulating osteoblastogenesis [112], coordination of vertebrate segmentation, and motile cilia function [113].
The role of DNA methylation in IS has not been well studied in current literature outside of two targeted studies and extremely small genome-wide discovery analyses. Mao et al. [73] reported IS cases were associated with increased methylation near the promotor region for the COMP gene on chr. 19 and more importantly, decreased COMP expression. Shi et al. [74] identified significantly higher levels of methylation in IS cases versus controls in a region near the pituitary homeobox-1 (PITX1) gene, a homeobox transcriptional regulator that plays a role in maintenance of side-to-side musculoskeletal symmetry during development [114]. In our study, methylation levels in the promoter regions for the PITX1 and/or the COMP gene were not differentially methylated in the discordant and/or curve severity analysis. However, pituitary gland development (GO:0021983) and the anterior/posterior pattern specification (GO:0009952) ontologies represented the top 2 most enriched terms in the curve severity DMR analysis (see Appendix A, Table A5).
Meng et al. [78] and Liu et al. [77] conducted the only other IS EWAS studies in the current literature. They used a similar strategy, testing for methylation differences in peripheral blood samples in a discovery cohort (1 and 2 MZ twin pairs, respectively) discordant for curve progression. A second cohort consisting of individuals with IS versus controls was used to confirm methylation sites or regions identified in the discovery cohort. Meng et al. [78] identified a single probe cg01374129 (near the HSA2 gene) that was significantly hypomethylated in the progressive group compared to the non-progressive group. Liu et al. [77] identified a DMR near the promoter region for the NDN gene that was significantly associated with IS. These studies were limited in that the discovery EWAS was performed in a very limited number of individuals. Our study builds on these initial findings with added methylation data from MZ twins both concordant and discordant in their spinal curves. Our complementary analyses provide a list of candidate sites and regions across the genome that may assist in the development of prognostic tools capable of identifying individuals at risk for curve progression. Methylation is tissue specific, we were also able to prioritize hits identified in our analysis based on known correlation with methylation in bone tissue. Additional validation in larger cohorts is needed to confirm these methylation markers as relevant and explore their utility in the clinical setting.

Limitations
Our study includes several limitations. First, the samples were obtained after disease onset. We cannot exclude the possibility that differences in methylation were caused by changes in curve severity. Although samples within each twin pair were obtained no more than six months apart and thus age at sample acquisition was balanced across the twin pairs, there was substantial heterogeneity in age at sample acquisition across the twin pairs. This is potentially problematic if age modifies the effect of methylation on curve progression. Similarly, we used a case-control design. Cases and controls were defined based on curve pattern information available at the time of sample acquisition. Misclassification of controls is possible if spinal progression occurred over the lifetime of the individual.

Conclusions
A better understanding of the genetic, epigenetic, and environmental factors underlying IS onset and/or curve progression has significant clinical implications [115,116]. DNA methylation markers may provide value as a prognostic tool for predicting both the initiation and progression of this disorder and furthermore, may also aid in the identification of homogenous subgroups of individuals allowing for more personalized treatment algorithms. In the current study, we identified methylation at specific sites across the genome. Differentially methylation region (DMR) promoter enrichment analyses identified several biologically relevant ontologies related to pituitary gland development, body segmentation and neuronal differentiation. We prioritized the DMR candidates based on known correlation between methylation in blood versus bone. Priority candidates include DMRs in promoter regions related to the WNT signaling pathway (WNT10A), a signaling pathway that is relevant to bone formation and remodeling [117], and neuropeptide Y (NPY), a regulator of bone and energy homeostasis [109]. This information allows for further targeted studies aimed at understanding the functional relevance of these findings in relation to IS and axial spinal development, alignment, and side-to-side symmetry.