The Genome-Wide DNA Methylation Profile of Peripheral Blood Is Not Systematically Changed by Short-Time Storage at Room Temperature

Background: Epigenetic epidemiology has proven an important research discipline in the delineation of diseases of complex etiology. The approach, in such studies, is often to use bio-banked clinical material, however, many such samples were collected for purposes other than epigenetic studies and, thus, potentially not processed and stored appropriately. The Danish National Birth Cohort (DNBC) includes more than 100,000 peripheral and umbilical cord blood samples shipped from maternity wards by ordinary mail in EDTA tubes. While this and other similar cohorts hold great promises for DNA methylation studies the potential systematic changes prompted by storage at ambient temperatures have never been assessed on a genome-wide level. Methods and Results: In this study, matched EDTA whole blood samples were stored up to three days at room temperature prior to DNA extraction and methylated DNA immunoprecipitation coupled with deep sequencing (MeDIP-seq). We established that the quality of the MeDIP-seq libraries was high and comparable across samples; and that the methylation profiles did not change systematically during the short-time storage at room temperature. Conclusion: The global DNA methylation profile is stable in whole blood samples stored for up to three days at room temperature in EDTA tubes making genome-wide methylation studies on such material feasible.


Introduction
DNA methylation plays a pivotal role in gene regulation and genome stability and has been implicated in numerous conditions when altered at critical genomic loci [1][2][3].DNA methylation predominately occurs at cytosine-guanine dinucleotides (CpG) and profiling hereof in clinical peripheral blood (PB) samples has helped uncover disease-associated changes also for conditions manifested in tissue other than PB, i.e., cancer, heart disease, and mental disorders [1,[4][5][6].However, careful analysis is required when processing DNA methylation data, as the methylation status at CpGs is sensitive to environmental exposures such as pharmaceuticals and smoking, but also changes as a function of age [7][8][9][10].Often larger DNA methylation studies are performed retrospectively on bio-banked material where the initial purpose of collection has been other than epigenetic studies.In such cases, samples may not have been processed immediately after collection.Nevertheless, others and we have previously shown that prolonged storage of dried blood spots at room temperature had little influence on the methylation profile [11][12][13][14][15][16].
The Danish National Birth Cohort (DNBC) comprises, amongst others, hundred thousand maternal PB samples collected at maternity wards around the country and shipped to the State Serum Institute by ordinary mail.Hence, the PB has been at ambient temperatures for up to three days before processing.Inadequate handling and storage of blood samples can have detrimental effects on quality and quantity of extracted DNA [17,18].Also, the integrity of RNA and protein can be affected by storage temperature [19][20][21].Given that the dynamic nature of epigenetic regulation allows for an acute response to extracellular stimuli such as exercise, stress and nutrition factors [22][23][24], it is plausible that specific epigenetic changes can occur in blood cells ex vivo.
In line with this, we here have interrogated the DNA methylome at a genome-wide scale in PB either processed immediately after retrieval or following storage at room temperature (RT) for up to three days.

High Quality Sequencing Data Generated from Time Points up to Three Days
Blood samples collected from five volunteers were equally divided into four tubes and stored at room temperature (RT) for zero, one, two or three days (Figure 1).
Epigenomes 2017, 1, 23 2 of 11 such cases, samples may not have been processed immediately after collection.Nevertheless, others and we have previously shown that prolonged storage of dried blood spots at room temperature had little influence on the methylation profile [11][12][13][14][15][16].
The Danish National Birth Cohort (DNBC) comprises, amongst others, hundred thousand maternal PB samples collected at maternity wards around the country and shipped to the State Serum Institute by ordinary mail.Hence, the PB has been at ambient temperatures for up to three days before processing.Inadequate handling and storage of blood samples can have detrimental effects on quality and quantity of extracted DNA [17,18].Also, the integrity of RNA and protein can be affected by storage temperature [19][20][21].Given that the dynamic nature of epigenetic regulation allows for an acute response to extracellular stimuli such as exercise, stress and nutrition factors [22][23][24], it is plausible that specific epigenetic changes can occur in blood cells ex vivo.
In line with this, we here have interrogated the DNA methylome at a genome-wide scale in PB either processed immediately after retrieval or following storage at room temperature (RT) for up to three days.

High Quality Sequencing Data Generated from Time Points up to Three Days
Blood samples collected from five volunteers were equally divided into four tubes and stored at room temperature (RT) for zero, one, two or three days (Figure 1).Assessment of the extracted DNA showed that yield and quality were comparable at all time points (Figure S1, Table S1).Based on the degree of methylated DNA enrichment by methylated DNA immunoprecipitation (MeDIP), individuals 1, 3, 4 and 5 were selected for sequencing.Sample individual 5 day 3 (Ind5_T3), however, failed the library post-amplification QC and was removed from the sample-set.All sequenced samples generated high quality reads with most Phred scores >30 along the 100 bp read.Clean reads were aligned to the human genome (hg38) using the Burrows-Wheeler aligner (BWA) algorithm and filtered using quality-associated attributes.The filtering resulted in a final dataset of 40-60 million reads per library.Pre-analytical checks included a search for copy number variations (CNVs) (Figure S2), as a potential CNV could lead to false positive read density differences.Taking non-CpG containing reads into account for this analysis no CNVs larger than 1 Mb were detected in any of the samples.Compared to the input all samples displayed the expected enrichment of base coverage in library size normalized 180 bp windows (Figure S3).This enrichment is expected since CpG densities are non-uniformly distributed along the genome, producing coverage peaks in the enriched library.Calculation of CpG enrichment-the ratio between observed/expected CpG content in the MeDIP library and the reference genome-supported the notion of CpG enriched libraries (scores above 1) (Table S2).Furthermore, the global distribution of reads visualized as the sum of reads in the lowest covered percentile of the genome to the highest followed similar trajectories in all samples indicating that the samples could be compared without normalization (Figure S4).Taken together, the analyses indicate that no major technical differences exist between the 15 obtained MeDIP-seq datasets.
A pair-wise correlation using all mapped reads in 500 bp windows displayed Spearman's correlation coefficients between 0.78 and 0.94 among the enriched libraries (Figure 2).Moreover, a Assessment of the extracted DNA showed that yield and quality were comparable at all time points (Figure S1, Table S1).Based on the degree of methylated DNA enrichment by methylated DNA immunoprecipitation (MeDIP), individuals 1, 3, 4 and 5 were selected for sequencing.Sample individual 5 day 3 (Ind5_T3), however, failed the library post-amplification QC and was removed from the sample-set.All sequenced samples generated high quality reads with most Phred scores >30 along the 100 bp read.Clean reads were aligned to the human genome (hg38) using the Burrows-Wheeler aligner (BWA) algorithm and filtered using quality-associated attributes.The filtering resulted in a final dataset of 40-60 million reads per library.Pre-analytical checks included a search for copy number variations (CNVs) (Figure S2), as a potential CNV could lead to false positive read density differences.Taking non-CpG containing reads into account for this analysis no CNVs larger than 1 Mb were detected in any of the samples.Compared to the input all samples displayed the expected enrichment of base coverage in library size normalized 180 bp windows (Figure S3).This enrichment is expected since CpG densities are non-uniformly distributed along the genome, producing coverage peaks in the enriched library.Calculation of CpG enrichment-the ratio between observed/expected CpG content in the MeDIP library and the reference genome-supported the notion of CpG enriched libraries (scores above 1) (Table S2).Furthermore, the global distribution of reads visualized as the sum of reads in the lowest covered percentile of the genome to the highest followed similar trajectories in all samples indicating that the samples could be compared without normalization (Figure S4).Taken together, the analyses indicate that no major technical differences exist between the 15 obtained MeDIP-seq datasets.
A pair-wise correlation using all mapped reads in 500 bp windows displayed Spearman's correlation coefficients between 0.78 and 0.94 among the enriched libraries (Figure 2).Moreover, a clear distinction to the input was observed.On a global level, inter-individual correlation within blood samples is anticipated to be high (>0.8;Pearson's correlation coefficients) even without correction for variables such as sex, age and cell composition [11,[25][26][27][28][29].Yet, if no variable affects the methylation profile, the matched samples from each individual are expected to cluster.An unsupervised hierarchical clustering did not suggest a general origin or storage dependent subdivision although the three samples from individual 5 clustered together (Figure 2).clear distinction to the input was observed.On a global level, inter-individual correlation within blood samples is anticipated to be high (>0.8;Pearson's correlation coefficients) even without correction for variables such as sex, age and cell composition [11,[25][26][27][28][29].Yet, if no variable affects the methylation profile, the matched samples from each individual are expected to cluster.An unsupervised hierarchical clustering did not suggest a general origin or storage dependent subdivision although the three samples from individual 5 clustered together (Figure 2).In line with this, a principle component analysis showed that the first and second component explained only 14% and 8% of the common variation, respectively (Figure 3A).This seems to fit with the notion that variance explained by individual is minor to the variance explained by tissue [30].In line with this, a principle component analysis showed that the first and second component explained only 14% and 8% of the common variation, respectively (Figure 3A).This seems to fit with the notion that variance explained by individual is minor to the variance explained by tissue [30].
Epigenomes 2017, 1, 23 3 of 11 clear distinction to the input was observed.On a global level, inter-individual correlation within blood samples is anticipated to be high (>0.8;Pearson's correlation coefficients) even without correction for variables such as sex, age and cell composition [11,[25][26][27][28][29]]. Yet, if no variable affects the methylation profile, the matched samples from each individual are expected to cluster.An unsupervised hierarchical clustering did not suggest a general origin or storage dependent subdivision although the three samples from individual 5 clustered together (Figure 2).In line with this, a principle component analysis showed that the first and second component explained only 14% and 8% of the common variation, respectively (Figure 3A).This seems to fit with the notion that variance explained by individual is minor to the variance explained by tissue [30].Also, here no overall stratification based on origin or storage time was apparent, whereas the second component could capture sex specific variation.Notably, the higher similarity between T3 and T0, than between T3 and T1 is arguing against a progressive change in methylation over time.A different approach to sliding windows of fixed size is to allow for data defined generation of contigs of variable size based on read enrichment.Based on all samples the contigs defined stratified the samples by individual as displayed by the second and third principal component (PC) (Figure 3B).The first PC explained 39% of the variance and stratified the samples based on sex.

No Storage-Dependent Change in Canonical Methylation Patterns
The read count calculated in 500 bp genomic windows showed that all samples performed well, with a slightly larger spread in Ind1_T1 and Ind1_T2 (Figure S5).One conserved methylation pattern is a general absence of methylation in CpG islands (CGIs).Visualizing the mean read count in 500 bp genomic windows overlapping intergenic or intragenic CGIs (±2 kb) revealed the canonical pattern of an increase in methylation in the shore region proximate to the CGI and the steep decrease in methylation across the CGI (Figure S6A,B) as expected.Displaying the variance at CGI regions in a Box-Whisker plot did not indicate large methylation changes attributed to storage (Figure S7).Likewise, repetitive elements like the Alu and L1 family of Short and Long Interspersed Nuclear Elements (SINEs and LINEs), respectively, are heavily methylated in order to prevent transcription and mobility [31].Relative read count in 500 bp genomic windows across chromosome 1 Alu elements revealed the expected pattern of hypermethylation compared to flanking regions (Figure S8A).Also, pyrosequencing of five CpGs of the AluYB8 subtype did not suggest a time dependent loss of methylation (Figure S8B-F).The coefficient of variance (COV) based on all technical triplicates from all individuals at each time point did not differ notably with time also arguing against a storage-time depending increase in variation.We also analysed the global methylation level across L1 repeats interrogating three CpGs (Figure S9).As for AluYB8, the COV for L1 varied only marginally.The difference in mean methylation comparing T0 and T3 was a decrease of 1.6%, 6.5%, and 2.2% at CpG1, CpG2, and CpG3, respectively, of which none were significant (t-test, p > 0.15).

Stochastic Variation between Time-Points and No Differently Methylated Regions between Groups
With the intent to evaluate if loci-specific changes in methylation patterns are associated with storage at ambient temperature, we looked for differently methylated regions (DMRs) between the grouped individuals as well as between pairwise matched individuals at all time-points (Figure S10).Even with a liberal false discovery rate (FDR)-corrected (Benjamini-Hochberg) threshold of <0.2 no DMRs were present comparing groups.In the pairwise comparison the number of DMRs using the same statistical approach but with a FDR-corrected threshold of <0.05 ranged from 32 to 10118 (Table S3).Plotting the pairwise DMRs in a 2D map, where proximity is based on the proportion of co-occurrence, indicated very low overlap and no apparent clustering based on the individual or storage-time (Figure 4).In order to look for trends across the pairwise DMRs, normalized read counts of each DMR was displayed in a line graph as a function of time (Figure S11).The actual read count difference in DMRs overlapping inter-individually (bottom panel) or temporally (right panel) was small (left y-axis) compared to the mean (right y-axis).Furthermore, many DMRs had non-linear trajectories indicating that these are likely artifacts or stochastic effects.Calculating the CpG density of the MeDIP libraries and DMRs, showed that the DMRs were enriched in CpG rich regions (Figure 5).
Annotating the collective intra-individual DMRs (Ind_3, Ind_4, Ind_5) to genomic features and comparing it to a random set of 500 bp segments of the MeDIP libraries, showed that there was an enrichment in satellite repeats (p < 0.01, Fishers exact test), which was driven by a CpG rich (CpG density of 0.08) 15 kb region on chromosome 16 (16q11.2)(Figure S12).No enrichment was found in promoter regions or gene bodies.
As a stress response may be evoked in ex vivo blood cells, we looked specifically for DNA methylation changes in a stress and toxicity related gene set (Figure S13).With a threshold of FDR <0.2 a single DMR was found at T1, one at T2, and six at T3.The actual difference in read count, however, was subtle and the trajectories across time inconsistent.In order to look for trends across the pairwise DMRs, normalized read counts of each DMR was displayed in a line graph as a function of time (Figure S11).The actual read count difference in DMRs overlapping inter-individually (bottom panel) or temporally (right panel) was small (left y-axis) compared to the mean (right y-axis).Furthermore, many DMRs had non-linear trajectories indicating that these are likely artifacts or stochastic effects.Calculating the CpG density of the MeDIP libraries and DMRs, showed that the DMRs were enriched in CpG rich regions (Figure 5).
Annotating the collective intra-individual DMRs (Ind_3, Ind_4, Ind_5) to genomic features and comparing it to a random set of 500 bp segments of the MeDIP libraries, showed that there was an enrichment in satellite repeats (p < 0.01, Fishers exact test), which was driven by a CpG rich (CpG density of 0.08) 15 kb region on chromosome 16 (16q11.2)(Figure S12).No enrichment was found in promoter regions or gene bodies.
As a stress response may be evoked in ex vivo blood cells, we looked specifically for DNA methylation changes in a stress and toxicity related gene set (Figure S13).With a threshold of FDR <0.2 a single DMR was found at T1, one at T2, and six at T3.The actual difference in read count, however, was subtle and the trajectories across time inconsistent.

Discussion
With an increasing epidemiological interest in the link between environmental exposures, epigenetic changes and health outcomes, the number of studies investigating DNA methylation in clinical samples is increasing.Immediate extraction of DNA from clinical samples is, however, not always feasible due to a lack of human or technical resources at the site of collection.Moreover, sample degradation and ex vivo mediated effects due to improper storage may influence subsequent

Discussion
With an increasing epidemiological interest in the link between environmental exposures, epigenetic changes and health outcomes, the number of studies investigating DNA methylation in clinical samples is increasing.Immediate extraction of DNA from clinical samples is, however, not always feasible due to a lack of human or technical resources at the site of collection.Moreover, sample degradation and ex vivo mediated effects due to improper storage may influence subsequent quantitative molecular analyses [32].Previous studies examining whole blood stored for twelve months at RT in a DNA-protecting solution [17] or placenta tissue stored for two days at RT [33] identified no major DNA methylation changes.Moreover, the feasibility of genome-wide DNA methylation studies using DNA from post-mortem brains [34], as well as archived dried blood spots has also been described before [15,35,36].To our knowledge this is the first report of a genome-wide interrogation of the DNA methylome in matched whole blood samples stored at RT.
All analyzed methylation-enriched samples displayed the expected read density patterns at, e.g., CGIs, L1, and Alu elements.Inter-individual variation in blood methylation was observed to be relatively small, and a group-wise comparison did not return epigenome-wide significant DMRs even at a very relaxed FDR threshold.These results seem to fit and confirm previous findings on room temperature stored EDTA-stabilized blood [17].Furthermore, analysis of samples where reads were clumped into data generated contigs, showed that the largest variation was explained by sex followed by individual of origin.This finding also argues against systematic introduction of methylation changes as a consequence of short-term storage at room temperature.
Methylation levels of the repeat elements SINE and LINE are often used as a surrogate measure for whole-genome methylation levels.We addressed AluYB8 and L1 methylation by both MeDIP and pyrosequencing.Both analyses identified no significant systematic changes in methylation levels over time.Again, this is in accordance with previous studies examining potential DNA methylation changes in placenta tissue and whole blood in response to storage-time [17,21,33].
Intra-individual pairwise comparisons can bypass the confounding effects of genetic variation and blood cell composition.However, there was little overlap between the intra-individually identified DMRs (FDR < 0.05) and the actual difference in read count was limited, being in the range of ±0.4 compared to a mean above 10 on a log 2 scale.However, in general the DMRs were enriched for a high CpG density and in particular for a juxtacentromeric 15 kb satellite repeat containing region on chromosome 16.A previous inter-individual comparison based on MeDIP-seq in monocytes found differences to be highly enriched in repetitive elements and especially in satellite repeat sequences [37], suggesting that these sites are more prone to epigenetic drift.However, our data does not indicate that any broad-scale changes occur at repetitive elements within three days storage at room temperature.
Acute cellular stress has been shown to alter DNA methylation patterns at genes involved in stress response [38].Gene-specific examinations of promoter regions and gene-bodies of 22 genes involved in stress and toxicity pathways including DNA damage, hypoxia signaling and growth arrest, did not reveal any change in the DNA methylation pattern over time.Furthermore, the changes in read count at T 3 are small compared to the mean (<0.2 vs. 10.5 log 2) and likely arise as a consequence of lower sample size and variance.
In conclusion, we find that storage of whole blood at RT for up to three days has no influence on MeDIP-seq quality.Moreover, DNA methylation patterns appear intact with, if any, only minor stochastic changes at CpG rich regions.Thus, making DNA methylation epidemiological studies based on bio-banked blood samples not processed immediately after collection valid.

Blood Samples
Peripheral blood was collected and divided into four EthyleneDiamineTetraacetic Acid (EDTA) tubes (BD Vacutainer, BD-Plymouth, UK) from five adult individuals.From one tube per individual DNA was extracted immediately after collection and stored at −20 • C. Sequential DNA extraction was performed and stored at −20 • C on tubes kept one day, two days and three days at room temperature.Genomic DNA was extracted using Maxwell (Promega, Fitchburg, WI, USA).Concentration and quality of the extracted genomic DNA (gDNA) was measured using the Nanodrop 1000 instrument (Thermo Scientific, Waltham, MA, USA).

MeDIP-seq
Extracted gDNA was sonicated in a Pico Bioruptor (Diagenode, Seraing, Belgium) at 10 ng/µL for 13 cycles of '30 s on-30 s off' to a mean fragment size of 180 bp.Fragment length distribution was assessed by microelectrophoresis using the Qiaxcel instrument and a high-resolution gel cartridge (Qiagen, Hilden, Germany).Sonicated gDNA (555 ng) was further used for end-repair and adaptor ligation employing the NEBNext Ultra DNA Library Prep Kit for Illumina (New England Biolabs, Ipswich, MA, USA).The reaction mix was purified using Ampure XP beads (Beckman-Coulter, Brea, CA, USA) after which methylated DNA immunoprecipitation (MeDIP) was set up on the SX-8G-IP-Star Compact robot (Diagenode) according to manufacturer's instructions applying the Auto MeDIP kit (Diagenode) and including unmethylated and methylated spike-in controls.Effectively 460 ng of adaptor-ligated DNA was used in the MeDIP procedure.Antibody incubation was performed at 4 • C for 15 h.Immunoprecipitated samples were magnetically purified on the robot using the Auto iPure v2 kit (Diagenode) according to manufacturer's instructions.Recovery and enrichment was evaluated by qPCR using primer sets specific for the spike-in controls.Minimum criteria were set to 10% recovery and 25-fold enrichment.Based on the recovery rate samples were PCR-amplified at 9-11 cycles using multiplex oligos (New England Biolabs, MA, USA).Samples were size-selected with Ampure XP beads on the SX-8G-IP-Star Compact robot (Diagenode) using a total of 80 µL beads per sample and eluted in 25 µL DNase-free H 2 O. Purity and fragment length distribution was evaluated on the Qiaxcel instrument.Post-amplification enrichment was verified by qPCR using primer pairs targeting the endogenous hypermethylated promoter region of testis specific histone 2B (TSH2B) (Diagenode, cat.nr.C17011041) or the hypomethylated Transcription Start Site (TSS) of glyceraldehyde 3-phosphate dehydrogenase (GAPDH) (Diagenode, cat.nr.C17011047).

Sequencing and Bioinformatics
Samples were PE91 sequenced on two lanes on a HiSeq4000 instrument (Illumina, San Diego, CA, USA).The raw reads were cleaned using Soapnuke 1.5.0 (BGI, Shenzhen, China) applying the following filters: remove reads with adaptors, remove reads with >10% unknown reads, remove reads with >50% low quality bases (Q score < 5).Clean reads only qualify if Q20% > 85%.In the end, this generated around 53-85 million clean reads per sample.
In SeqMonk (v0.32.1 Barbraham Institute) segments were generated by quantitation of read counts in data defined contigs of variable length displaying enrichment or in 500 bp windows (250 bp overlap) covering DMRs.Filters include subdivision based on defined features (e.g., genes and CGIs).

Figure 1 .
Figure 1.Study design.Peripheral blood was collected from five individuals in parallel, after which genomic DNA was either immediately extracted (T0) or following a storage period at room temperature of one (T1), two (T2) or three days(T3).RT; room temperature.

Figure 1 .
Figure 1.Study design.Peripheral blood was collected from five individuals in parallel, after which genomic DNA was either immediately extracted (T0) or following a storage period at room temperature of one (T1), two (T2) or three days(T3).RT; room temperature.

Figure 2 .
Figure 2. Correlation heatmap.Clustering of the analyzed samples based on the pair-wise Spearman´s correlation coefficient.Color gradient; continues correlation scale ranging from no (dark red) to perfect correlation (dark blue).

Figure 2 .
Figure 2. Correlation heatmap.Clustering of the analyzed samples based on the pair-wise Spearman´s correlation coefficient.Color gradient; continues correlation scale ranging from no (dark red) to perfect correlation (dark blue).

Figure 2 .
Figure 2. Correlation heatmap.Clustering of the analyzed samples based on the pair-wise Spearman´s correlation coefficient.Color gradient; continues correlation scale ranging from no (dark red) to perfect correlation (dark blue).

Figure 3 .
Figure 3. Principal component analysis (PCA) plot.(A) PCA plot of first and second principal component based on all reads in sliding windows of fixed size.(B) PCA plot of second and third principal component based on all reads in data defined contigs.

Figure 4 .
Figure 4. Overlap between pair-wise differently methylated regions (DMRs).Unsupervised clustering of pair-wise DMRs based on overlap.Sphere size is proportional to number of DMRs and proximity an indicator of overlap.

Figure 4 .
Figure 4. Overlap between pair-wise differently methylated regions (DMRs).Unsupervised clustering of pair-wise DMRs based on overlap.Sphere size is proportional to number of DMRs and proximity an indicator of overlap.

Figure 5 .
Figure 5. DMRs enriched for high cytosine-guanine dinucleotides (CpG) density and satellite repeats.Box-Whisker plot depicting CpG density in the pair-wise DMRs compared to the methylated DNA immunoprecipitation (MeDIP) libraries.