Using Openly Accessible Resources to Strengthen Causal Inference in Epigenetic Epidemiology of Neurodevelopment and Mental Health

The recent focus on the role of epigenetic mechanisms in mental health has led to several studies examining the association of epigenetic processes with psychiatric conditions and neurodevelopmental traits. Some studies suggest that epigenetic changes might be causal in the development of the psychiatric condition under investigation. However, other scenarios are possible, e.g., statistical confounding or reverse causation, making it particularly challenging to draw conclusions about causality. In the present review, we examine the evidence from human population studies for a possible role of epigenetic mechanisms in neurodevelopment and mental health and discuss methodological approaches to strengthen causal inference, including the need for replication, (quasi-)experimental approaches and Mendelian randomization. We signpost openly accessible resources (e.g., “MR-Base”, “EWAS catalog”, as well as tissue-specific methylation and gene expression databases) to aid the application of these approaches.


Epidemiological Evidence Linking Epigenetics and Mental Health
Mental health and neurodevelopmental disorders are under the influence of both genetic and environmental factors. Epigenetic mechanisms regulate gene expression and are potential mediators of both genetic and environmental effects on mental traits and disorders. Of the known epigenetic processes involved in gene regulation, DNA methylation, which consists of the covalent addition of a methyl group to a cytosine base at CpG dinucleotides, is the most widely studied. The main reason for its popularity is the availability of cost-effective, high-throughput laboratory assays that use DNA extracted with standard protocols. To date, most epigenetic studies of mental health have measured DNA methylation at the genome-wide level using Illumina Infinium 450K or EPIC arrays in peripheral blood or saliva samples, since these tissues are most commonly available in large studies.

Challenges in Assessing Causality
Although there are indications that peripheral DNA methylation could be a plausible mechanism leading to certain brain-related conditions, causality is often difficult to establish in epigenetic epidemiology. Many studies based on epigenome-wide associations are observational and do not allow a direct assessment of whether the observed DNA methylation differences are a cause or a consequence of the disease of interest, or merely reflect confounding.
First, evidence is often based on studies with small sample sizes and without replication. Even if effects are replicated across studies, they might arise from similar confounding structures in the datasets, such as the distribution of tobacco smoking behaviours. Even after adjusting for self-reported smoking, residual confounding could still be present due to reporting bias. For example, an epigenome-wide association study of educational attainment revealed that all sites linked with education had previously been associated with smoking behaviour. Since smoking is often negatively correlated with years of education, this suggests that the observed association between DNA methylation and education is largely due to confounding rather than a causal relationship [21].
Another scenario in which DNA methylation changes are not causal for a disease arises when the disease itself causes changes in DNA methylation, referred to as reverse causation. This can occur in cross-sectional studies, where samples for DNA methylation analysis are obtained at the same time point as a questionnaire assessing the outcome of interest is administered, or where methylation is measured after a disease has been diagnosed. For instance, in a large epigenome-wide association study (EWAS) of major depressive disorder, DNA methylation was measured after diagnosis. Hence, based on the association study alone, it is impossible to disentangle whether epigenetic changes are a cause or a consequence of the disease [3].
Most human epigenetic studies of mental health are based on peripheral samples. Although in some cases methylation changes occur at CpG sites linked to genes with relevant brain functions, it is often challenging to relate changes in peripheral methylation to the development of a condition that affects the central nervous system (CNS). This problem arises mainly because DNA methylation in the brains of living individuals cannot be quantified. Post-mortem samples, which are rare, only allow the assessment of DNA methylation after the disease has manifested [23], as, for instance, in an EWAS of autism spectrum disorder conducted across several brain regions [24]. In this case, epigenetic changes could be confounded by treatment effects, as DNA methylation changes have been reported, for instance, in relation to antipsychotic treatment [25].
The 'gold standard' experimental approach used to seek causal evidence is the randomised controlled trial (RCT). However, this is not a feasible option for DNA methylation research, as it is not yet possible or ethical to undertake an RCT with DNA methylation as the primary controlled exposure. Some studies have taken advantage of RCTs set up with other primary exposures and subsequently measured DNA methylation as a surrogate or intermediate, but these have tended to be serendipitous, relying on RCTs that have collected DNA samples for other purposes (see below for further discussion of this issue).
Animal studies, particularly in the laboratory, have the advantage of allowing for controlled experimental conditions and access to specific tissues other than peripheral blood, thereby avoiding the issue of confounding and the otherwise limited inferences that can be made with respect to tissue specificity. In mouse studies, DNA methylation can, for example, be manipulated by deleting the genes coding for DNA methyltransferases (Dnmt1/Dnmt3a/Dnmt3b), the enzymes that catalyse the transfer of a methyl group to a cytosine nucleotide. A study by Hutnick et al. [26] showed that the deletion of Dnmt1, even when restricted to the forebrain, caused widespread hypomethylation, neuronal degeneration and behavioural impairments in learning and memory. This is in line with other mouse studies, in which Dnmt1 deletion seemed to increase anxiety-like behaviour and deletion of both Dnmt1 and Dnmt3 led to synaptic abnormalities with functional consequences for hippocampal plasticity [27,28]. These studies indicate a causal link between overall DNA methylation and brain-related traits; however, they do not allow the identification of the specific loci within the genome at which changes in DNA methylation might exert their influence. Recently, with the CRISPR-Cas9 system applied in vivo to laboratory mice, it has become possible to demonstrate that DNA methylation at the FMR1 gene causes the molecular and physiological phenotype of fragile-X syndrome [29]. While fragile-X syndrome has a specific and detectable molecular phenotype (lack of FMR1 protein), a limitation of most animal studies is that many human psychiatric diseases are defined by behavioural traits that can only be partially observed in other species.
Genes 2019, 10, 193
Most animal models are based on the resemblance of the behavioural symptoms and therefore mostly correspond to a sub-set of symptoms and traits of the modelled human psychiatric diseases, rather than the full disease. Similarly, the pathological mechanisms leading to the human psychiatric conditions might not necessarily correspond to the changes observed in the animal models that only partially mimic the human condition.

Strength and Robustness of the Associations
Epigenetic associations often replicate in population samples with similar characteristics and confounding structures, so a replicated association could reflect either a real effect or a shared non-causal explanation. To assess the strength and robustness of associations, it is recommended, where feasible, to work collaboratively across multiple studies, as true causal associations ought to be reproduced across studies with different confounding structures. Such collaborations can be achieved within consortia, where several studies with available epigenomic data contribute to addressing the same research questions according to agreed, standardized analysis plans. Selected examples of such consortia that have been used in the field of epigenetic epidemiology are listed in Table 2 below. For these cross-cohort analyses, however, it is essential to standardize pre-processing steps, including normalisation, quality checks and EWAS analysis procedures. Data sharing is often a limiting factor in analyses of this type, and harmonizing data across studies can be resource intensive. Software packages have been developed to facilitate such analyses. For example, the meffil R package, which was created to enable cross-cohort harmonization without data sharing, is available for download at https://github.com/perishky/meffil [33].
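Within consortia, cohort-specific EWAS summary statistics are commonly combined by meta-analysis rather than by pooling individual-level data. As a minimal sketch, assuming each cohort supplies an effect estimate and standard error per CpG from an agreed, harmonised model, a fixed-effect inverse-variance-weighted meta-analysis for one CpG could look as follows. This is not the meffil interface; the function name is hypothetical, Cochran's Q is returned as a simple check of between-cohort heterogeneity, and the input values are invented.

```python
import math


def ivw_fixed_effect(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis for one CpG.

    betas, ses: cohort-specific EWAS estimates and standard errors for the
    same CpG, assumed to come from harmonised analysis models.
    Returns (pooled beta, pooled SE, two-sided p-value, Cochran's Q).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p from the normal distribution
    # Cochran's Q: spread of cohort estimates around the pooled estimate
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return beta, se, p, q


# Hypothetical results for one CpG from three cohorts (illustrative numbers)
pooled = ivw_fixed_effect([0.020, 0.015, 0.025], [0.010, 0.008, 0.012])
print("beta=%.4f SE=%.4f p=%.2e Q=%.2f" % pooled)
```

In an epigenome-wide setting the same function would simply be applied to every CpG shared by the cohorts; a random-effects model is a common alternative when Q indicates heterogeneity.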
Where there is no opportunity for collaboration, or the phenotypes of interest are not available in consortia, it is sometimes possible to access DNA methylation data and their association with the phenotype from openly available online repositories, such as Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/). In the GEO repository, data can be downloaded or analysed online with the interactive GEO2R tool [34].
Replicating associations across different datasets also provides an opportunity to verify that results are not due to technical artefacts. Although replication does not necessarily increase the likelihood that associations are causal, it can be a further step in supporting the veracity of the observed association. For instance, investigating the same CpG site-trait associations on the Illumina 450K and the more recent EPIC array, or using different techniques, including pyrosequencing, bisulphite sequencing and qPCR, strengthens confidence that the associations are real.
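A simple way to summarise such a replication is to compare the effect estimates for CpGs measured in both datasets, for example a discovery EWAS on the 450K array and a replication on the EPIC array. The sketch below is illustrative only: the input format (CpG ID mapped to an effect estimate and p-value) and the nominal threshold alpha = 0.05 are assumptions, and it presumes that both analyses used the same outcome coding and methylation scale so that effect signs are comparable.

```python
import math


def replication_summary(discovery, replication, alpha=0.05):
    """Compare CpG-trait associations between two datasets.

    discovery, replication: dicts mapping CpG ID -> (beta, p-value).
    Only CpGs present in both datasets are compared.
    """
    shared = sorted(set(discovery) & set(replication))
    if len(shared) < 2:
        raise ValueError("need at least two shared CpGs")
    b1 = [discovery[c][0] for c in shared]
    b2 = [replication[c][0] for c in shared]
    # Same direction of effect in both datasets?
    same_sign = [x * y > 0 for x, y in zip(b1, b2)]
    # Same direction AND nominally significant in the replication data
    replicated = [s and replication[c][1] < alpha for s, c in zip(same_sign, shared)]
    # Pearson correlation of effect estimates across shared CpGs
    m1, m2 = sum(b1) / len(b1), sum(b2) / len(b2)
    cov = sum((x - m1) * (y - m2) for x, y in zip(b1, b2))
    var1 = sum((x - m1) ** 2 for x in b1)
    var2 = sum((y - m2) ** 2 for y in b2)
    return {
        "n_shared": len(shared),
        "sign_concordance": sum(same_sign) / len(shared),
        "replicated_fraction": sum(replicated) / len(shared),
        "effect_correlation": cov / math.sqrt(var1 * var2),
    }
```

Sign concordance is informative even when the replication sample is too small for many CpGs to reach significance, which is why it is reported alongside the fraction of nominally significant replications.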

Experimental and Quasi-Experimental Approaches
The conventional epidemiological design for investigating causality, the RCT, requires participants to be randomly assigned to groups that are similar except for the exposure of interest (here, DNA methylation). Although it is theoretically possible to conduct an RCT of a demethylating agent and assess its impact on a mental health outcome, targeted manipulation of specific methylation sites is currently not achievable with the available tools.
RCTs are, however, more tractable where methylation is considered as a secondary outcome to investigate the effects of an intervention. For example, RCT designs were exploited to assess the effects of pollution [35] and folate intake [36] on DNA methylation. Linking changes in methylation, which have been identified to be a causal consequence of environmental exposures, to psychiatric disorders could be an interesting and worthwhile extension of such findings.
Natural experiments, in which populations are exposed to an unplanned disaster or event, can provide valuable data on changes in DNA methylation that may be causal for psychiatric conditions. For example, prenatal exposure to the Dutch famine has been linked both to changes in DNA methylation [37] and to changes in mental health in adulthood [38], suggesting that DNA methylation could be a potential mediating mechanism. Similarly, prenatal maternal stress due to a severe ice storm in Quebec in 1998 affected both DNA methylation [39] and autism-related traits [40].

Mendelian Randomization
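One widely adopted approach to strengthen causal inference is Mendelian randomization (MR), a form of instrumental variable analysis. In MR, the instrument comprises one or more genetic variants that are robustly associated with the exposure of interest. Because alleles are allocated at random at conception, individuals who inherit an exposure-increasing allele are, in effect, randomly assigned to a higher-than-average dosage of the exposure. Comparisons between genotype groups are therefore, like the arms of an RCT, less prone to confounding by lifestyle and environmental factors and to reverse causation.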
Mendelian randomization relies on the availability of genetic variants to use as instrumental variables (for a discussion on additional assumptions, see [41,42]). Where genetic variants can be identified that correlate strongly with DNA methylation levels, MR can be applied to study causal effects of DNA methylation on mental health. Depending on the research question, the sample characteristics and data availability, different MR methodologies can be applied, such as one-sample, two-sample, bidirectional, multivariable and two-step MR, the details of which can be found elsewhere [43,44]. Due to limitations in data availability and the computational resources required, MR has predominantly been performed to date on selected methylation loci (e.g., top hits of a robust EWAS), with a few notable exceptions [45,46]. However, with the advent of more detailed data on genetic variants that tag methylation variation, the approach promises to be more widely adopted.
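A small simulation can illustrate the logic of MR with a single genetic instrument. In this hypothetical sketch, an unmeasured confounder u (standing in for, e.g., smoking) raises both methylation and the outcome, whereas the SNP affects the outcome only through methylation, so the core MR assumptions hold by construction. All parameter values (allele frequency 0.3, a true causal effect of 0.2, sample size) are arbitrary. The observational regression slope is biased by u, while the instrumental variable estimate cov(G, Y)/cov(G, M), which is what one-sample 2SLS reduces to with one instrument, recovers the true effect up to sampling error.

```python
import random


def cov(x, y):
    """Sample covariance of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def simulate(n=20000, effect=0.2, seed=1):
    """Simulate genotype g, methylation m and outcome y with a confounder u."""
    rng = random.Random(seed)
    g, m, y = [], [], []
    for _ in range(n):
        snp = sum(rng.random() < 0.3 for _ in range(2))  # allele dosage 0, 1 or 2
        u = rng.gauss(0, 1)  # unmeasured confounder
        meth = 0.5 * snp + u + rng.gauss(0, 1)  # SNP and u both affect methylation
        out = effect * meth + u + rng.gauss(0, 1)  # SNP affects y only via methylation
        g.append(snp)
        m.append(meth)
        y.append(out)
    return g, m, y


g, m, y = simulate()
ols = cov(m, y) / cov(m, m)  # observational slope, inflated by u
iv = cov(g, y) / cov(g, m)   # IV (Wald) estimate, equal to 2SLS with one instrument
print(f"observational: {ols:.2f}  IV: {iv:.2f}  (true effect 0.2)")
```

The IV estimate is less precise than the observational slope; its usefulness comes from removing the bias, which is why strong instruments and large samples matter.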

Instruments for Epigenetic Mendelian Randomization Analysis
Potential instruments for DNA methylation are single nucleotide polymorphisms (SNPs) that are strongly associated with methylation at the CpG sites of interest, often referred to as methylation quantitative trait loci (mQTLs). These can be found in online databases that have performed genome-wide association studies (GWAS) of DNA methylation (Table 3). The overwhelming majority of catalogued mQTLs have been derived from populations of European ancestry and are based on peripheral blood DNA, raising the question of whether the same SNP-DNA methylation relationships hold in other ethnicities or tissues. Emerging evidence suggests that this might be the case in some instances [47]. However, as DNA methylation is often tissue-specific, brain-tissue-specific databases (Table 3) can be used to identify mQTLs when the hypothesis implies a biological mechanism that acts via changes in brain DNA methylation.
Alternatively, blood-derived mQTLs can be used in MR when an EWAS of a brain-related trait has been conducted in blood and it is plausible that changes in methylation in blood cells are reflected in changes in brain activity, for instance, via circulating hormones that cross the blood-brain barrier (see Section 3.4.1 for a more detailed discussion). Some of the resources listed in Table 3 are based on data from specific developmental periods (e.g., foetal samples, cord blood); however, our ability to use these resources in a developmentally sensitive manner is still restricted, and heterogeneity in ethnicity and cell-type composition between the target and reference datasets limits the conclusions that can be drawn from these analyses.
Most mQTLs are cis-associations, i.e., they are located proximal to the CpG of interest. Cis-SNPs have large effects on the CpGs in their proximity, whereas trans-SNPs have smaller effects and tend to act polygenically on several target loci. For these reasons, cis-SNPs, rather than trans, are preferred as instruments for use in MR.
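The selection logic above can be sketched as a filter over mQTL summary statistics for a single CpG. This is a hypothetical helper, not an interface of any database in Table 3: the input field names are assumptions, the 1 Mb cis window and the F-statistic threshold of 10 are commonly used conventions rather than fixed rules, and the single-SNP F-statistic is approximated by (beta/se)^2. The selected SNPs would still need LD clumping so that only independent instruments are retained.

```python
def select_cis_instruments(cpg_chrom, cpg_pos, mqtls, window=1_000_000, min_f=10.0):
    """Keep cis-mQTLs for one CpG that pass a strength threshold.

    mqtls: iterable of dicts with keys 'snp', 'chrom', 'pos', 'beta', 'se'
    holding genotype-methylation summary statistics for this CpG.
    window: maximum SNP-CpG distance in bp treated as cis (assumed 1 Mb).
    min_f: minimum approximate single-SNP F-statistic, (beta / se) ** 2.
    Returns qualifying SNPs sorted by decreasing F. LD clumping is NOT done.
    """
    selected = []
    for q in mqtls:
        is_cis = q["chrom"] == cpg_chrom and abs(q["pos"] - cpg_pos) <= window
        f_stat = (q["beta"] / q["se"]) ** 2
        if is_cis and f_stat >= min_f:
            selected.append({**q, "F": f_stat})
    return sorted(selected, key=lambda q: q["F"], reverse=True)
```

Sorting by F makes it straightforward to keep the strongest SNP when only one instrument per CpG is wanted.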

Methodologies in Epigenetic Mendelian Randomization Analyses
If mQTLs are available for the CpGs of interest, they can be used as instruments for MR. In studies where genotypes, DNA methylation data and the outcome (e.g., a mental health trait) are all available, it is possible to perform one-sample MR using two-stage least squares (2SLS) regression (Figure 1, top panel). This is easily implemented with the ivreg2 command in Stata or the function tsls in the gmm R package (https://cran.r-project.org/web/packages/gmm/index.html) [51]. When these data are not available, a two-sample MR approach can be used (Figure 1, bottom panel). This relies on extracting the genotype-methylation (G-M) summary statistics (beta regression coefficients and standard errors) from one study and the genotype-outcome (G-O) statistics from another, independent study. For one SNP, the causal estimate is the ratio of the G-O beta coefficient to the G-M beta coefficient (the Wald ratio). The standard error of the causal estimate is estimated via the delta method as described in Thomas et al. [52]. When at least three genetic variants are available, the per-SNP ratio (G-O/G-M) estimates are meta-analysed using standard methods, such as the inverse variance weighted (IVW) approach with fixed- or random-effects models.
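The two-sample calculations described above can be sketched in a few lines. This is a minimal illustration, not the implementation used by MR-Base, TwoSampleMR or MendelianRandomization: it uses the second-order delta-method standard error for each Wald ratio (assuming zero covariance between the G-M and G-O estimates, as expected for independent samples) and a fixed-effect IVW combination. Packages may default to first-order weights (se_GO/|b_GM|) or to random-effects models, so their results can differ slightly. The summary statistics in the usage lines are invented.

```python
import math


def wald_ratio(b_go, se_go, b_gm, se_gm):
    """Per-SNP causal estimate of methylation on the outcome.

    b_go, se_go: genotype-outcome (G-O) beta and standard error.
    b_gm, se_gm: genotype-methylation (G-M) beta and SE from an independent sample.
    SE: second-order delta method, assuming zero covariance between samples.
    """
    ratio = b_go / b_gm
    se = math.sqrt(se_go ** 2 / b_gm ** 2 + b_go ** 2 * se_gm ** 2 / b_gm ** 4)
    return ratio, se


def ivw_fixed(ratios, ses):
    """Fixed-effect inverse-variance-weighted combination of per-SNP ratios."""
    weights = [1.0 / s ** 2 for s in ses]
    estimate = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    return estimate, math.sqrt(1.0 / sum(weights))


# Hypothetical (b_go, se_go, b_gm, se_gm) for three independent mQTL SNPs
snps = [(0.040, 0.010, 0.30, 0.02), (0.025, 0.008, 0.20, 0.02), (0.060, 0.015, 0.45, 0.03)]
per_snp = [wald_ratio(*s) for s in snps]
estimate, se = ivw_fixed([r for r, _ in per_snp], [s for _, s in per_snp])
print(f"IVW estimate {estimate:.3f} (SE {se:.3f})")
```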
Two-sample MR can be easily performed using the MR-Base online tool (http://www.mrbase.org/) and the TwoSampleMR R package, available from GitHub (https://github.com/MRCIEU/TwoSampleMR) [53]. Similarly, the MendelianRandomization R package performs two-sample MR using existing summary data on genetic associations with the exposure and the outcome [54]. When several SNPs are available, it is useful to also fit the MR-Egger model, whose intercept provides a test for directional horizontal pleiotropy and whose slope provides a pleiotropy-adjusted causal estimate [55]. However, this method has lower statistical power and is recommended primarily as a sensitivity analysis. GWAS summary statistics for the G-O associations can be found in several online databases (Table 4). Following this strategy, two-sample MR has recently been applied to test for a causal effect of methylation in the DRD4 gene on physical aggression; the results did not support a causal link [14].
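The MR-Egger idea can be illustrated through its point estimates alone. In this sketch, each SNP is oriented so that its G-M association is positive, and the G-O betas are then regressed on the G-M betas by weighted least squares with an intercept and weights 1/se_GO^2. The slope is the pleiotropy-adjusted causal estimate and the intercept reflects average directional pleiotropy. Standard errors, tests and the residual-variance handling used by the TwoSampleMR and MendelianRandomization packages are deliberately omitted, so this is a didactic sketch (with a hypothetical function name) rather than a replacement for those packages.

```python
def mr_egger_point_estimates(b_go, se_go, b_gm):
    """Return (intercept, slope) of an MR-Egger fit by weighted least squares."""
    if len(b_gm) < 3:
        raise ValueError("MR-Egger needs at least three SNPs")
    # Orient each SNP so that its G-M association is positive
    x, y = [], []
    for go, gm in zip(b_go, b_gm):
        sign = 1.0 if gm >= 0 else -1.0
        x.append(sign * gm)
        y.append(sign * go)
    # Weighted least squares of y on x with an intercept, weights 1 / se_go^2
    w = [1.0 / s ** 2 for s in se_go]
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    slope = (sw * sxy - sx * sy) / (sw * sxx - sx ** 2)
    intercept = (sy - slope * sx) / sw
    return intercept, slope
```

Orientation matters because the intercept is only interpretable once all G-M associations point in the same direction; without it, flipping the coded allele of a SNP would change the fitted intercept.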
The direction of the association, if not known a priori, can be queried using bi-directional MR, where both a causal effect of methylation on the trait and a causal effect of the trait on methylation are estimated. Effectively, this procedure involves two MR analyses, requires a set of independent SNPs for each analysis and can be carried out within the one-sample or the two-sample setting.
When the research interest is to estimate the effect of an exposure on an outcome via DNA methylation, it is useful, as a supplement to conventional observational mediation analysis, to adopt a two-step MR strategy involving two MR analyses: one from the exposure to methylation and one from methylation to the outcome of interest. In the two-step MR approach, the SNPs used as instruments for each step need to be independent. Each MR step relies on the usual MR assumptions and is performed using the same general principles and methods. This implies that several independent study samples are needed to obtain the summary statistics for the genotype-exposure (G-E), G-M and G-O associations, which can be identified using the resources listed in Tables 3 and 4. Two-step MR has been applied to test the causal role of prenatal nutrients involved in one-carbon metabolism in schizophrenia via epigenetic changes [60] and to reveal DNA methylation as a mediator between prenatal vitamin B12 exposure and cognitive abilities [61].
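Once both MR steps have been estimated, the indirect effect of the exposure via methylation is commonly taken as the product of the exposure-to-methylation and methylation-to-outcome estimates. The sketch below assumes the two estimates come from independent samples and instruments, and uses a first-order delta-method (Sobel-type) standard error; the optional proportion mediated requires a separate MR estimate of the total exposure-outcome effect. The function name and inputs are hypothetical.

```python
import math


def two_step_mr(beta_em, se_em, beta_mo, se_mo, total=None):
    """Combine the two steps of a two-step MR analysis.

    beta_em, se_em: MR estimate (and SE) of the exposure on DNA methylation.
    beta_mo, se_mo: MR estimate (and SE) of DNA methylation on the outcome.
    total: optional MR estimate of the total exposure-outcome effect.
    """
    indirect = beta_em * beta_mo  # product-of-coefficients indirect effect
    # First-order delta-method SE, assuming the two estimates are independent
    se = math.sqrt(beta_em ** 2 * se_mo ** 2 + beta_mo ** 2 * se_em ** 2)
    result = {"indirect": indirect, "se": se}
    if total is not None and total != 0:
        result["proportion_mediated"] = indirect / total
    return result
```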
Other methods that use genetic variants to strengthen causal inference integrate genome-wide genetic and epigenetic data with the disease of interest, using polygenic risk scores (PRS) for the disease and co-localisation analyses. A PRS is defined as the sum of trait-associated alleles across many genetic loci, weighted by their GWAS effect sizes. As in the MR approach, the epigenetic and phenotypic variation associated with a PRS is less likely to be confounded by lifestyle exposures such as smoking and environmental factors such as pollution, and is less prone to reverse causation. For example, EWAS of schizophrenia in which PRS rather than diagnosis were used in the analysis have identified DNA methylation differences at novel CpGs [62]. Furthermore, a Bayesian co-localisation analysis, which compared the results of a GWAS of methylation at the CpG sites with those of an independent GWAS of schizophrenia, supported the hypothesis that some of the genetic variants within the overlapping sites had a regulatory role in the disease via influencing DNA methylation [63]. PRS for brain-related diseases can be computed using summary statistics from published GWAS (see Table 4 for a list of resources; to derive polygenic scores, see https://choishingwan.github.io/PRSice (version 2.1.9), https://www.cog-genomics.org/plink/1.9/score (version 1.9) and [64]). Bayesian co-localisation analysis can be performed using existing summary data from mQTL databases and the coloc R package (https://cran.r-project.org/web/packages/coloc/) [65].
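The weighted-sum definition of a PRS can be sketched for a single individual. This minimal example uses hypothetical input dictionaries and aligns each SNP to the GWAS effect allele by allele label; it does not handle strand-ambiguous (A/T, C/G) SNPs, LD clumping or missing genotypes, all of which dedicated tools such as PRSice and PLINK address. The optional p-value threshold mimics the common practice of restricting the score to SNPs below a GWAS significance cut-off.

```python
def polygenic_score(dosages, gwas, p_threshold=1.0):
    """Weighted allele-count score for one individual.

    dosages: dict SNP ID -> (counted allele, dosage of that allele, 0 to 2).
    gwas: dict SNP ID -> (effect allele, other allele, beta, p-value).
    Returns (score, number of SNPs used).
    """
    score, n_used = 0.0, 0
    for snp, (effect, other, beta, p) in gwas.items():
        if p > p_threshold or snp not in dosages:
            continue
        allele, dose = dosages[snp]
        if allele == effect:
            score += beta * dose
        elif allele == other:
            score += beta * (2 - dose)  # convert to dosage of the effect allele
        else:
            continue  # allele labels do not match: skip this SNP
        n_used += 1
    return score, n_used
```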

A Word of Caution: Mechanism vs. Biomarker
Obtaining an epigenetic signal that is strong, robust and potentially causal can be exhilarating. However, before drawing conclusions about the 'aetiological mechanism of disease', it is advisable to recall the original aim of the study. Frequently, the aim is to identify causes of disease, which is imperative for interventions to be successful. On the other hand, associations that have not been established as causal (often referred to as biomarkers, see below) can be useful for prediction, although a biomarker itself can be either causal or non-causal. Whether the aim is to identify a causal pathway and/or a biomarker (of risk or of disease) should be set out in the initial stages of the project. Caution is advised with respect to the conclusions about biological mechanisms that can be drawn from the study design and data. The interpretation of results will differ depending on the underlying assumptions about the likelihood of system-wide effects of the exposure (i.e., genetic or environmental causes of disease) and on the relationship between the studied tissue and the primary tissue of pathophysiology. In most cases, methylation profiles will have been obtained from peripheral tissues (blood or saliva), with a small proportion of studies using post-mortem brain tissue.
Under the assumption that the causal (but not necessarily initial, see argument below) tissue of pathophysiology is the brain, at least three scenarios can describe the relationship between peripheral and CNS methylation profiles: a shared common cause, a periphery-mediated pathway or a CNS-mediated pathway to disease (left, middle and right panels in Figure 2). Note that a scenario in which DNA methylation is a direct consequence, rather than a precursor, of disease is an equally plausible possibility, but it is not the focus of the current discussion. A mechanistic interpretation of findings based on peripheral tissue only makes sense assuming that the initial cause of pathophysiology originates in the periphery (Figure 2b,e) or, at the very least, assuming concordance of methylation patterns across tissues (top panel of Figure 2, although see below for additional assumptions).
'Concordance' in this case is defined as the consistency of the effect of the exposure (i.e., the cause of disease) on DNA methylation across tissues. This is different from the 'correlation' of DNA methylation across tissues. For example, relative (but meaningful) perturbations in DNA methylation due to an exposure might be comparable across tissues, while absolute DNA methylation levels themselves are only weakly correlated across tissues (Figure 3a). This presumes that small perturbations can have large effects in some tissues but not in others. Conversely, without knowing precisely what causes cross-tissue correlations in DNA methylation, DNA methylation levels might be correlated across tissues while the effect of an exposure on DNA methylation differs between tissues (Figure 3b). Therefore, while correlation of DNA methylation profiles across tissues is often an important indication, it is neither necessary nor sufficient for cross-tissue concordant effects. All too often, cross-tissue concordance and correlation are implicitly assumed and findings are interpreted as potentially mechanistic. However, there is evidence that cross-tissue correlation seems to be the exception rather than the norm [66].
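The distinction between concordance and correlation can be illustrated with a hypothetical simulation of two tissues. A person-level component shared by both tissues drives the cross-tissue correlation of methylation levels, while the exposure effect is specified separately for each tissue. With arbitrary parameter values, the first scenario mimics Figure 3a (identical exposure effects, near-zero correlation of levels) and the second mimics Figure 3b (highly correlated levels, an effect in one tissue only).

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_two_tissues(effect_a, effect_b, shared_sd, private_sd, n=20000, seed=0):
    """Simulate methylation (arbitrary units) in tissues A and B.

    Half of the individuals are exposed. shared_sd: SD of a person-level
    component common to both tissues; private_sd: tissue-specific noise.
    Returns (estimated effect in A, estimated effect in B, cross-tissue r).
    """
    rng = random.Random(seed)
    a, b, exposed = [], [], []
    for i in range(n):
        e = i % 2  # alternate unexposed (0) and exposed (1)
        person = rng.gauss(0, shared_sd)
        a.append(person + effect_a * e + rng.gauss(0, private_sd))
        b.append(person + effect_b * e + rng.gauss(0, private_sd))
        exposed.append(e)

    def mean_diff(values):
        exp = [v for v, e in zip(values, exposed) if e]
        unexp = [v for v, e in zip(values, exposed) if not e]
        return sum(exp) / len(exp) - sum(unexp) / len(unexp)

    return mean_diff(a), mean_diff(b), pearson(a, b)


# Figure 3a-like: same exposure effect in both tissues, uncorrelated levels
concordant_uncorrelated = simulate_two_tissues(0.5, 0.5, shared_sd=0.0, private_sd=1.0)
# Figure 3b-like: strongly correlated levels, exposure effect in tissue A only
correlated_discordant = simulate_two_tissues(0.5, 0.0, shared_sd=1.0, private_sd=0.3)
```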
Concordance of methylation profiles across tissues is hardly ever investigated, owing to the difficulty (and cost) of measuring the effect of a risk factor on DNA methylation across several tissues in the same individuals. The notable exception is the investigation of tissue-specific mQTLs. For openly available resources to investigate cross-tissue concordance and correlation, see Section 3.4.2 and Section 3.4.3.
Even in the case of cross-tissue concordance, it is easy to overstate risk pathways to disease. In the concordant, common cause scenario (Figure 2a), the tendency is to assume system-wide causal effects, but it might be equally likely that a disease risk factor affects methylation of the same gene in different tissues independently. In all concordant scenarios (Figure 2a-c), concordant gene function across tissues is presumed, although genes can have different functions in different tissues. For example, suppose that an analysis based on whole blood identifies a methylation site with potential relevance for serotonin function. In the periphery, the primary function of serotonin relates to digestion, while in the CNS, serotonin is mainly involved in sleep and mood [67].
In the 'shared common cause' scenario (Figure 2a), we do not need to focus on digestion-related functions, as these are not likely to be involved in the disease pathophysiology. In the 'periphery-mediated' scenario (Figure 2b), however, digestion should be a main pathway of risk, while in the 'CNS-mediated' scenario (Figure 2c), digestion is, if anything, a downstream pathway of disease. Any mechanistic interpretation of findings depends fundamentally on which scenario is most likely.
When concordance is not assumed (Figure 2d-f), the default position is often that, even though the epigenetic variation is not likely to be mechanistically involved, it may act as a biomarker of disease risk. However, the precise 'biomarker' definition referred to is often unclear. According to the National Institutes of Health Biomarkers Definitions Working Group, a biomarker is 'a characteristic that is objectively measured and evaluated as an indicator of normal biologic processes, pathologic processes or biological responses to a therapeutic intervention' [68]. While it is beyond the scope of this review to discuss the role of DNA methylation as a biomarker of risk or disease, this term should not be used lightly. Biomarkers should be easily (in terms of tissue accessibility) and robustly measurable with little measurement error, reproducible across studies (e.g., it is not advisable to claim biomarker potential based on a single study without replication) and have predictive power (or alternative advantages, such as reducing costs). Finally, it should be clear what exactly the established biomarker indexes (risk, disease or treatment). While it is often claimed that methylation-based biomarkers have the potential to inform intervention strategies, studies designed to explicitly demonstrate this are rare [69].
It is impossible to test these scenarios (Figure 2) directly without access to longitudinal, repeated measures of both peripheral and brain tissue in living humans, but their likelihood can be assessed using tissue-specific causal inference methods such as Mendelian randomization (see Section 3.3) and the growing body of online resources described in the following sections.

Biological Characterisation
Characterising the biological relevance of an identified methylation site is often part of an epigenome-wide analysis, regardless of whether a potential disease mechanism has been established. While methylation sites are often viewed primarily in relation to the nearest coding gene, it can be equally important to consider how DNA methylation regulates gene expression by affecting chromatin accessibility and transcription factor binding. For instance, studies have confirmed that DNA methylation around the transcription start site is largely associated with reduced local gene expression [49]. In a study based on brain samples, DNA methylation and histone modifications were located in regulatory regions and seemed to mediate the association of genetic variants with gene expression [70]. Many of those epigenomic loci were also replicated in peripheral blood samples and were associated with psychiatric diseases, such as schizophrenia and bipolar disorder. To characterise the biological context of a methylation site, the results of an EWAS can first be matched to the annotation file, which is usually provided with the data or openly accessible online (Illumina 450K and EPIC array annotations are, for example, available via various R packages such as meffil [33]). This will provide information on each CpG's genomic location, SNPs located in or close to the probe, associated genes and location relative to the transcription start sites of these genes or to CpG islands. Furthermore, information is provided on low- or high-CpG-density regions associated with Functional Annotation of the Mouse/Mammalian Genome (FANTOM) 4 promoters [71], although the reader should keep in mind that this information was based on human myeloid leukaemia cell lines and is not specific to CNS tissue.
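Matching EWAS results to an annotation file is essentially a join on the CpG identifier. The sketch below assumes the annotation has already been loaded into a dictionary keyed by CpG ID; the field names ('chrom', 'pos', 'genes', 'snp_in_probe') and the function name are hypothetical simplifications, since real array manifests and R annotation packages use their own column names and formats. Probes with a SNP in the probe sequence are flagged because genetic variation there can distort the methylation measurement, and CpGs missing from the annotation are returned so that they are not silently dropped.

```python
def annotate_ewas(results, annotation):
    """Attach annotation fields to EWAS results.

    results: list of dicts with at least 'cpg', 'beta' and 'p'.
    annotation: dict CpG ID -> dict of annotation fields (hypothetical names).
    Returns (annotated rows sorted by p-value, CpG IDs not in the annotation).
    """
    annotated, missing = [], []
    for row in results:
        info = annotation.get(row["cpg"])
        if info is None:
            missing.append(row["cpg"])
            continue
        merged = {**row, **info}
        # Flag probes whose measurement may be affected by a SNP in the probe
        merged["flag_snp"] = bool(info.get("snp_in_probe"))
        annotated.append(merged)
    annotated.sort(key=lambda r: r["p"])
    return annotated, missing
```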
Finally, in the annotation file the reader will find information on enhancer elements, DNase I Hypersensitivity Sites, open chromatin regions and transcription factor binding sites (all based on the Encyclopaedia of DNA Elements (ENCODE) data [72]).
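To make the matching step concrete, the following Python sketch joins EWAS p-values to a toy annotation table and flags probes that contain a SNP or lie near a transcription start site. The column names (`cpg`, `chr`, `pos`, `gene`, `tss_distance`, `snp_in_probe`), the CpG identifiers and the 1500 bp window (loosely mirroring the TSS1500 category used in Illumina annotations) are illustrative assumptions, not the layout of any particular manifest or R package.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, List, Optional

# Toy stand-in for an array annotation file. Column names and values are
# hypothetical; real Illumina 450K/EPIC manifests use different columns.
ANNOTATION_CSV = """cpg,chr,pos,gene,tss_distance,snp_in_probe
cg_example_1,chr1,10500,GENE_A,-150,0
cg_example_2,chr1,52000,GENE_B,25000,1
cg_example_3,chr2,500000,,,0
"""


@dataclass
class AnnotatedHit:
    cpg: str
    p_value: float
    chrom: str
    pos: int
    gene: Optional[str]
    tss_distance: Optional[int]  # signed distance to the gene's TSS, in bp
    snp_in_probe: bool


def load_annotation(text: str) -> Dict[str, Dict[str, str]]:
    """Index annotation rows by CpG identifier."""
    return {row["cpg"]: row for row in csv.DictReader(io.StringIO(text))}


def annotate(ewas_p: Dict[str, float],
             annotation: Dict[str, Dict[str, str]]) -> List[AnnotatedHit]:
    """Join EWAS p-values to the annotation, most significant first.

    CpGs missing from the annotation are skipped here; in practice they
    should be reported rather than silently dropped.
    """
    hits = []
    for cpg, p in sorted(ewas_p.items(), key=lambda kv: kv[1]):
        row = annotation.get(cpg)
        if row is None:
            continue
        hits.append(AnnotatedHit(
            cpg=cpg,
            p_value=p,
            chrom=row["chr"],
            pos=int(row["pos"]),
            gene=row["gene"] or None,
            tss_distance=int(row["tss_distance"]) if row["tss_distance"] else None,
            snp_in_probe=row["snp_in_probe"] == "1",
        ))
    return hits


def near_tss(hit: AnnotatedHit, window: int = 1500) -> bool:
    """True if the CpG lies within `window` bp of the TSS (assumed cut-off)."""
    return hit.tss_distance is not None and abs(hit.tss_distance) <= window
```

In a real analysis, the annotation would be read from the manifest for the array and genome build actually used, and probes carrying SNPs would be reported alongside the remaining hits rather than mixed in with them.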
Whenever possible, however, querying several databases (see Table 5 for selected resources) is advisable to corroborate results, and all findings should be summarized to avoid selective reporting. Also, for a more meaningful interpretation of the regulatory nature of the genomic region in question, it is advisable to investigate these regulatory characteristics in a cell-type-specific manner, which can be achieved using ENCODE data (www.encodeproject.org), usually via platforms such as genome.ucsc.edu. For example, DNase I hypersensitivity clusters (indicative of regulatory chromatin regions that are sensitive to cutting by the enzyme DNase I) can be viewed for 125 cell types (including cells derived from blood and brain tissue) as part of the ENCODE project. Histone marks and transcription levels are available for up to nine cell lines (including blood, embryonic stem cells and skeletal muscle, among others). Transcription factor binding sites are listed for 161 factors in 91 cell types (for a list of cell types, see https://genome.ucsc.edu/cgi-bin/hgEncodeVocab?type=%22cell%22). Note that information on CNS-specific cell types is not always available, but high (or low) correspondence across these diverse cell types could indicate similarly (un)correlated profiles in brain tissue. For cell-type-specific profiles related to brain tissue, one option is to investigate DNase I and histone mark data from the Roadmap Epigenomics Project (http://www.roadmapepigenomics.org/data/), which assayed ten different brain regions (including the hippocampus, cerebellum and mid-frontal lobe, among others). Note, though, that DNase I data are only available for foetal brain (not region-specific) and spinal cord tissue. Also note that, to view Roadmap data in the UCSC genome browser, the reader will need to import these tracks via the UCSC Track Data Hub (https://genome.ucsc.edu/cgi-bin/hgHubConnect) or via http://www.roadmapepigenomics.org/data/.
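When regulatory tracks (e.g., DNase I hypersensitivity peaks) have been downloaded for several cell types, checking whether a CpG falls inside them in each cell type is a simple interval lookup. The sketch below assumes the tracks have already been parsed into sorted, non-overlapping, half-open intervals per chromosome, and that CpG positions have been converted to the same coordinate convention and genome build (BED files are 0-based and half-open, whereas array manifests usually report 1-based positions). The cell-type names used with it are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

# Half-open [start, end), assumed to use the same convention as `pos`.
Interval = Tuple[int, int]


def overlaps(intervals: List[Interval], pos: int) -> bool:
    """Binary search in sorted, non-overlapping intervals on one chromosome."""
    starts = [start for start, _ in intervals]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and intervals[i][0] <= pos < intervals[i][1]


def dhs_profile(chrom: str, pos: int,
                tracks: Dict[str, Dict[str, List[Interval]]]) -> Dict[str, bool]:
    """For each cell type, report whether the CpG falls inside a DNase I site."""
    return {cell: overlaps(by_chrom.get(chrom, []), pos)
            for cell, by_chrom in tracks.items()}
```

Rebuilding the list of start positions on every call keeps the sketch short; for many CpGs, the starts would be precomputed once per chromosome. Consistent overlap across blood- and brain-derived cell types would be one (indirect) indication of a shared regulatory context.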
PsychENCODE is a comprehensive resource of exceptional relevance to brain-related traits [73][74][75][76][77][78][79][80][81][82][83]. It provides raw and derived transcriptomic, epigenomic and genomic data from post-mortem adult and developing human brains, at both the single-cell and tissue level. This dataset also includes measures of (hydroxy)methylation, is based on up to 2000 individuals and incorporates resources such as GTEx, ENCODE and the Roadmap Epigenomics Project, discussed above and elsewhere in this article. Data and results can be downloaded from the PsychENCODE Knowledge Portal (http://www.synapse.org/pec) and from http://resource.psychencode.org/. After investigating the regulatory nature of the genomic region, it can also be helpful to check whether the CpG itself or the differentially methylated region (DMR) has been implicated in other epigenome-wide analyses, which can be done using the manually curated EWAS Catalog hosted at http://www.ewascatalog.org/.
Finally, it is advisable to investigate: (1) whether a CpG-of-interest is under genetic control, by identifying potential mQTLs, ideally in a tissue-specific manner (see Section 3.3.1 and Table 3 above for a list of resources); (2) whether a genomic region might show epigenetic supersimilarity, i.e., similarity in DNA methylation between twins that is greater than expected based on shared genetics, as reported by Van Baak et al. [85]; and (3) whether a CpG-linked gene might be imprinted, meaning that its expression depends on the parent of origin. For a list of imprinted genes, see http://www.geneimprint.com/site/genes-by-species.
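The three checks can be recorded as per-CpG flags so that they are reported for every hit rather than selectively. This sketch assumes the lookup sets have already been prepared from the chosen resources: CpGs with an mQTL in a tissue-appropriate database, CpGs already intersected with published supersimilarity regions, and gene symbols from an imprinted-gene list. The function and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class CpGFlags:
    cpg: str
    has_mqtl: bool                # (1) has an mQTL in the chosen resource
    in_supersimilar_region: bool  # (2) overlaps a supersimilarity region
    imprinted_gene: bool          # (3) linked gene is on an imprinted-gene list


def screen_cpgs(cpg_genes: Dict[str, Optional[str]], mqtl_cpgs: Set[str],
                supersimilar_cpgs: Set[str],
                imprinted_genes: Set[str]) -> List[CpGFlags]:
    """Apply the three checks to each CpG using pre-downloaded lookup sets.

    `cpg_genes` maps each CpG to its linked gene symbol, or None if unlinked.
    """
    return [
        CpGFlags(
            cpg=cpg,
            has_mqtl=cpg in mqtl_cpgs,
            in_supersimilar_region=cpg in supersimilar_cpgs,
            imprinted_gene=gene is not None and gene in imprinted_genes,
        )
        for cpg, gene in sorted(cpg_genes.items())
    ]
```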

Cross-Tissue Comparisons
Cross-tissue correlation (see Section 3.4.1) is important but not essential, even for a mechanistic interpretation of findings (e.g., Figure 2e). In practice, correspondence can be investigated using cell-type-specific data on regulatory regions (see Section 3.4.2 and Table 5) and several other openly accessible online resources (Table 6). BECon [86] (https://redgar598.shinyapps.io/BECon/) is based on paired blood and post-mortem brain tissue data from 16 individuals. The user can enter a CpG or gene name to visualize cross-tissue correlation between blood and three brain regions (BA10 (frontal), BA20 (temporal) and BA7 (parietal)). Another online resource with similar functionality is available at https://epigenetics.essex.ac.uk/bloodbrain/, based on matched blood and four post-mortem brain tissues (cerebellum, entorhinal cortex, frontal cortex and superior temporal gyrus) from 74 individuals. Both resources are based on the Illumina 450k array. Methylation data based on bisulphite sequencing are available via MethBase [87] (http://smithlabresearch.org/software/methbase/) and can be imported into the UCSC genome browser via the Track Hub option (see Section 3.4.2). This resource provides information on methylation levels at individual sites, allele-specific methylation, and hypomethylated or hypermethylated regions. Furthermore, MethBase allows comparisons not only across cell types (frontal cortex, neural progenitor cells, embryonic stem cells and blood cells in humans), but also across development (from 35 days to 64 years in the case of brain tissue data) and across species (including human, mouse, chimpanzee, dog, zebrafish and plants).
Alternatively, it is possible to test for tissue-specific enrichment of EWAS probe sets, an option currently implemented in eFORGE (http://eforge.cs.ucl.ac.uk/). Relying on data from ENCODE and the Roadmap Epigenomics Project, eFORGE compares the overlap with DNase I hypersensitivity site hotspots between an EWAS input list and background probes in a cell-type-specific manner.  Table 5 could also be used). There, the authors calculated average cross-tissue methylation for a selected number of CpG sites linked to educational attainment and derived the deviation from this average for a range of tissues (including brain tissue). These tissue-specific measures of deviation were then correlated with EWAS test statistics (z-scores). The authors argued that a lack of correlation between EWAS z-scores for educational attainment and tissue-specific deviation (especially in brain tissue, assumed to be the target tissue of interest) indicated an absence of brain-tissue-specific effects and might be suggestive of confounding. Of note, this method is based on average methylation levels across tissues and not on correlations (i.e., methylation profiles might be correlated across tissues, but at different absolute methylation levels).
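The deviation-based approach described above can be sketched in a few lines: for each CpG, compute the average methylation across tissues, take each tissue's deviation from that average, and correlate those deviations with EWAS z-scores across CpGs. Tissue names, values and the three-CpG minimum are toy assumptions, and the sketch illustrates the idea rather than reproducing the authors' code.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; NaN if either vector has zero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else float("nan")


def tissue_deviations(levels: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """levels: CpG -> {tissue -> mean methylation}.

    Returns tissue -> {CpG -> deviation from that CpG's cross-tissue average}.
    """
    dev: Dict[str, Dict[str, float]] = {}
    for cpg, by_tissue in levels.items():
        avg = sum(by_tissue.values()) / len(by_tissue)
        for tissue, value in by_tissue.items():
            dev.setdefault(tissue, {})[cpg] = value - avg
    return dev


def z_vs_deviation(z_scores: Dict[str, float],
                   levels: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Correlate EWAS z-scores with each tissue's deviation, across CpGs."""
    result = {}
    for tissue, dev in tissue_deviations(levels).items():
        cpgs = sorted(dev.keys() & z_scores.keys())
        if len(cpgs) >= 3:
            result[tissue] = pearson([z_scores[c] for c in cpgs],
                                     [dev[c] for c in cpgs])
    return result
```

As noted above, a correlation computed this way reflects relative methylation levels across tissues, not whether inter-individual variation is shared between tissues.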
Finally, there is some evidence that the effects of mQTLs on methylation can be stable across tissues [48], although large-scale investigations across a wide range of tissue types (including brain tissue) are still lacking. With this in mind, investigating the consistency of mQTL effects across tissues (using the resources described in Section 3.3.1) can provide indirect evidence for or against cross-tissue concordance.
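One simple way to summarise cross-tissue consistency is to count how often shared SNP-CpG pairs have the same effect direction in two tissues and to compare this with the 50% expected by chance using a one-sided sign (binomial) test. The sketch assumes effect estimates have been extracted from two tissue-specific mQTL resources into dictionaries keyed by hypothetical (SNP, CpG) identifiers. Because neighbouring pairs are correlated through linkage disequilibrium, the test treats non-independent pairs as independent and will tend to be anti-conservative.

```python
from math import comb
from typing import Dict, Tuple

Pair = Tuple[str, str]  # (SNP ID, CpG ID), hypothetical identifiers


def sign_concordance(beta_a: Dict[Pair, float],
                     beta_b: Dict[Pair, float]) -> Tuple[int, int]:
    """Return (number of shared pairs, number with the same effect direction).

    Pairs with a zero effect estimate in either tissue are skipped.
    """
    shared = [k for k in beta_a.keys() & beta_b.keys()
              if beta_a[k] != 0 and beta_b[k] != 0]
    same = sum(1 for k in shared if (beta_a[k] > 0) == (beta_b[k] > 0))
    return len(shared), same


def sign_test_p(n: int, k: int) -> float:
    """One-sided binomial p-value P(X >= k) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
```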

Tissue-Specific Gene and Protein Expression
It is generally assumed that DNA methylation influences gene expression. However, this assumption is still extensively debated [89], and the absence of a functional effect of methylation on gene expression does not preclude a meaningful, causal mechanism. Still, it can be highly informative to investigate whether a gene linked to variation in DNA methylation at a site-of-interest also shows variation in its expression level in the tissue-of-interest. The following section and Table 7 provide an overview of online resources for assessing gene expression profiles by tissue and across development.
The Human Protein Atlas (https://www.proteinatlas.org/humanproteome) is an excellent resource to investigate in which tissues a gene-of-interest is expressed in absolute terms, and also whether the expression of such a gene is elevated in the target tissue relative to average expression levels in all tissues. Lists of whole groups of genes that are preferentially expressed in certain tissues (e.g., n = 1460 genes are listed to show elevated expression profiles in brain tissue relative to all other tissues) can be used to test for enrichment of brain-expressed genes in EWAS results.
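Testing for enrichment of brain-elevated genes among EWAS-linked genes can be framed as a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test on the 2 × 2 overlap table). The sketch assumes three gene lists: genes linked to EWAS hits, brain-elevated genes (e.g., exported from the Human Protein Atlas) and a background of all genes annotated to probes on the array. It ignores the bias whereby genes covered by more probes are more likely to be linked to a hit, which dedicated methylation-aware tools attempt to correct.

```python
from math import comb
from typing import Iterable, Tuple


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    if k > upper:
        return 0.0
    # math.comb returns 0 when the lower argument exceeds the upper one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return tail / comb(N, n)


def brain_enrichment(ewas_genes: Iterable[str], brain_elevated: Iterable[str],
                     background: Iterable[str]) -> Tuple[int, float]:
    """Overlap between EWAS-linked and brain-elevated genes, with a one-sided
    hypergeometric p-value. Both lists are restricted to `background`."""
    bg = set(background)
    ewas = set(ewas_genes) & bg
    brain = set(brain_elevated) & bg
    k = len(ewas & brain)
    return k, hypergeom_sf(k, len(bg), len(brain), len(ewas))
```

Restricting both lists to the array background matters: using all genes in the genome as the background would inflate apparent enrichment for genes that are simply well covered by the array.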
The Genotype-Tissue Expression project (GTEx, https://gtexportal.org/home/) provides similar options, listing tissue-specific gene expression, regulation and expression quantitative trait loci (eQTL) information. Importantly, the eQTL function allows users to investigate tissue-specific eQTL effects (for example, of SNPs already identified as mQTLs).
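Once mQTL SNPs are known, checking whether they also appear among significant eQTLs in a given tissue is a join on SNP identifiers. The sketch below assumes significant mQTL (SNP, CpG) pairs and per-tissue eQTL (SNP, gene) pairs have already been downloaded, for example from an mQTL resource and from GTEx; all identifiers are hypothetical. A shared SNP does not by itself show that the same causal variant drives both signals, which formal colocalisation methods are designed to assess.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def shared_qtl_snps(mqtl_pairs: Iterable[Tuple[str, str]],
                    eqtl_pairs_by_tissue: Dict[str, Iterable[Tuple[str, str]]]
                    ) -> Dict[str, List[Tuple[str, str, str]]]:
    """Return, per tissue, (SNP, CpG, gene) triples where an mQTL SNP is
    also a significant eQTL in that tissue."""
    mqtls = list(mqtl_pairs)  # materialise once; reused for every tissue
    out = {}
    for tissue, eqtl_pairs in eqtl_pairs_by_tissue.items():
        genes_by_snp: Dict[str, Set[str]] = defaultdict(set)
        for snp, gene in eqtl_pairs:
            genes_by_snp[snp].add(gene)
        out[tissue] = sorted((snp, cpg, gene)
                             for snp, cpg in mqtls
                             for gene in genes_by_snp.get(snp, ()))
    return out
```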
To gain insight into gene expression profiles across development, the reader is encouraged to consult the EMBL-EBI expression atlas (https://www.ebi.ac.uk/gxa/home), which displays data from a range of resources (including NIH Epigenomics Roadmap, ENCODE and GTEx).
Several resources are of particular relevance to brain tissue-specific gene expression. The Allen Brain Map portal (http://portal.brain-map.org/) provides a range of useful data, including the Human Brain Atlas and the Developing Human Brain resources. The former is a unique multimodal atlas of the human brain, integrating highly detailed anatomic and genomic information. The user can search for a gene-of-interest and visualize its expression profile in different brain regions using high-resolution, MRI-based 3-D histology scans.
The BrainSpan Atlas of the Developing Human Brain (http://www.brainspan.org) provides information on the human transcriptome (RNA sequencing and exon microarray data) across different brain regions and developmental stages. The BrainCloud application provides information on genome-wide gene expression and its genetic control in the dorsolateral prefrontal cortex of normal subjects across the lifespan (http://braincloud.jhmi.edu).
The PsychENCODE project combines data from several resources (including GTEx and BrainSpan) to characterize a large spectrum of genomic elements within the human brain, including gene expression as well as multi-QTL maps (for expression, chromatin, transcript expression and cell fraction), enhancers, splice variants and co-expression modules, often specific to cell type, brain region or developmental period. For a more detailed discussion of brain-based resources, see Keil et al. [90].
Finally, it is important to note that gene expression levels (either in absolute terms or relative to average levels across tissues) can be misinterpreted. For example, DRD4 (coding for the dopamine D4 receptor) does not appear to be preferentially expressed in brain tissue, but it would be misleading to conclude that DRD4 has no role in psychopathology, as numerous studies have implicated DRD4 function in emotion and complex behaviours such as novelty seeking [94][95][96]. Furthermore, there is renewed interest in dopamine D4 receptor-based pharmacological treatments for substance use and Parkinson's disease [97]. As highlighted throughout this review, molecular phenotypes including DNA methylation and gene expression vary over time and across tissues, so any measure is specific to the tissue and temporal context in which the sample was taken, which limits the inferences that can be made with respect to cause.

Gene Ontology Analysis
Finally, it can be of interest to carry out an ontology analysis (or, relatedly, pathway or gene-property analyses) to investigate whether the most strongly associated CpG probes cluster within distinct biological functions. A plethora of online resources is available for ontology analyses, and the reader is referred to excellent reviews on the topic [98,99]. In general, analysis tools that offer tissue-specific analyses are recommended. For example, FUMA (http://fuma.ctglab.nl, [100]) tests the relationships between tissue-specific gene expression and disease-gene associations, using gene expression data from GTEx and the BrainSpan project, among others. As this resource was primarily designed for genetic data, the user first needs to map CpGs to genes before carrying out the analysis with the GENE2FUNC option. Using this functionality, Linnér et al. [22] reported that the genes closest to CpG probes linked to educational attainment were not preferentially expressed in brain tissue, suggesting that the findings might have been driven by confounding factors.
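Mapping CpGs to their nearest gene, as needed before a gene-based analysis such as GENE2FUNC, can be sketched as a binary search over sorted transcription start sites. The TSS table, the optional distance cut-off and the tie-breaking rule (the lower-coordinate TSS wins) are assumptions for illustration; in practice, strand, multiple transcripts per gene and the genome build all need to be handled, and the nearest gene is not necessarily the regulated gene.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple

TSS = Tuple[int, str]  # (TSS position, gene symbol), sorted by position


def nearest_gene(chrom: str, pos: int, tss_by_chrom: Dict[str, List[TSS]],
                 max_distance: Optional[int] = None) -> Optional[str]:
    """Gene whose TSS is closest to `pos`; ties go to the lower-coordinate TSS."""
    tss = tss_by_chrom.get(chrom, [])
    if not tss:
        return None
    positions = [p for p, _ in tss]
    i = bisect_left(positions, pos)
    # Only the TSSs immediately before and after `pos` can be the closest.
    candidates = [tss[j] for j in (i - 1, i) if 0 <= j < len(tss)]
    best_pos, gene = min(candidates, key=lambda t: abs(t[0] - pos))
    if max_distance is not None and abs(best_pos - pos) > max_distance:
        return None
    return gene


def gene_list(cpgs: Dict[str, Tuple[str, int]], tss_by_chrom: Dict[str, List[TSS]],
              max_distance: Optional[int] = None) -> List[str]:
    """Deduplicated, sorted gene list for CpGs given as {cpg: (chrom, pos)}."""
    genes = {nearest_gene(c, p, tss_by_chrom, max_distance) for c, p in cpgs.values()}
    return sorted(g for g in genes if g is not None)
```

The resulting gene list, rather than the CpG list, would then be the input to the gene-based tool.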

Strengths and Limitations
Epigenetic epidemiological studies of mental health and related phenotypes continue to be the focus of much interest with the hope of enhancing understanding of the biological mechanisms underlying the aetiology and progression of psychiatric diseases. However, they still present challenges and limitations.
The most widely employed data-generation platforms sample only a very small portion of CpG sites in the genome. Studies using sequencing-based approaches, such as a recent methylome-wide association study of major depressive disorder that measured DNA methylation at 28 million CpGs [101], promise to unlock more information on epigenetic variation and to yield further insight into the role of methylation in mental health. Moreover, while most current studies focus on CpG methylation, DNA methylation is also present at non-CpG sites, particularly in brain tissue, suggesting a potential role in neurodevelopment and mental health [102]. Methylome sequencing has only recently allowed the characterisation of non-CpG methylation in brain tissue [103], and it could provide an additional avenue for discovering novel effects in relation to neuropsychiatric traits.
Mendelian randomization is proving to be a useful tool to strengthen causal inference and explore molecular mediation by DNA methylation. It does, however, have recognized limitations and is unlikely to provide definitive evidence of causal pathways without triangulation using complementary approaches in epidemiology and other disciplines.
Epidemiological studies of methylation and brain-related processes using peripheral tissue alone may not be able to unravel true biological mechanisms, but the associations found can be translated into useful biomarkers (whether causal or not) for diseases or their progression and are therefore worth investigating. They can also be used to establish how much of the variance in methylation is attributable to genetic factors. Also, it is often of interest to know whether a CpG affects gene expression (or vice versa), even if it is not causally linked to disease. Finally, these approaches are useful for explaining the correlation between peripheral DNA methylation and brain-based processes, even if these processes index (non-causal) disease correlates. Even though they do not necessarily identify causal determinants of psychiatric disease that could be translated into interventions, peripheral epigenetic associations can answer biological questions that ultimately advance the understanding of mental health.

Future Perspectives and Conclusions
In conclusion, recently developed, openly accessible resources support epigenetic epidemiological studies of mental health and offer multiple opportunities to understand the aetiology and progression of psychiatric conditions. Future advances in epigenetics-specific software and in statistical methods for causal inference, together with large biobanks in multiple complementary populations, will substantially increase our understanding of mental health and lead to reproducible results that can inform prevention and intervention strategies.

Conflicts of Interest:
The authors declare no conflicts of interest.