The Mutagenic Consequences of DNA Methylation within and across Generations

DNA methylation is an epigenetic modification with wide-ranging consequences across the life of an organism. This modification can be stable, persisting through development despite changing environmental conditions. However, in other contexts, DNA methylation can also be flexible, underlying organismal phenotypic plasticity. One underappreciated aspect of DNA methylation is that it is a potent mutagen; methylated cytosines mutate at a much faster rate than other genetic motifs. This mutagenic property of DNA methylation has been largely ignored in eco-evolutionary literature, despite its prevalence. Here, we explore how DNA methylation induced by environmental and other factors could promote mutation and lead to evolutionary change at a more rapid rate and in a more directed manner than through stochastic genetic mutations alone. We argue for future research on the evolutionary implications of DNA methylation driven mutations both within the lifetime of organisms, as well as across timescales.


Introduction
DNA methylation is an epigenetic modification that influences the regulation of transcriptional activity and gene expression, leading to changes in phenotype. It is both persistent across rounds of cellular replication and responsive to endogenous and environmental cues, allowing it to shape phenotypes both stably and flexibly [1] (Figure 1A). It is ubiquitous and fundamental to survival, required for many developmental processes, and essential for flexibility in many physiological, behavioral, and morphological traits [2]. However, this mechanism that helps shape phenotypes has an additional evolutionary consequence: it can act as a potent mutagen, with the potential to alter the phenotype outright as well as constrain future DNA methylation at that site [3,4]. DNA methylation is observed across all three domains of life (bacteria, archaea, and eukarya), as well as in many viruses [5][6][7]. The mechanisms by which methylation is added and removed throughout the genome have changed through evolutionary time, leading to differences in the location, frequency, and function of DNA methylation across taxa [7,8]; as a result, the effect of methylation on mutations has also diverged across evolution. Here, focusing on vertebrate methylation as an example, we synthesize literature on DNA methylation and assess how it may impact organisms within and across generations, both through changes in methylation status itself and through its contribution to molecular evolution. First, we describe the structure, machinery, and mechanisms governing DNA methylation, the pathways by which it can impact phenotype, and how it leads to variation in traits within and among individuals. We then explore the implications of DNA methylation both within and across generations. Finally, we consider the fundamental nature of DNA methylation as a mutagen and how methylation-driven mutations may promote rapid evolutionary change.
Figure 1. The environmental factors that shape DNA methylation patterns in individuals are numerous and occur throughout the lifetime of an individual: these vary from maternal hormones derived during development to predation events and resource availability (Panel (A)). Further, exposure to environmental stimuli can lead to variable methylation patterns at particular CpG sites through life, which can lead to variable phenotypes (denoted by two-directional arrows in Panel (B)). If environmental stimuli and resulting methylation are stable across generations (Panel (C), top), the CpG site is more likely to become methylated and mutate over time, possibly fixing that phenotype; if, on the other hand, environmental stimuli vary between generations (Panel (C), bottom), methylation is more likely to continue to regulate the phenotype at that CpG site. Here, we use a fictitious example in which puffins are exposed to either variable temperatures (Panel (C), bottom) or stable temperatures (Panel (C), top). When the environment is cold, more CpG sites are methylated and high levels of methylation confer the orange foot phenotype, an advantageous trait in cold environments. When the environment warms, fewer CpG sites are methylated and foot color becomes blue, an advantageous trait in warmer environments. If temperature persistence across generations leads to predictable methylation of CpG sites, this could lead to CpG to TpG or CpA mutations (denoted by a red star in Panel (C), top). These mutations may fix the orange foot phenotype, even in the absence of the colder temperatures (i.e., genetic assimilation); this represents the loss of epigenetic potential, leading to less flexibility in phenotype. In cold environments, this fixation of the phenotype could be advantageous if offspring are "primed" for predicted environmental conditions. However, if the environment changes and becomes warmer over time, this once advantageous mutation might become deleterious.
If there is exposure to a variety of different environmental cues between generations, CpG sites may not be lost as methylation was variable (denoted by yellow boxes in Panel (C), bottom). Consequently, epigenetic potential is maintained, and the capacity for phenotypic plasticity is preserved. Note: the figure and phenotypes (e.g., foot color) represent a fictional example; there is no evidence that foot color is derived via epigenetic mechanisms, as depicted. This figure was created by Katie Brust.

DNA Methylation: Structure, Machinery, Mechanism, and Regulation
There are tens of thousands of protein-coding genes in the vertebrate genome, which must be activated and suppressed specifically and accurately across cells, tissues, time, and context [9,10]; DNA methylation and demethylation help regulate such activation and suppression. In vertebrate genomes, methylation most typically occurs at CpG sites, which confer both genetic and epigenetic information: CpG sites are not only encoded in the DNA sequence but are also interpreted for the presence or absence of methylation [11,12]. CpG sites are commonly found in gene promoters, intragenic regions, transposable elements, and repetitive motifs. CpG sites near transcription start sites in gene promoters are generally hypomethylated across vertebrate taxa [13,14], whereas CpG sites in intragenic regions of actively transcribed genes, transposable elements, and repetitive sequences are generally methylated [1,15-20]. DNA methylation at a CpG island in the promoter region of a gene can influence the affinity for transcription factor binding. The number of CpG sites in a gene or genome represents its epigenetic potential, or the latent capacity for DNA methylation to occur and contribute to phenotypic plasticity [29]. Each CpG site represents a place where methylation, and consequently gene regulation, can occur. DNA methylation is catalyzed by a suite of enzymes, most notably the DNA methyltransferases (DNMTs), which transfer a methyl group from S-adenosyl methionine (SAM) to the 5 position of the pyrimidine ring of a cytosine in a CpG site. With de novo methylation, methyl groups are added most frequently by DNMT3A and DNMT3B [30]. DNMT3A methylates imprinted genes and repeated elements, whereas DNMT3B methylates actively transcribed genes in the gene body [31,32]. Both enzymes also work in concert with DNMT3L [1,33,34].
DNMT1 can also catalyze de novo methylation under certain circumstances [30,35,36], but it works primarily to add methylation to hemi-methylated DNA (whereby methylation is added to a newly replicated strand of DNA to maintain methylation through DNA replication and cell division) [37]; preservation of the cellular state through time is dependent on the precise copying of these marks [23,38]. Demethylation occurs both passively and actively. Passively, DNA methylation can be depleted if DNMT1 activity is reduced or absent, consequently constraining the copying of methylation across rounds of DNA replication and cell division [39][40][41]. Active demethylation, on the other hand, occurs via the oxidation of methylated cytosine to 5-hydroxymethylcytosine, 5-formylcytosine, and 5-carboxylcytosine by ten-eleven translocation (TET) enzymes [42]. Oxidized cytosines are removed by thymine DNA glycosylase, and an unmodified cytosine is then restored through base excision repair [40]. Other active mechanisms of DNA methylation removal have also been proposed (e.g., via AID/APOBEC and DNA demethylase enzymes), but their contribution is contested [43][44][45].
One fundamental question remains: How are specific CpG sites targeted for methylation and demethylation? We know that the tendency of individual CpG sites to be methylated is non-random and regulated. For instance, the closer two CpG sites are to one another, the more likely they are to share methylation status [46]. However, the specific mechanisms that methylate (or demethylate) some CpG sites but not others are complex and include several non-mutually exclusive pathways that have not yet fully been elucidated. DNMT3A is likely able to target CpG islands through its PWWP domain [47]. Another way that specific CpG sites are targeted for methylation or demethylation is through the binding of transcription factors and other proteins (Reviewed in [23,48]). Further, DNMTs interact with and are regulated by other proteins, enzymes, and RNA, and there are numerous proteins that can recruit DNMTs to methylate localized CpG sites [49,50]. Additionally, some RNA, such as long non-coding RNA, can bind to DNMTs to suppress or inhibit their activity [50][51][52]. Specifically, PIWI-interacting RNA (piRNA) in the germline can induce DNA methylation at specific target sites that silence transposable elements (Reviewed in [53]). piRNAs are also found in the soma, but their function there is more obscure [54]. Similar processes likely guide the active demethylation process [55], but these processes are not as clear. A better understanding of how methylation is added and removed from specific CpG sites will help us to better understand not only the mechanisms by which methylation can control phenotypes, but also how methylation may influence and be influenced by evolution.

Implications of DNA Methylation within and across Generations
Within an individual, DNA methylation can act as a flexible response to environmental cues, changing in response to changes in the internal and external environment, and can also support persistent phenotypes in which stable marks lead to differences between cells, tissues, individuals, and populations [56][57][58]; stable marks can either be induced by developmental environments or inherited from parents. DNA methylation is responsive to a broad range of environmental factors, including, but not limited to, parental care [59,60], temperature [61], salinity [62], resource availability [63], resource defense [64], social environment [65], climate [66], and pollution (Figure 1A) [67]. Both flexible and stable DNA methylation have phenotypic consequences. As the effects of DNA methylation on phenotypes have been reviewed elsewhere and are not the focus of the current manuscript, the discussion below is not exhaustive [68][69][70][71].
Some methyl marks can be induced and removed in response to endogenous cues and environmental factors, both accustomed and unfamiliar, and are reversible (Figure 1A). Remarkably, such DNA methylation has the capacity to change rapidly, with many studies citing changes in DNA methylation or demethylation of specific CpG sites within just minutes of a stimulus [64,72-75]. For instance, in one study, mice were injected with an endotoxin to instigate an immune response. The endotoxin injection led to the activation of Interleukin-2 (IL-2), responsible for the proliferation of T cells, via demethylation of specific CpG sites in the promoter region of the gene within 20 min of endotoxin administration [76]. The ability of DNA methylation to regulate gene expression rapidly permits a temporary response to a stimulus, allowing the organism to cope and return to homeostasis. The rate of such alterations is likely one way individuals survive rapid environmental changes [77]. Thus, DNA methylation is one mechanism underpinning phenotypic plasticity, or the capability for one genotype to produce multiple phenotypes in response to environmental variation [78,79]. Some DNA methylation marks change through time predictably in response to environmental cycles, such as circadian and seasonal changes [80][81][82][83][84][85]. For example, in Siberian hamsters (Phodopus sungorus), photoperiod drives changes to reproductive physiology, with short photoperiods in winter months leading to gonadal regression and longer photoperiods in the summer reversing this [80]. These changes are facilitated by DNA methylation and demethylation of the promoter of the deiodinase type III (dio3) gene, a key regulator of thyroid hormone, such that winter photoperiods reduced and summer photoperiods induced DNA methylation [80]. However, unexpected environmental perturbations can also induce reversible DNA methylation through time.
Although within-individual, repeated-measures studies are relatively rare in the ecological epigenetics literature, inducible changes to DNA methylation have been shown in response to changes in diet and temperature, among other environmental variables [61-63,86]. In humans, examples of reversible methylation marks also exist (e.g., [87][88][89]).
Some DNA methylation induced from the environment (usually during development) may become a stable mark, maintained through DNA replication and cell division throughout an individual's life. Phenotypic differences in clonal populations or monozygotic twins can be explained by differential patterns of DNA methylation induced early in life [90,91]. Perhaps one of the most dramatic developmentally derived phenotypes is temperature dependent sex determination in some fish and reptiles. Research across several species indicates temperature determines sex via differential methylation patterns in the promoters of the cyp19a and SOX9 genes [92][93][94]. Other developmental environments have also been shown to have lasting phenotypic effects through impacts on methylation such as parental care, rainfall, and nutritional environment [60,65,66,95].
One of the most contested topics concerning DNA methylation, and epigenetic modifications in general, is its capacity to be transmitted across generations (Figure 1C) [96]. Epigenetic inheritance is thought to be advantageous as it could prime offspring for environmental conditions they are likely to face [97,98]. Although epigenetic signals are known to be stable and heritable at least sometimes, such inheritance has largely been studied in plants (for a review see [99]) and had long been dismissed in vertebrates due to extensive demethylation during embryogenesis and in primordial germ cells [97,100-103]. However, evidence now exists that this phenomenon occurs in vertebrates [97,98,104-106]. It is now understood that not all DNA methylation is removed during embryogenesis, and other mechanisms may recapitulate marks faithfully [98,101]. Several recent studies in three-spined stickleback (Gasterosteus aculeatus) indicate that a large proportion of methylation could be heritable and stable across generations [107,108]. Given its importance in passing information across generations, it is unsurprising that evidence from theoretical models suggests that epigenetic inheritance would impact both the rate and trajectory of evolutionary change (Reviewed in [77]).

Mutation of CpG Sites
CpG sites mutate 10-50 times faster than any other genomic motif [109,110]. This mutability is attributable to DNA methylation: a methylated cytosine is an order of magnitude more likely to mutate than an unmethylated cytosine [4,111]. Not every CpG site mutation is a direct result of methylation, and some mutations will occur by chance at unmodified CpG sites. However, this fraction will be extremely small, as the mutation rate of methylated cytosines is nearly 20,000 times that of unmethylated cytosines [96,112]. In addition to somatic cells (Figure 1A), CpG mutations are also expected to occur in the germline (Figure 1C), which will influence how such mutations are inherited [113]. Sperm cells, for instance, are highly methylated compared to other cell types, and CpG mutations correlate with male age in humans [114]. As these mutations accumulate over time, other life-history traits, including generation time, may also affect the number of mutations passed on [18,115].
CpG sites are mutated through spontaneous hydrolytic deamination. Whereas cytosine deamination yields uracil, deamination of methylated cytosine generates thymine [4]. Uracil, a foreign base in DNA, is identified and removed by uracil-DNA glycosylase and replaced with a cytosine, leaving the CpG site intact [4]. Conversely, the thymine mutation is less efficiently recognized [116]. Two enzymes, thymine DNA glycosylase and methyl-binding domain protein 4, can recognize the foreign thymine, but how they specifically target deamination products and what regulates their efficiency remain unresolved (Box 1) [117][118][119]. If the thymine is not replaced before replication, the mutation persists, resulting in a CpG to TpG or CpA mutation, depending on which strand carried the methylated cytosine [117]. Recent studies suggest that other factors, including surrounding sequence variation, may further contribute to the rate of CpG mutation via deamination [120][121][122][123][124]. As such, transitions from CpG to TpG or CpA are the most common type of mutation in vertebrate genomes [125]. Unlike other de novo mutations, the mutation of methylated cytosines is independent of cell division or replication, and instead accumulates in a time-dependent manner [110,126,127]. Indeed, the accumulation of CpG to TpG or CpA mutations increases with age in humans, the only mutational signature to do so [128][129][130]. Together, this suggests that the longer a CpG site is methylated, the more likely it is to undergo deamination to thymine [131]. In other words, methylation-dependent mutations are more likely to occur at sites that are stably methylated through time. Therefore, when environments dictate methylation status and those environments remain stable through time, resulting in a stable methyl mark, those sites are more likely to mutate.
This will reduce the ability of methylation to regulate the phenotype, thus conferring a more stable (albeit not necessarily advantageous) phenotype that is subject to evolution. Conversely, if changes in the environment result in a CpG site only sometimes being methylated, then mutations are less likely to occur (Figure 1C). In addition, new CpG sites can also be created. Stochastic mutations can lead to the creation of a new CpG site; however, the most frequently invoked mechanism of CpG site gain is GC-biased gene conversion [12].
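The time-dependence described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that deamination at a single CpG site behaves as a constant-hazard process whose rate depends on whether the site is currently methylated, and that the site spends a fixed fraction of time methylated. The function name, the two rate constants, and the time span are hypothetical placeholders; only the ten-fold ratio between the rates echoes the order-of-magnitude difference stated in the text.

```python
import math


def prob_mutated(years, frac_methylated,
                 rate_methylated=1e-7, rate_unmethylated=1e-8):
    """Probability that one CpG site has mutated at least once after `years`.

    Assumptions (illustrative only, not measured values):
      - mutation is a constant-hazard (exponential waiting time) process;
      - `frac_methylated` is the fraction of time the site is methylated;
      - the two per-year rates are placeholders with a ten-fold ratio.
    """
    if not 0.0 <= frac_methylated <= 1.0:
        raise ValueError("frac_methylated must lie between 0 and 1")
    # Time-averaged hazard across methylated and unmethylated states.
    hazard = (frac_methylated * rate_methylated
              + (1.0 - frac_methylated) * rate_unmethylated)
    # 1 - exp(-hazard * years), computed stably for small arguments.
    return -math.expm1(-hazard * years)


# Stable environment: site methylated all of the time.
stable = prob_mutated(years=10_000, frac_methylated=1.0)
# Variable environment: site methylated only 30% of the time.
variable = prob_mutated(years=10_000, frac_methylated=0.3)
print(stable > variable)  # the stably methylated site is more likely to mutate
```

Under these assumptions the probability of mutation rises with both elapsed time and the fraction of time spent methylated, which is the qualitative pattern the argument relies on; the absolute values carry no biological meaning.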

Box 1. Outstanding Questions.
What is the timescale over which mutations attributed to DNA methylation occur? In other words, how many generations might methylation need to persist to lead to mutation? Do some environmental contexts (e.g., unpredictable environments, range expansions) delay or accelerate CpG mutation? What factors contribute to the efficacy and specificity of deamination repair (e.g., TDG and MBD4)? How much individual variation exists in these pathways? What is the impact of these mutations relative to other evolutionary mechanisms, such as stochastic de novo mutations, on the rate and direction of evolutionary change? Do DNA methylation-driven mutations cause neutral, deleterious, or adaptive effects at the same rate as stochastic mutations? Are these changes more or less reliant on the environment than stochastic mutations are?
Much like DNA methylation, the loss of CpG sites does not occur at a uniform rate across the genome. Regions of dense CpG sites show reduced rates of CpG depletion compared to other regions [132,133]. This may occur due to two non-mutually exclusive mechanisms: hypomethylation of these regions in the germline may buffer them from mutation, or selection may be acting to preserve CpG sites in these regions [113,134]. For instance, CpG sites in protein-coding regions of genes vital for development (e.g., Hox genes) are strongly selected for [125]. Conversely, regions of low to intermediate levels of methylation, especially within intergenic or intronic regions, generally have increased rates of CpG mutation [135]. Thus, in genes where there are functional constraints or where tight gene regulation is necessary, CpG sites are likely more buffered from deamination, whereas in genes in which plasticity or regulatory flexibility is common, mutation may occur more frequently.
Because of the mutability of CpG sites when methylation is present, DNA methylation has an exceedingly complex, yet more important role in evolution than has been previously appreciated. CpG mutations effectively remove the substrate on which DNA methylation can act, leading to a permanent reduction in the capacity for DNA methylation-based gene regulation [3,136,137]. The loss of methylation can then impact how and when genes are expressed. Where the CpG site is located in the genome will further influence its effect [138,139]. CpG mutations can also underlie allele-specific methylation, and subsequently allele-specific expression [3,140] as well as other aspects of gene regulation, such as transcription factor binding sites [141].

Epigenetic Potential
The number of CpG sites represents a form of epigenetic potential, or the latent capacity for DNA methylation to occur and contribute to phenotypic plasticity [29]. Each CpG site represents a place in the genome where methylation, and consequently gene regulation, might occur. The depletion of CpG sites represents the winnowing of epigenetic potential, as a lost CpG site can no longer mediate gene regulation (Figure 1B) [29,142]. Therefore, the number of CpG sites represents a genetic constraint, and CpG depletion likely has many impacts on fitness [143]. For instance, high levels of phenotypic plasticity might be favored under certain conditions, such as in invasive species or naturally expanding populations coping with novel environmental conditions [144,145]. In these situations, depletion of CpG sites may impede the ability of individuals, populations, or species to cope with novel environments; thus, selection may favor the retention of CpG sites to prevent the erosion of plasticity. One of the most successful invasive species, the house sparrow (Passer domesticus), shows differences in epigenetic regulation throughout its native and invasive ranges [146,147]. Further, across 70 years of one of their most recent introductions (to Kenya), house sparrow populations show differences in the number of their CpG sites, and thus their epigenetic potential [137]. Specifically, birds at the range edge (established for 5 generations) maintained significantly more CpG sites than their conspecifics found at the site of the introduction (established for ~50 generations). These trends seem to have emerged through selection on CpG sites in birds at the range edge, suggesting that changes in CpG sites occurred across a relatively rapid time scale. Importantly, these changes in epigenetic potential are concurrent with expected changes in plasticity and the ability to cope with novel environments [146,148-150].
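To make the idea of epigenetic potential as a count of CpG sites concrete, the following minimal sketch counts CpG dinucleotides in a sequence and shows how a single deamination event removes one of them. The function names and the 13-bp sequence are hypothetical and chosen only for illustration; real analyses would work on annotated promoters or whole genomes and would consider both strands.

```python
def count_cpg(seq: str) -> int:
    """Count CpG dinucleotides (a C immediately followed by a G)."""
    return seq.upper().count("CG")


def deaminate_cpg(seq: str, index: int, strand: str = "top") -> str:
    """Return a copy of `seq` in which the CpG starting at `index` has mutated.

    strand="top":    the methylated C on this strand becomes T (CpG -> TpG).
    strand="bottom": the methylated C on the complementary strand becomes T,
                     which reads as G -> A on this strand (CpG -> CpA).
    """
    seq = seq.upper()
    if seq[index:index + 2] != "CG":
        raise ValueError("no CpG site at this index")
    if strand == "top":
        return seq[:index] + "TG" + seq[index + 2:]
    if strand == "bottom":
        return seq[:index] + "CA" + seq[index + 2:]
    raise ValueError("strand must be 'top' or 'bottom'")


# Hypothetical sequence with three CpG sites (illustration only).
ancestral = "ATCGGACGTTCGA"
derived = deaminate_cpg(ancestral, 6, strand="top")

print(count_cpg(ancestral), count_cpg(derived))  # prints: 3 2
```

In this toy case, one mutation reduces the count from three to two sites, so one fewer position remains at which methylation could regulate expression.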

The Evolutionary Consequences of CpG Mutations
CpG mutations induced by methylation certainly contribute to adaptive evolution [151]. As methylated cytosines mutate faster than any other genomic motif [152][153][154], DNA methylation alters the rate and direction of evolution by acting as a mechanism to increase genetic diversity that can lead to changes in genetic regulation more quickly than stochastic mutations. CpG mutations can lead to adaptive changes over multiple evolutionary timescales. Over shorter evolutionary timescales, the mutations caused by DNA methylation may be a mechanism leading to genetic assimilation, or the process by which a phenotype produced in response to a stimulus becomes genetically encoded ( Figure 1C) [96,155,156]. Through mechanisms described above, environmental stimuli can induce DNA methylation at specific CpG sites that produce adaptive changes in a phenotype. If the stimulus persists over multiple generations, methylation and the altered phenotype should also persist. However, although methylation continues to induce the altered phenotype, it also increases the likelihood of mutation at that specific CpG site as CpG sites are more likely to become mutated the longer they are methylated [131]. Once mutated, although DNA methylation would no longer be able to regulate the phenotype, in some cases, the phenotype may become fixed with the CpG mutation. This change would be adaptive if the time the CpG site is methylated acts as a signal that environmental conditions are consistent over several generations [157]. This 'inheritance relay' from the flexible phenotypic induction of DNA methylation to one from a more persistent genetic change would likely greatly accelerate genetic adaptation through genetic assimilation [96].
In addition to genetic assimilation, the greatly increased mutation rate associated with CpG methylation has several effects on the genome, and therefore on evolution. In vertebrates, CpG sites are found five-fold less frequently than expected [158,159], largely due to the impact DNA methylation has on these sites [157]. CpG depletion influences the number of transcription factor binding sites found throughout the genome, which, in turn, influences how and when genes are expressed. For example, transposable elements, which have a high density of CpG sites, experience high rates of deamination and mutation [160]. These mutations to CpG sites in transposable elements are responsible for the creation of transcription factor binding sites, and are hypothesized to have created new opportunities for gene regulation that are often specific to a species [15,160-162]. Further, methylation-driven CpG mutation can also create new transcription factor binding sites in the promoter regions of genes [163].
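The statement that CpG sites occur less often than expected can be expressed as an observed/expected (O/E) ratio. The sketch below uses one commonly used form of this ratio, in which the expected number of CpG dinucleotides is derived from the counts of C and G in the sequence; a five-fold depletion corresponds to a ratio near 0.2. The function name and the example sequences are hypothetical, and the calculation is a simplification that ignores ambiguous bases and strand considerations.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio for a single sequence.

    O/E = (number of CpG * sequence length) / (number of C * number of G)

    A value near 1 means CpG occurs about as often as base composition
    predicts; values well below 1 indicate CpG depletion.
    """
    seq = seq.upper()
    n_c = seq.count("C")
    n_g = seq.count("G")
    if n_c == 0 or n_g == 0:
        raise ValueError("sequence needs at least one C and one G")
    return seq.count("CG") * len(seq) / (n_c * n_g)


# Toy sequences (illustration only).
print(cpg_obs_exp("CCGG"))  # one CpG among two C and two G
print(cpg_obs_exp("CATG"))  # C and G present, but no CpG
```

Applied in sliding windows along a genome, a ratio of this kind is one way to contrast CpG-dense regions with the depleted background described in the text.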
One particularly salient example of an adaptive change caused by CpG mutation is from a study of free-living Andean house wrens (Troglodytes aedon) by Galen et al. (2015). In Peru, Andean house wrens inhabit a range of altitudes, from sea level to above 4500 m. At different altitudes, they exhibit changes in hemoglobin's oxygen affinity such that birds living at the highest altitudes have the highest levels of oxygen affinity, which is enhanced by more than 30% compared to birds living at lower altitudes. This change is attributable to an ancestral CpG site mutating to a CpA, leading to a nonsynonymous substitution. The frequency of the mutated allele linearly tracks with altitude, such that populations living at high altitudes have fixed the mutated CpA site, whereas low altitude populations have negligible levels of the mutation with all individuals retaining the CpG site [164]. Further, when comparing hemoglobin's oxygen affinity in 35 pairs of high and low altitude avian species, increased oxygen affinity of hemoglobin only occurs in the high altitude species and CpG mutations are the overrepresented cause of those changes [151].
CpG mutations also play a major role in divergence and speciation [165]. Across the genome, humans and chimpanzees differ by less than 1%; however, when considering variation only at CpG sites, divergence between the two species increases to over 15% [12,166]. This difference is caused by losses and gains of CpG motifs at an approximately equal rate; nevertheless, the stark contrast in the magnitude of divergence between the whole genome and CpG sites alone highlights the importance of these sites for evolution [166]. In addition, CpG mutations have contributed to diversification among domesticated chickens and their closest living relative, the Red Jungle Fowl (Gallus gallus) [167]. As genetic divergence from the Red Jungle Fowl has increased, so too has the number of CpG mutations, which are significantly overrepresented compared to other mutation types [167].
As with stochastic mutations, not all, or even most, of the mutations generated by DNA methylation are likely to be adaptive. In fact, methylation-driven CpG mutations are best studied in the context of human disease [168]. In one such study utilizing the Human Gene Mutation Database, CpG mutations were identified as the cause of 18.2% of all mutations causing inherited diseases, more than 10 times the expected rate [169]. Further, within somatic cells, mutations driven by methylation are frequently detected across multiple types of cancer [170]. Although the focus of this review is on the evolutionary consequences of mutations driven by DNA methylation, these consequences have, unfortunately, been less well studied than the human disease implications of CpG mutation. We argue that one key area of future research should be the consequences of mutations that result from DNA methylation in ecological settings (Box 1). Unraveling these consequences will likely have significant ramifications for understanding how methylation-driven mutations contribute to evolution in natural environments.

Concluding Remarks
Unequivocally, DNA methylation is vital for the survival of all organisms: methylation contributes to the development of complex phenotypes and is important for many facets of transcriptional regulation [2]. Importantly, methylation is not inherently permanent. Many intrinsic or environmental stimuli can impact methylation, leading to changes in phenotype. Beyond this regulatory role, we argue here that methylation also impacts the evolution of vertebrate genomes more than previously appreciated, namely through its influence on the mutation of CpG sites. Because of the impact of methylation, CpG mutations are not entirely random. As methylation predates mutations, it seems highly likely that environmental conditions can shape, and have shaped, patterns of mutation; however, whether and how environmental cues lead to the methylation and subsequent mutation of specific CpG sites is currently unknown (Box 1). It is possible that the semi-directed nature in which specific CpG sites are methylated in response to environmental cues initiates the switch to adaptive phenotypes. However, as with stochastic mutations, not all CpG mutations will be advantageous. For these reasons, it is vital that we investigate DNA methylation and the attributed mutations with an emphasis on intermediate timescales, where mutations leading to genetic assimilation might be particularly evident, to better understand whether and how this phenomenon occurs over relatively rapid periods of time and what effects this may have on the rate and direction of evolution (Box 1).