DNMTs and Impact of CpG Content, Transcription Factors, Consensus Motifs, lncRNAs, and Histone Marks on DNA Methylation

DNA methyltransferases (DNMTs) play an essential role in DNA methylation and transcriptional regulation in the genome. DNMTs, along with other poorly studied elements, modulate the dynamic DNA methylation patterns of embryonic and adult cells. We summarize the current knowledge on the molecular mechanisms that target DNMTs to maintain genome-wide DNA methylation patterns. We focus on the intrinsic characteristics of DNMTs, their transcriptional regulation, and their post-transcriptional modifications. Furthermore, we pay special attention to the specificity of DNMTs for their target sites, including key cis-regulatory factors that regulate DNA methylation, such as CpG content, common motifs, transcription factor (TF) binding sites, lncRNAs, and histone marks. We also review how DNMT/TF or DNMT/lncRNA complexes are involved in DNA methylation in specific genome regions. Understanding these processes is essential because the spatiotemporal regulation of DNA methylation modulates gene expression in health and disease.


Introduction
An interesting paradigm in cell fate is how different cellular lineages originate from a single precursor cell carrying the same genome. One characteristic of a specific cell type is a particular expression profile encoded in its genome. In mammals, DNA-based processes are highly regulated by epigenetic mechanisms that impact the biology and transcriptional states of chromatin. Epigenetic studies include, but are not restricted to, heritable modifications of chromatin that are independent of alterations in the DNA sequence. Epigenetics involves DNA and RNA methylation, post-translational histone modifications, transcriptional regulation by long noncoding RNAs (lncRNAs), and physical alterations in nucleosome positioning. Epigenetic mechanisms comprise layers of regulation set by epigenetic writers, readers, and erasers that allow dynamic, cell-specific gene expression [1,2]. Epigenetics contributes to cell plasticity by establishing several expression profiles and, therefore, distinct phenotypes.
DNA methylation is one of the best-characterized epigenetic modifications, and it has a critical role in the equilibrium between active and inactive chromatin that controls gene expression [3]. Gene silencing by DNA methylation is necessary to balance and regulate biological processes in the cell. DNA methylation is a chemical modification that occurs at carbon 5 of cytosines located 5′ of a guanine; this arrangement is called a CpG dinucleotide [4]. DNA methylation augments the information contained in the DNA sequence and confers a dual reading on the same sequence, changing the functional

Discovery of 5-Methylcytosine and Its Function in the Genome
5mC was discovered in vitro in early studies on the composition and biochemical properties of nucleic acids, where it was described as an unknown element in DNA [20]. Later, during a search for differences between pathogenic and nonpathogenic bacteria, 5mC was confirmed as a product of the hydrolysis of Bacillus tuberculosis nucleic acids and was identified as the fourth pyrimidine (cytosine, thymine, uracil, and 5-methylcytosine) [20,21]. Subsequently, 5mC was identified in mammalian DNA and was described as having acid-base properties similar to those of cytosine while being distinct from uracil [22]. Almost in parallel with the reports of 5mC, in 1939, Waddington proposed the existence of an epigenotype to explain certain aspects of development that were influenced by the environment and were not the result of changes in the genotype [23]. Functional studies of DNA methylation progressed from its role as a protective mechanism against foreign DNA in bacteria to its role as a regulatory mechanism of gene expression in vertebrates [24]. Studies in Escherichia coli demonstrated that when foreign DNA was not digested by endonucleases, the DNA was methylated as an alternative host protection mechanism [25]. Based on prokaryotic observations of enzymes that methylate DNA, and after recognizing problems with the existing models, Riggs et al. inferred the existence of mammalian DNMTs and consequently proposed a model of X-chromosome inactivation via DNA methylation [26]. The first hint of a role for 5mC in gene regulation emerged in 1975 from studies in both vertebrates and plants [27,28]. The direct correlation between gene repression and differentiation was established in 1980. Globin genes were analyzed by means of a restriction endonuclease that is unable to cleave methylated DNA (HpaII). In the germ line, all sites tested in the globin gene region were methylated; however, in somatic tissue, DNA methylation was absent at specific sites in the globin gene [29].
Additionally, treatment of mouse embryo cells with a cytidine analog (5-azacytidine) induced changes in differentiation as a consequence of inhibiting the methylation of newly synthesized DNA [30]. At this point, the aforementioned assays confirmed the existence of specific methylation patterns that are symmetric on both strands, heritable, and tissue-specific [31]. Gruenbaum, and Bestor and Ingram, carried out the pioneering studies of vertebrate DNMTs. These studies included substrate-dependent DNA methylation assays showing that DNMTs prefer hemimethylated CpG sequences, a preference later established as the basis of their methylation mechanism and function [4,29]. DNMTs were sequenced and cloned from mouse cells to study their functional domains and decipher their mechanism of action [30]. DNMTs and DNA methylation were thus established as integral components of the regulation of gene expression in mammals (Figure 1). The global DNA methylation patterns in mammals are established by three enzymatically active DNA methyltransferases: DNMT1, DNMT3A, and DNMT3B.
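The logic of the HpaII-based globin assay can be illustrated with a minimal sketch. It assumes the well-known properties of HpaII (recognition site CCGG, cleavage after the first cytosine, and no cleavage when the internal cytosine is methylated) and, as a simplification, tracks methylation on one strand only. The function name `hpaii_digest` and the toy sequence are hypothetical and are not taken from the cited studies.

```python
from typing import List, Set


def hpaii_digest(seq: str, methylated: Set[int]) -> List[str]:
    """Return the fragments left after a simulated HpaII digest.

    seq        -- top-strand DNA sequence
    methylated -- 0-based indices of methylated cytosines (illustrative input)
    """
    seq = seq.upper()
    cuts = []
    site = seq.find("CCGG")
    while site != -1:
        internal_c = site + 1  # the cytosine of the CpG inside CCGG
        if internal_c not in methylated:
            cuts.append(site + 1)  # HpaII cleaves C^CGG
        site = seq.find("CCGG", site + 1)

    fragments, previous = [], 0
    for cut in cuts:
        fragments.append(seq[previous:cut])
        previous = cut
    fragments.append(seq[previous:])
    return fragments


toy = "AACCGGTTTCCGGAA"  # two CCGG sites, starting at indices 2 and 9
print(hpaii_digest(toy, set()))    # both sites unmethylated
print(hpaii_digest(toy, {3}))      # first site protected by methylation
print(hpaii_digest(toy, {3, 10}))  # both sites protected
```

In this toy model, a fully methylated region stays as one fragment while unmethylated sites yield shorter fragments, which is the kind of readout that distinguished germ-line from somatic globin DNA.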

Characteristics and Function of DNMT1 (Maintenance Methylation)
DNMT1 maintains existing methylation patterns in the DNA, and it is an abundant enzyme in somatic cells. It is highly conserved in eukaryotes and constitutively expressed in dividing cells. DNMT1 consists of 1620 amino acids and contains 10 conserved motifs involved in its catalytic function [32]. It has a large N-terminal domain with regulatory function and a smaller C-terminal catalytic domain responsible for DNA methyltransferase activity. The N-terminal domain regulates the recognition of methylation target sites through several subdomains: the DNA methyltransferase-associated protein 1 (DMAP) binding domain, the PCNA (proliferating cell nuclear antigen) binding domain (PBD), the replication foci-targeting sequence domain (RFTD), the CXXC domain (C, cysteine; X, any amino acid), and the bromo-adjacent homology 1 and 2 domains (BAH1 and BAH2) [33]. Each domain functions in a specific manner. For example, the CXXC domain acts as a sensor for unmethylated CpG [34]; the DMAP motif is essential for the recruitment of transcriptional repressors such as DMAP1 and HDACs to the replication foci [35]; the PBD motif interacts with PCNA [36]; and the RFTD interacts with ubiquitin-like with PHD and RING finger domains 1 (UHRF1) to recruit DNMT1 to sites of DNA replication and to localize it to centromeric chromatin and replication foci [37]. DNMT1 is more abundant at entry into S phase, and it is responsible for symmetrically adding the methyl group to CpGs [6,38]. DNMT1 activity is structurally dependent on a hemimethylated substrate [39]. DNMT1 adds methyl groups to the unmethylated cytosines of the nascent strand to generate symmetric methylation. The active center of the C-terminal domain shows a 30- to 40-fold preference for hemimethylated DNA [34,40,41]. DNMT1 knockout in mice revealed that it is required for appropriate embryonic development, genomic imprinting, and X-chromosome inactivation [42,43].
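The maintenance logic described above can be sketched as a small model. It is a deliberately idealized illustration, assuming perfect fidelity and no de novo activity; the data layout (a dictionary of CpG dyads) and the function names `replicate` and `dnmt1_maintain` are hypothetical and chosen only for this example.

```python
from typing import Dict, List, Tuple

# A CpG dyad is keyed by the position of its top-strand cytosine and stores
# (top_strand_methylated, bottom_strand_methylated).
Dyads = Dict[int, Tuple[bool, bool]]


def replicate(parent: Dyads) -> List[Dyads]:
    """Semiconservative replication: each daughter duplex keeps one parental
    strand and receives one nascent, unmethylated strand."""
    keeps_top = {pos: (top, False) for pos, (top, _) in parent.items()}
    keeps_bottom = {pos: (False, bottom) for pos, (_, bottom) in parent.items()}
    return [keeps_top, keeps_bottom]


def dnmt1_maintain(duplex: Dyads) -> Dyads:
    """Idealized DNMT1: methylate the nascent strand only at hemimethylated
    dyads; fully unmethylated dyads are left untouched."""
    return {pos: (top or bottom, top or bottom)
            for pos, (top, bottom) in duplex.items()}


parent = {3: (True, True), 10: (False, False), 17: (True, True)}
daughters = [dnmt1_maintain(d) for d in replicate(parent)]
print(daughters)
```

Under these assumptions both daughter duplexes recover the parental pattern, while the unmethylated dyad at position 10 remains unmethylated, which is what distinguishes maintenance from de novo methylation.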

Characteristics and Function of DNMT3A and DNMT3B (De Novo Methylation)
DNMT3A and DNMT3B are essential for de novo methylation during early development, as their knockouts are lethal during embryogenesis in murine models [44,45]. DNMT3A and DNMT3B display a high degree of similarity, especially in their catalytic domains, which share about 84% homology [46]. Despite this, they have different methylation mechanisms and nonredundant functions. DNMT3A has a cooperative methylation mechanism, whereas DNMT3B methylates DNA by a noncooperative mechanism [47,48]. DNMT3A is located on the short arm of chromosome 2 at position 23.3, consists of 26 exons and 25 introns, and encodes a protein of 912 amino acids. DNMT3B is located on the long arm of chromosome 20 at position 11.21, consists of 24 exons and 23 introns, and encodes a protein of 853 amino acids. The DNMT3 enzymes contain a variable N-terminal portion of 280 and 220 amino acids for DNMT3A and DNMT3B, respectively. The N-terminal domain consists of the PWWP and ADD regulatory domains and six repeats of the CXXC motif, and it is followed by the C-terminal catalytic portion [44,49–53]. The PWWP domain has a positively charged surface and can interact with the negatively charged DNA, but it is not required for the catalytic activity of the de novo DNMT3s [50]. The target recognition domain (TRD) or CXXC is part of the catalytic domain of DNMT3 and serves to recognize DNA [46]. The slight structural differences between DNMT3A and DNMT3B are key to their differential methylation and biochemical interaction with the DNA strand. Specific amino acids are involved in the interaction with DNA for the conversion of cytosine into 5-methylcytosine. Arg836 of the target recognition domain is essential for CpG contact by DNMT3A, and Asn779 and Lys777 are essential for CpG recognition by DNMT3B [54,55].

Transcriptional and Post-Transcriptional Regulation of DNMTs
The expression of both DNMT1 and DNMT3B increases during the cell cycle in the transition from G0 or G1 to S phase [56]. Although the regulation of DNMT1 expression has been described in more detail, it is proposed that both genes may be regulated by the same transcription factors (Sp1 and Sp3, Figure 2a). In normal cells, the transcriptional activity of DNMT promoters is regulated by binding of the transcription factors (TFs) Sp1, Sp3, E2F, and p53 [57–62]. In cells in G0 or G1, the retinoblastoma protein (Rb) is dephosphorylated and binds to E2F, forming a repressive complex on the DNMT1 promoter [60]. This is reinforced by the interaction between Sp1 and p53 at DNMT promoters, where p53 functions as a transcriptional repressor by inhibiting the direct binding of Sp1 to the promoter [62]. During S phase, p53 levels decrease and Sp1 is released, in parallel with a decrease in phosphatases and an increase in the cyclin-dependent kinases (CDKs) that phosphorylate Rb. Rb phosphorylation promotes the release of E2F, and the expression of DNMTs is thus activated (Figure 2a) [60,61]. Similarly, expression of DNMT3A and DNMT3B depends on Sp1 and Sp3 binding to their promoters, and inhibition of Sp1 or Sp3 binding to their target sites decreases expression [63].

Figure 2. (a) The expression of DNMTs increases in the transition from G1 to S phase. In G1, p53 interacts with Sp1, and pRB is dephosphorylated and bound to E2F. During replication, p53 levels decrease and CDK expression increases, which results in the dissociation of phosphorylated pRB from E2F and of p53 from Sp1; this facilitates the activation of DNMT1 transcription. DNMT1, DNMT3A, and DNMT3B promoters are enriched with the transcription factors (TFs) Sp1, E2F, and c-Myc in several cell lines analyzed in the ENCODE project. (b) Post-transcriptional regulation of DNMT mRNAs occurs through the interaction of miRNAs with their 3′ UTR. The binding of miR-29, miR-152, and miR-148 leads to the degradation of DNMT1 mRNA. (c) Post-translational modifications of DNMT1 may favor its activity or its degradation. Methylation of lysine 142 by the methyltransferase SET7 leads to the ubiquitination of DNMT1. This modification is mutually exclusive with the phosphorylation of serine 143 by AKT, which favors the stability and activity of DNMT1. Acetylation of several lysine residues by TIP60, in combination with ubiquitination by UHRF1, leads to the degradation of DNMT1 via the proteasome. The binding of DNMT1 to DNA can be increased by sumoylation performed by UBC9 and SUMO1. Me, methylation; P, phosphorylation; Ub, ubiquitination; Su, sumoylation; Lys, lysine; Ser, serine.

At the post-transcriptional level, the 3′ UTR of DNMT mRNAs can be targeted by microRNAs (miRNAs), which recognize and bind the mRNA by base complementarity and contribute to its degradation (Figure 2b) [64–66]. For DNMT1 mRNA, the binding of miR-29b [67], miR-152, miR-185 [68], and miR-148 induces its degradation in myeloid leukemia and glioma cells (Figure 2b). In the presence of miR-16c, miR-222, miR-1741, or miR-1632, DNMT3B expression decreases in ovarian cancer [69]. Similarly, miR-29s is decreased in lung cancer, and its expression inversely correlates with DNMT3A and DNMT3B expression [70]. miR-148 regulates DNMT3B isoforms differentially and downregulates only canonical DNMT3B expression by binding to its 3′ UTR [71]. Several miRNAs, including miR-26a, miR-26b, miR-26c, miR-203, and miR-222, regulate DNMT3B mRNA in breast cancer cell lines and Burkitt lymphoma [72,73]. The re-expression of these miRNAs reduces the levels of DNMT3B mRNA in hypermethylated breast cancer cell lines [72]. miR-30a-3p is a small noncoding RNA that regulates DNMT3A expression in A549 cells [74].
Post-translational modifications of DNMT1 may favor its stability or lead to its degradation [33]. For example, DNMT1 degradation can be induced by demethylation of lysine 1094 by lysine-specific histone demethylase 1A (LSD1) [75]. Similarly, methylation of lysine 142 by the SET7 methyltransferase also induces DNMT1 degradation [76]. This last mark is mutually exclusive with phosphorylation of serine 143 by the AKT1 kinase [77]; when serine 143 is phosphorylated, methylation of lysine 142 is prevented, which in turn prevents the degradation of DNMT1. In some contexts, acetylation can also destabilize DNMT1. For example, the acetyltransferase Tip60 can acetylate several lysine residues on DNMT1, after which DNMT1 is ubiquitinated by UHRF1 [78,79]. This mechanism is antagonized by the deubiquitinase HAUSP in RKO (cancer) and HEK293 (normal) cells, which protects DNMT1 from proteasomal degradation [79]. Sumoylation is another modification that occurs on DNMT1 and can antagonize the effect of ubiquitination. UBC9 and SUMO1 carry out the sumoylation of DNMT1, which increases its binding to DNA and its catalytic activity (Figure 2c) [80,81].

Maintenance of the Methylation Machinery
Although DNMTs interact directly with cytosines in DNA, consensus target sequences have not yet been defined. Several proteins participate, directly or indirectly, in the recruitment of DNMT1 to maintain methylation patterns at each cell division, including PCNA, PAF15, UHRF1, G9a, GLP, and ligase 1 [82,83]. PCNA is a component of the DNA replication complex and was first described to interact with DNMT1 in MRC-5 human cells (Medical Research Council cell strain 5). Binding occurs through amino acids 163–174 of DNMT1, which colocalizes with PCNA at replication foci on newly replicated DNA [36]. UHRF1 has an essential role in maintenance methylation during mouse embryonic stages. UHRF1 knockout is lethal, and its elimination in mESCs leads to the loss of DNA methylation. UHRF1 is a multidomain protein that contains a SET and RING-associated (SRA) domain that binds to hemimethylated DNA [84,85]. UHRF1 is also frequently bound to K9me2/3 on the tail of histone 3 (H3) through its tandem Tudor domain (TTD) [86]. Lysine methylation is a post-translational modification that favors the interaction between proteins. This mark is added by the histone methyltransferases G9a and GLP. Ligase 1 is a canonical protein of the replication complex and has a lysine in a motif similar to that of H3K9me2/3. Lysine 126 of ligase 1 behaves physicochemically and structurally like lysine 9 of H3. Both residues (H3K9 and Lig1 K126) can be di- or trimethylated (me2/me3) by the histone methyltransferases G9a and GLP, which increases the preference of the UHRF1 TTD for ligase 1 in HeLa cells [82]. Therefore, the localization of DNMT1 to the replication foci occurs through the PCNA-ligase 1-UHRF1-DNMT1 interaction. This protein complex colocalizes at the sites where DNA is synthesized and is responsible for the faithful copying of the information contained in the DNA methylation patterns (Figure 3).
DNMT1 recruitment for maintenance of DNA methylation also requires both dual mono-ubiquitylation of PCNA-associated factor 15 (PAF15Ub2) in early S phase and dual mono-ubiquitylation of histone H3 (H3Ub2) by UHRF1 in late S phase [83]. The puzzle of the maintenance methylation machinery is being solved by recent findings, although there is still a gap in knowledge about the methylation of specific genomic sites. Looking deeper, DNMTs, the methylation machinery, and cis elements are likely to provide the answer.
Genes 2020, 11
Figure 3. Recruitment of DNMT1 in maintenance methylation. During DNA replication in mammals, the information of the methylation patterns is copied faithfully from the template strand to the nascent strand. The protein complex that localizes DNMT1 is formed by PCNA, which is linked to ligase 1 (Lig 1). Ligase 1 that is di- or trimethylated at lysine 126 by G9a is tightly bound by the TTD domain of UHRF1, which is in turn linked to DNMT1. The interaction of motif VI of DNMT1 is important to stabilize the DNA-protein interaction. Motif IV of DNMT1 carries out a nucleophilic attack on carbon 6, which forms a covalent bond that activates carbon 5 towards electrophilic attack, and the addition of the methyl group occurs. This is followed by the removal of one proton from carbon 5 and the resolution of the covalent interaction, which results in the modification of cytosine to 5-methylcytosine.

Elements that Influence DNA Methylation
DNMTs and the availability of the substrate S-adenosyl-L-methionine are the limiting factors for the methylation reaction at a CpG site. Where these effectors act in the genome is not well understood, but there is evidence that DNMTs can be directed by a strong cis element or by the sum of several cis elements. In both normal and tumor cells, the specific methylation of several genes occurs in conserved DNA sequences such as CpG islands (CGIs). Additionally, in some genes, methylation is deposited as a result of the interactions between DNMTs and TFs or between DNMTs and lncRNAs. DNA methylation is potentiated by other repressive complexes, such as proteins that bind to methylated DNA (methyl-CpG-binding domain proteins, MBDs), histone deacetylases (HDACs) that increase the DNA-histone interaction for chromatin condensation, and histone marks such as H3K27me3 (Figure 4). The functional link of these chromatin regulators with the deposition of methylation is described below.

Figure 4. (a) Deposition of DNA methylation in a specific promoter region can be ensured by several mechanisms involving repressor complexes and cis elements: the presence of CpG islands (CGIs) gives the region a propensity for DNMT enrichment owing to its dense CpG content. Although little is known about the common motifs present in promoters, they are important for recapitulating specific methylation states in regulatory regions. DNMT recruitment also occurs through TFs or lncRNAs, resulting in DNA methylation of the target promoter. These elements, together or separately, regulate the deposition of DNA methylation and exclude the active transcriptional machinery and the proteins that erase 5mC patterns (Tet1). (b) A broad repressive landscape established by epigenetic mechanisms. Cytosines in a CpG context are methylated under the influence of cis elements and of complexes of DNMTs with their partners; as a result, this mark is read by repressor proteins such as methylated DNA binding proteins (MBDs) and histone deacetylases (HDACs). HDACs remove acetyl groups from histones to enhance their positive charge and increase the affinity between histones and DNA. In methylated regions, an interaction between EZH2 and DNMTs has been described that results in deposition of the repressive histone mark H3K27me3. In a similar context, H3K9me3 deposited by SUV39H1 is frequently found in highly methylated repeat regions. This landscape is found in inactive chromatin and imposes a highly repressive state on target genes.

CpG Content in Promoters
The structural organization of the genome ranges from topological domains to the small binding motifs of transcription factors. Within this structure, CGIs are conserved cis elements that serve as large binding platforms for several proteins in the regulatory regions of the genome [11]. DNMTs lack specific binding sequences in DNA, but they bind to DNA regions depending on their CpG content. CGIs overlap with the transcription initiation site in 60% of human genes and function as platforms of binding sites for TFs and proteins involved in gene transcription. In vertebrates, promoters can be classified by CpG content into two groups: promoters with high CpG content (HCPs) and promoters with low CpG content (LCPs). Both groups are highly conserved in vertebrates [12,87,88]. In general, promoters with low CpG content are targets of both maintenance and de novo methylation, and their genes are transcriptionally repressed [87]. Conversely, promoters with high CpG content have low methylation rates, and they are widely distributed among highly expressed genes. Promoters with intermediate CpG content are dynamically methylated. Their transcriptional state is regulated via methylation and depends on tissue, differentiation, and cell cycle [12,13,89]. In terms of genomic expression profiles, HCPs are associated with ubiquitously expressed genes, whereas LCPs are found in genes with specialized functions that are expressed in specific cells [90]. Within HCP and LCP promoters, there are methylation-determining regions. These regions are necessary and sufficient for a DNA sequence to recapitulate its hypomethylated or hypermethylated state depending on the cellular context, which suggests that they contain the cis information needed to reproduce their natural methylation state in pluripotent or differentiated cells [13].
CpG content has been evolutionarily conserved to carefully regulate the switches of gene expression between an embryonic state and a differentiated state. The subtle differences that result in loss of CpG content are more frequently observed in LCP promoters because of mutations arising from transitions of methylated cytosines to thymine (5mC → T) [87].
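As a rough illustration of how promoters can be binned by CpG content, the sketch below computes the GC fraction and the observed/expected CpG ratio, (number of CpG × length) / (number of C × number of G), and then assigns a class. The cutoffs (ratio above 0.75 with GC fraction above 0.55 for HCP, ratio below 0.48 for LCP) are illustrative assumptions that are not stated in this text, and the function names are hypothetical. Published classifications typically scan sliding windows across the promoter, whereas this sketch scores the whole sequence at once.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * N) / (#C * #G)."""
    seq = seq.upper()
    n_c, n_g = seq.count("C"), seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (n_c * n_g)


def classify_promoter(seq: str,
                      hcp_ratio: float = 0.75,   # assumed cutoff
                      hcp_gc: float = 0.55,      # assumed cutoff
                      lcp_ratio: float = 0.48) -> str:  # assumed cutoff
    """Return 'HCP', 'LCP', or 'ICP' (intermediate) for a promoter sequence."""
    ratio, gc = cpg_obs_exp(seq), gc_fraction(seq)
    if ratio > hcp_ratio and gc > hcp_gc:
        return "HCP"
    if ratio < lcp_ratio:
        return "LCP"
    return "ICP"


for name, seq in [("CpG-rich", "CGCG" * 5),
                  ("CpG-poor", "ATTACAGTAT"),
                  ("mixed", "ACGT" * 5)]:
    print(name, round(cpg_obs_exp(seq), 2), classify_promoter(seq))
```

Because a 5mC → T transition removes a CpG without changing sequence length, each such mutation lowers the observed/expected ratio, which is one way to picture the gradual CpG depletion of methylated promoters.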

Transcription Factors Involved in DNA Methylation
DNMTs are enzymes that bind to DNA in different regions of the genome, and it is believed that in some cases they are recruited by tissue-specific TFs that negatively regulate their target genes at the promoter level (Table 1). One such TF is PML-RAR, which represses transcription of the retinoic acid receptor gene RARB2 by recruiting DNMT1 and DNMT3A to its promoter [91]. Another example is the TF STAT3, which interacts with DNMT1 and HDAC1 at the promoter of the SHP-1 phosphatase. SHP-1 is a negative regulator of cell signaling. Therefore, abnormal methylation of SHP-1 in leukemias is, in part, the consequence of DNMT1 recruitment to the promoter by STAT3 [92]. PU.1 is a TF and a regulator of hematopoiesis that interacts directly with the ATRX domain of DNMT3A and DNMT3B. PU.1 binding sites gain methylation when DNMT3s are co-expressed with PU.1. One of the best-described targets of PU.1 is the p16INK4A promoter, which is methylated in NIH3T3 cells overexpressing PU.1, resulting in a decrease in p16INK4A expression. The direct interaction of DNMT3A and DNMT3B was found in the region from −548 to −2 nt relative to the first p16 ATG, and DNA methylation was detected in the −480 to 18 region of its promoter [93]. ZHX1 is a zinc finger protein that functions as a TF and interacts with DNMT3B in vivo and in vitro. The N-terminal PWWP domain of DNMT3B is required for its interaction with motifs in the ZHX1 homeobox. Luciferase assays showed that ZHX1 favors the transcriptional repression mediated by DNMT3B in target genes [94]. Protein arrays covering 103 TFs identified 42 transcription factors that interact with DNMT3A and DNMT3B, of which 27 interact specifically with DNMT3A and 10 interact exclusively with DNMT3B [95]. A later study identified TFs that interact with DNMT1. In a TF array, binding of recombinant DNMT1 (DNMT1R) to 58 of the assayed transcription factors was detected. This analysis confirmed several previously described interactions, such as the DNMT1/Sp1 and DNMT1/p53 interactions, and identified potential interactions not yet described. In situ proximity ligation assays (P-LISA), Olink/Duolink experiments, and immunoprecipitation confirmed that DNMT1 interacts with Sp1, p53, C-EBPα, and YY1 but not with Sp4 or NFκB-p50 [96].
SALL4 is a stem cell TF that plays a vital role in maintaining stem cell identity and controlling self-renewal through transcriptional repression. Immunoprecipitation, Western blot, and enzymatic activity analyses in HEK293 cells demonstrated that the SALL4 protein interacts directly with different DNA methyltransferases (DNMT1, 3A, 3B, and 3L) and influences the enzymatic activity of the purified DNMTs [97]. In A549 non-small cell lung cancer (NSCLC) cells, RASSF1A is regulated through epigenetic silencing. In this context, DNMT3B overexpression is regulated by the TF HOXB3, and physical associations were found between DNMT3B, EZH2, and Myc at the RASSF1A promoter. Recruitment of the DNMT3B/EZH2/Myc complex to the RASSF1A promoter is necessary for its epigenetic silencing, in particular its DNA hypermethylation, which consequently impairs its function as a tumor suppressor [98]. In glioma cells, the CDKN1a promoter is methylated and its expression decreases as a result of co-recruitment of DNMT3A and Myc [95].

Common Sequences in Methylated Genes
Most CpG dinucleotides in mammals are methylated, but the methylation pattern is not uniform. DNMTs have an intrinsic preference for particular sequences (Table 1), consistent with the notion that they evolved from bacterial methyltransferases, which are sequence-specific enzymes [43]. An in vivo study conducted in 2005 determined that DNMT3B prefers YCpGR sites (Y, pyrimidine; R, purine). The nucleotides surrounding the central CpG determined binding affinity, which was high for the 5′-CTTGCGCAAG-3′ sequence and low for the 5′-TGTTCGGTGG-3′ sequence [99]. Sequences adjacent to CpGs therefore influence the affinity of human DNMTs during de novo methylation. A de novo motif analysis of regions commonly methylated by DNMT3B revealed enrichment of T at position −1 and G at position +1, giving the motif NTCpGGN [100]. This motif was recently confirmed by a biochemical assay, which showed that DNMT3B specifically recognizes DNA with CpGpG sites [55]. Motif analysis for DNMT3A shows that CpGs are more frequently methylated when flanked by a T at position −2 and a C at position +2 [101]. In another study, methylation deposited by DNMT3B was observed on the sequence CANAGCTG (N, any nucleotide), which was systematically identified in the promoters of genes methylated by DNMT3B [15]. The contribution of DNMTs to methylation of specific sequences has also been studied in the human hepatocellular carcinoma cell line SMMC-7721. In this context, DNMT1, DNMT3A, and DNMT3B preferentially methylate particular sequences in the genome, which were found in structural coding genes, repeated DNA sequences, and genes of unknown function. The methylated sequences were 340 bp for DNMT1, 325 bp for DNMT3A, and 440 bp for DNMT3B. Differential methylation analysis identified 46 fragments methylated exclusively by DNMT1, with consensus motifs 5′-TAAAAATACAAAAA-3′ and 5′-ATTAGCCGGG-3′; 42 fragments methylated by DNMT3A, with consensus motif 5′-TTGCCGGGCT-3′; and 67 fragments methylated by DNMT3B, containing the motif 5′-GCAGCCGGCAT-3′ [100]. In a genome-wide study of protein-DNA interactions (PDIs), a specific binding motif (CACATCTGGACAGATGTGGGCG) was found to be essential for interaction with DNMT3A [102].
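The degenerate motifs quoted above (YCpGR, NTCpGGN, CANAGCTG) use IUPAC ambiguity codes, so locating them in a sequence of interest amounts to a pattern search. The following Python sketch illustrates one way to do this. The function names and the toy sequence are hypothetical and are not taken from the cited studies; the "p" of CpG is dropped when writing the motif, and only the given strand is scanned.

```python
import re

# IUPAC codes needed for the motifs quoted in the text
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]",    # purine
    "Y": "[CT]",    # pyrimidine
    "N": "[ACGT]",  # any nucleotide
}


def motif_to_regex(motif):
    """Translate an IUPAC motif into a regex that also reports overlapping hits."""
    body = "".join(IUPAC[base] for base in motif.upper())
    # A lookahead with a capture group lets finditer return overlapping matches
    return re.compile("(?=(" + body + "))")


def find_motif(sequence, motif):
    """Return (0-based start, matched text) for every occurrence on the given strand."""
    pattern = motif_to_regex(motif)
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    toy = "ATCGGTTCGAACAGAGCTG"  # hypothetical sequence, for illustration only
    for motif in ("YCGR", "NTCGGN", "CANAGCTG"):
        print(motif, find_motif(toy, motif))
```

A scan of this kind only reports where a motif occurs; it says nothing about whether a site is actually methylated, which depends on the additional factors discussed in this review.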

Long Non-Coding RNAs
Long noncoding RNAs (lncRNAs) arise from the 98% of genomic DNA that is noncoding [103]. An important finding after the sequencing of the human genome was the number of lncRNAs identified and their possible functions in gene regulation. Depending on their structure, lncRNAs act through different mechanisms: they can function as scaffolds or guides, or interfere with the binding of RNAs and proteins such as DNMTs (Table 1) [104,105]. Genome-wide analyses demonstrated that lncRNAs are deregulated in cell lines and tumors and consequently mediate the overexpression of DNMTs or their recruitment to preferential genomic loci [106,107]. DACOR1 and PARTICLE are lncRNAs that regulate global levels of DNA methylation, both by regulating metabolic compounds of the folate pathway and by direct interaction with DNMT1, in HCT116 and MDA-MB-361 cells, respectively [106,108]. Both lncRNAs interact with DNMT1 and are enriched in differentially methylated regions (DMRs) of several genes. Furthermore, these lncRNAs are involved in the regulation of the DNA methylation cofactor. PARTICLE is implicated in global methylome enhancement through methylation of MAT2A (the methionine adenosyltransferase 2A gene). MAT2A participates in the synthesis of SAM (S-adenosyl methionine), a key methyl group donor [108]. DACOR1 is downregulated in colon cancer, and its induction in cancer cells decreases the expression of genes involved in amino acid metabolism, such as cystathionine β-synthase (CBS). The reduction of CBS results in the accumulation of homocysteine and an increase in methionine, the substrate needed to generate the SAM cofactor [106]. HOTAIR is another lncRNA that regulates global levels of DNMTs. Knockdown of HOTAIR in hepatocellular carcinoma and small cell lung cancer cell lines decreases the expression of DNMT1, DNMT3A, and DNMT3B. The functional consequence is reduced methylation of target genes [109,110].
Other lncRNAs influence the recruitment of DNMTs to specific target genes. NEAT1 is an lncRNA overexpressed in osteosarcoma that favors the methylation of E-cadherin through a protein complex with DNMT1, SNAIL, and G9a [17]. LincRNA-p21 interacts with DNMT1 and promotes the methylation of promoters such as Nanog to prevent cell reprogramming [111]. Kcnq1ot1 is another lncRNA that recruits DNMT1 to its target genes in somatic cells [112]. Dum is an lncRNA whose target gene, Dppa2, is repressed by the cis recruitment of DNMT1, DNMT3A, and DNMT3B [16]. There are also lncRNAs that prevent the interaction of DNMTs with the promoters of their target genes. The fine regulation of the CEBP gene involves transcription, from its own locus, of the lncRNA ecCEBP, which binds to DNMT1 and prevents methylation of the CEBP promoter and of genes adjacent to the locus [113]. DALI is another lncRNA with a similar mechanism: it has target genes in trans and prevents methylation of their promoters by interacting with DNMT1, and its depletion results in increased methylation of its target genes [114]. Thus, lncRNAs can function as global or gene-specific regulators of DNA methylation by modulating the expression of DNMTs or their recruitment to target genes.

Histone Methylation Patterns and DNMTs
There are 20 amino acids in histones that can undergo chemical modifications. Histone modifications correlate both positively and negatively with DNA methylation. The association between DNA methylation and histone methylation arises from the interaction of DNMTs with either histone methylation complexes or modified histone residues. The most studied modifications associated with DNA methylation are the trimethylation of lysine 27 in histone 3 (H3K27me3) and of lysine 36 in histone 3 (H3K36me3). H3K27me3 is a repressive mark deposited by polycomb repressive complex 2 (PRC2) and is enriched in DNA regions that display abundant DNA methylation and decreased gene transcription. In this context, PRC2 participates in the recruitment of DNMTs [18,115]. Moreover, H3K27me3 is abundant in genes aberrantly methylated in cancer, possibly through the recruitment of de novo methyltransferases [18]. Furthermore, in cells with DNMT3B overexpression, H3K27me3 coexists with DNA methylation in abnormally methylated genes [116]. H3K36me3, established by SET, is considered a repressive mark that guides the distribution of DNA methylation [117]. This mark not only interacts with the PWWP domain of DNMT3A, enhancing its activity, but is also necessary for the binding of DNMT3A to DNA [19]. The trimethylation of lysine 9 in histone 3 (H3K9me3) is a mark that frequently overlaps with pericentromeric regions, highly repeated sequences, regions enriched in heterochromatin, and regions that gain methylation by DNMT3A or DNMT3B. SUV39H is a histone methyltransferase that establishes H3K9me3. Heterochromatin protein 1 (HP1) binds to trimethylated lysine residues to form heterochromatic subdomains. The SUV39H-HP1 complex and its mark, H3K9me3, are required for DNA methylation by DNMT3A and DNMT3B [118]. Furthermore, H3K9me2/3 is bound by UHRF, a protein essential for the maintenance methylation deposited by DNMT1 [83].
Conversely, trimethylation of lysine 4 in histone 3 (H3K4me3) is mutually exclusive with DNA methylation and is enriched in active promoters. When methylated, H3K4 is not recognized by the ADD (ATRX-DNMT3-DNMT3L) domain of de novo DNMTs [119]. Temporary histone marks associated with CpG-rich genomic regions can be replaced or reinforced by DNA methylation to generate permanent transcriptional repression.

Perspectives (De Novo DNMTs and Methylation Editing)
Global methylation profiles are now available for most tissues and cells at different stages of development [120]. Correlating these genome-wide profiles with RNA expression, histone marks, and nucleosome positioning reveals whether each has a positive or negative influence on DNA methylation deposition. However, we are just starting to understand the intricate network of post-translational modifications (chromatin modifications and chromatin factors, including noncoding RNAs) that govern the activity and regulation of DNMTs and their impact on central cellular processes. DNA methylation is a key process in several human diseases, such as cancer and neurological disorders [121,122], and has key implications for gene function, chromatin biology, cell reprogramming, and medical applications. Reprogramming methylation at sequence-specific sites is now possible, without altering the DNA sequence, through several genome-editing tools based on DNA recognition domains, such as transcription activator-like effectors (TALEs), zinc finger proteins (ZNFs), and the system of Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) and CRISPR-associated (Cas) proteins [123,124,125].
Specific genes can be silenced by DNA methylation through fusion of the catalytic domain of de novo DNMTs to catalytically inactive Cas9 (dCas9). Co-expression of a guide RNA targets dCas9-DNMT to any 20 bp DNA sequence followed by the NGG trinucleotide present in target genes [125,126]. The dCas9-DNMT fusion can be guided by multiple sgRNAs to different regions. DNA methylation deposited on promoters affects their expression and shapes transcription factor binding [127,128]. Methylation of the IL6ST and BACH2 promoters by dCas9-DNMT3A is heritable across cell divisions and decreases their expression [128]. Novel experimental approaches are needed to make the CRISPR system more efficient, but it is a very promising tool for epigenetic editing in human diseases.
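The targeting rule described above, a 20 bp sequence immediately followed by an NGG trinucleotide, can be illustrated with a short Python sketch that lists candidate target sites in a promoter sequence. This is a minimal illustration under stated assumptions, not a guide-design tool: the function names are hypothetical, both strands are scanned, and no filtering for off-target matches, GC content, or chromatin context is attempted.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_target_sites(sequence, spacer_len=20):
    """List every spacer_len-bp stretch directly followed by NGG, on both strands.

    'start' is the 0-based position on the scanned strand; for the '-' strand
    it refers to the reverse-complemented sequence, not the input sequence.
    """
    seq = sequence.upper()
    # Lookahead so that overlapping candidate sites are all reported
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    hits = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for m in pattern.finditer(strand_seq):
            hits.append({
                "strand": strand,
                "start": m.start(),
                "target": m.group(1),  # 20 bp sequence matched by the guide
                "pam": m.group(2),     # NGG trinucleotide
            })
    return hits


if __name__ == "__main__":
    toy = "A" * 20 + "TGG"  # hypothetical sequence with a single site
    for hit in find_target_sites(toy):
        print(hit)
```

Because several sgRNAs can direct the same fusion protein to different regions, the list returned for a promoter could in principle be used to pick multiple sites; how many and which sites give efficient methylation remains an experimental question.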

Conclusions
The cytosine methylation patterns in a CpG context are established by three conserved DNA methyltransferases in mammals, and several molecular elements have key roles in specific DNMT localization. DNA methylation contributes to the spatiotemporal regulation of gene expression. In genes regulated by DNA methylation, particular transcription factors, lncRNAs, CpG content, common motifs, and histone modifications are involved. At the sequence level, DNA methylation is influenced by CpG content, common motifs for DNMTs, and transcription factors. At the chromatin level, histone modifications are positive and negative regulators of DNA methylation. The lncRNAs regulate DNA methylation through specific interactions among the gene, the DNMTs, and the lncRNA.
Deciphering the cytosine methylation code and the underlying basic mechanism will contribute to understanding early developmental stages, proper maintenance of somatic cells, and several human diseases.