Epigenetic Analysis through MSAP-NGS Coupled Technology: The Case Study of White Poplar Monoclonal Populations/Stands

Over the last several decades, several lines of evidence have shown that epigenetic modifications modulate phenotype and mediate an organism’s response to environmental stimuli. Plant DNA is normally highly methylated, although notable differences exist between species. Many biomolecular techniques based on PCR have been developed to analyse DNA methylation status, however a qualitative leap was made with the advent of next-generation sequencing (NGS). In the case of large, repetitive, or not-yet-sequenced genomes characterised by a high level of DNA methylation, the NGS analysis of bisulphite pre-treated DNA is expensive and time consuming, and moreover, in some cases data analysis is a major challenge. Methylation-sensitive amplification polymorphism (MSAP) analysis is a highly effective method to study DNA methylation. The method is based on the comparison of double DNA digestion profiles (EcoRI-HpaII and EcoRI-MspI) to reveal methylation pattern variations. These are often attributable to pedoclimatic and stress conditions which affect all organisms during their lifetime. In our study, five white poplar (Populus alba L.) specimens were collected from different monoclonal stands in the Maltese archipelago, and their DNA was processed by means of an innovative approach where MSAP analysis was followed by NGS. This allowed us to identify genes that were differentially methylated among the different specimens and link them to specific biochemical pathways. Many differentially methylated genes were found to encode transfer RNAs (tRNAs) related to photosynthesis or light reaction pathways. Our results clearly demonstrate that this combinatorial method is suitable for epigenetic studies of unsequenced genomes like P. alba (at the time of study), and to identify epigenetic variations related to stress, probably caused by different and changing pedoclimatic conditions, to which the poplar stands have been exposed.


Introduction
In the last twenty years, many studies have focused on epigenetic modifications in plants, particularly on DNA methylation status, and on the mechanisms driven by external stimuli (abiotic stresses, pedoclimatic conditions, climate changes etc.) that are exploited by plants to counteract mid-and/or long-term changes, with or without the involvement of signalling molecules (e.g., jasmonic acid, salicylic acid and reactive oxygen species).
In plants, DNA methylation commonly occurs within three main sequences: CG, CHG and CHH (where H is A, C or T); however, the symmetrical CpG and CpA(T)pG sites are reported to genetically analysed [15]. Previous studies have demonstrated that white poplar populations are constituted by large monoclonal stands [16,17], as commonly reported for poplars of the Populus section [18], and many other species which adopt vegetative propagation as a prevalent way of reproduction [19]. Therefore, the white poplar represented an excellent species to test the MSAP-NGS method, due to the incomplete knowledge of its large and complex genome.

Technical Overview of MSAP-NGS Methodology
In order to exploit the power of the MSAP protocol for obtaining information about the DNA methylation status of the monoclonal stands of white poplar on the island of Malta, we combined it with NGS, which provided the nucleotide sequences of the MSAP amplicons obtained by PCR. The combined MSAP-NGS procedure required several steps (see below) and specific biostatistics data analyses.
Step 1: The reads obtained by DNA NGS amplicons, obtained from each single digestion (EcoRI-MspI or EcoRI-HpaII dataset) of all the samples, were used to assemble de novo a reference genome constituted by 5532 contigs. The contigs obtained from the mapping of reads to the reference genome were analysed in order to assay the reproducibility of the analyses and to verify the number of contigs obtained for each sample. Two Venn diagrams were elaborated comparing contigs obtained from each sample and each enzyme ( Figure 1). for poplars of the Populus section [18], and many other species which adopt vegetative propagation as a prevalent way of reproduction [19]. Therefore, the white poplar represented an excellent species to test the MSAP-NGS method, due to the incomplete knowledge of its large and complex genome.

Technical Overview of MSAP-NGS Methodology
In order to exploit the power of the MSAP protocol for obtaining information about the DNA methylation status of the monoclonal stands of white poplar on the island of Malta, we combined it with NGS, which provided the nucleotide sequences of the MSAP amplicons obtained by PCR. The combined MSAP-NGS procedure required several steps (see below) and specific biostatistics data analyses.
Step 1: The reads obtained by DNA NGS amplicons, obtained from each single digestion (EcoRI-MspI or EcoRI-HpaII dataset) of all the samples, were used to assemble de novo a reference genome constituted by 5532 contigs. The contigs obtained from the mapping of reads to the reference genome were analysed in order to assay the reproducibility of the analyses and to verify the number of contigs obtained for each sample. Two Venn diagrams were elaborated comparing contigs obtained from each sample and each enzyme ( The number of contigs obtained by read mapping of fragments of both digestions was comparable ( Figure 1); when a single dataset was considered (EcoRI-MspI, or EcoRI-HpaII digestion), different white poplar samples shared a high number of contigs; moreover, each sample also showed private contigs. Comparing the data, the magnitude of the number of private contigs was similar in the two different datasets (MspI or HpaII digestions). In particular, sample Malta 5 showed the lowest number of private contigs but with the same magnitude for both datasets (67 for MspI and 66 for HpaII digestion), whilst Malta 4 had the highest one (777 in the case of MspI, and 527 in the case of HpaII). For a better understanding of the molecular background of the epigenetic changes, we considered the most relevant contigs to be those that consistently showed identical polymorphisms among samples. The number of contigs obtained by read mapping of fragments of both digestions was comparable ( Figure 1); when a single dataset was considered (EcoRI-MspI, or EcoRI-HpaII digestion), different white poplar samples shared a high number of contigs; moreover, each sample also showed private contigs. Comparing the data, the magnitude of the number of private contigs was similar in the two different datasets (MspI or HpaII digestions). In particular, sample Malta 5 showed the lowest number of private contigs but with the same magnitude for both datasets (67 for MspI and 66 for HpaII digestion), whilst Malta 4 had the highest one (777 in the case of MspI, and 527 in the case of HpaII). For a better understanding of the molecular background of the epigenetic changes, we considered the most relevant contigs to be those that consistently showed identical polymorphisms among samples.
Step 2: In order to detect differentially methylated sequences or known genes (Archive S1A), contigs of each sample and of each digestion were analysed using BLASTN GenBank/NCBI, adopting A. thaliana as the reference species.
Step 3: In order to estimate the DNA methylation status of the genes identified (on the basis of the principles that we adopted and described above) for each white poplar sample, a Venn diagram was singularly elaborated including genes derived from either EcoRI-MspI or EcoRI-HpaII datasets ( Figure 2).
Step 2: In order to detect differentially methylated sequences or known genes (Archive S1A), contigs of each sample and of each digestion were analysed using BLASTN GenBank/NCBI, adopting A. thaliana as the reference species.
Step 3: In order to estimate the DNA methylation status of the genes identified (on the basis of the principles that we adopted and described above) for each white poplar sample, a Venn diagram was singularly elaborated including genes derived from either EcoRI-MspI or EcoRI-HpaII datasets ( Figure 2).

Figure 2.
Comparison between the EcoRI-MspI and EcoRI-HpaII profiles of each sample. The numbers represent the genes identified from the contigs vs. A. thaliana. The list of private or shared genes is reported in Archive S1B.
The results reported in Figure 2 show that the magnitude of gene numbers obtained from each digestion was comparable among all the samples. In particular, the number of genes unaffected by DNA methylation (shared genes between MspI and HpaII datasets-M1-H1, Figure 2) ranged between 33 (Malta 5) and 60 (Malta 4); in the case of genes affected by double-strand methylation of inner cytosine or hemi-methylation of inner cytosine (genes amplified just after the MspI-M1-H0) the number was between 22 and 41, whilst, when the hemi-methylated CHG-sites (hemi-methylation of inner and outer cytosine-M0-H1) were considered, the number of genes ranged between 29 and 67. A list of the genes affected by different DNA methylation status is reported in Table S1A.
Step 4: On the basis of the BLASTN and Venn analyses results, a pathway analysis was performed (for each sample and for each DNA methylation status) to identify those pathways enriched for genes showing different DNA methylation status. In addition, genes underwent gene ontology analysis in order to identify the enriched pathways, including genes affected by diverse DNA methylation status.
Step 5: Another step of the data analysis aimed to identify genes with the same DNA methylation status, specific to each sample or shared among the five white poplar samples. Toward this purpose, a Venn analysis was elaborated on differently methylated fragments ( Figure 3). The results reported in Figure 2 show that the magnitude of gene numbers obtained from each digestion was comparable among all the samples. In particular, the number of genes unaffected by DNA methylation (shared genes between MspI and HpaII datasets-M1-H1, Figure 2) ranged between 33 (Malta 5) and 60 (Malta 4); in the case of genes affected by double-strand methylation of inner cytosine or hemi-methylation of inner cytosine (genes amplified just after the MspI-M1-H0) the number was between 22 and 41, whilst, when the hemi-methylated CHG-sites (hemi-methylation of inner and outer cytosine-M0-H1) were considered, the number of genes ranged between 29 and 67. A list of the genes affected by different DNA methylation status is reported in Table S1A.
Step 4: On the basis of the BLASTN and Venn analyses results, a pathway analysis was performed (for each sample and for each DNA methylation status) to identify those pathways enriched for genes showing different DNA methylation status. In addition, genes underwent gene ontology analysis in order to identify the enriched pathways, including genes affected by diverse DNA methylation status.
Step 5: Another step of the data analysis aimed to identify genes with the same DNA methylation status, specific to each sample or shared among the five white poplar samples. Toward this purpose, a Venn analysis was elaborated on differently methylated fragments ( Figure 3).  Figure 1). In particular, (A) reports the results of the comparison of the genes identified in each sample for the M1-H1 condition, (B) those for M1-H0 condition, and (C) those in M0-H1 condition. The list of shared or private genes is reported in Table S1C.

Gene Ontology Analyses
Malta 1 For sample Malta 1, different biological processes were affected by the diverse DNA methylation status ( Figure 4A). Table S1C reports only the results with false discovery rate < 0.05.  Figure 1). In particular, (A) reports the results of the comparison of the genes identified in each sample for the M1-H1 condition, (B) those for M1-H0 condition, and (C) those in M0-H1 condition. The list of shared or private genes is reported in Table S1C.

Gene Ontology Analyses
Malta 1 For sample Malta 1, different biological processes were affected by the diverse DNA methylation status ( Figure 4A). Table S1C reports only the results with false discovery rate < 0.05.
In the case of condition M1-H0 (double-strand methylation of inner cytosine or hemi-methylation of inner cytosine), the genes affected were only over-represented with a significant p-value (p < 0.05) in the pathway of the generation of precursor metabolites and energy (GO:0006091).

Malta 2
The GO analysis of the sample Malta 2 ( Figure 4B) revealed that in the case of "no methylation condition", no pathway was significantly enriched (Table S1B). In the case of M1-H0 (double-strand methylation of inner cytosine or hemi-methylation of inner cytosine), four pathways were significantly overrepresented (FDR p-value < 0.05): photosynthetic electron transport chain (GO:0009767), electron transport chain (GO:0022900), photosynthesis, light reaction (GO:0019684) and generation of precursor metabolites and energy (GO:0006091).
When the condition M0-H1 hemi-methylated CHG-sites (hemi-methylation of inner and outer cytosine) was considered, the number of overrepresented pathways significantly increased, particularly those related to ATP-synthesis-coupled proton transport (GO:0015986), energy-coupled proton transport, down electrochemical gradient (GO:0015985) and ATP biosynthetic process (GO:0006754).
Malta 5 The sample Malta 5 showed that, in the case of the no-methylation condition, there were no pathways significantly enriched (Table S1B). When the condition M1-H0 (double-strand methylation of inner cytosine or hemi-methylation of inner cytosine) was considered, the most enriched pathways were: generation of precursor metabolites and energy (GO:0006091) and photosynthesis (GO:0015979) (FDR p-value 10 −4 ) and photosynthetic electron transport chain (GO:0009767) and photosynthesis, light harvesting (GO:0009765; FDR p-value 10 −3 ).

Venn Analysis of Differently Methylated Fragments
The results of the Venn analyses showed that 20 genes unaffected by DNA methylation ( Figure 3A) were shared by all five samples. The shared genes were not significantly enriched for any biological process. However, among them, four encode transfer RNAs (tRNAs; Figure S1B).
Whilst eight, one and thirteen specific genes were identified in Malta 1, Malta3 and Malta 4, respectively, in contrast, the eight genes specific to Malta 1 were significantly enriched for the biological processes related to response to virus (p < 0.005) and to heat (p < 0.05). The unique gene specific of Malta 3, PSI P700 apoprotein A2 (psaB) was not suitable for an enrichment analyses, whilst the 13 identified as specific to Malta 5 were enriched for biological processes related to mitochondrial translation and mitochondrial gene expression (p < 0.0005).
In the case of the double-strand methylation of inner cytosine or hemi-methylation of inner cytosine ( Figure 3B), only a single gene was shared among all five samples, identified as cytochrome b6/f complex subunit 4 (petD), whilst several genes were specific for Malta 1 (4 genes), Malta 2 (3 genes), Malta 3 (15 genes), Malta 4 (10 genes) and Malta 5 (6 genes). However, as in the case of Malta 3 and 5, the genes identified, which were affected by double-strand methylation of inner cytosine or hemi-methylation of inner cytosine, were significantly enriched (p < 0.05) for some biological processes-mainly those related to metabolic processes. These processes are reported in Figure 5. In the case of Malta 5, the biological processes affected by this kind of methylation and with highly significant p-values were related to cellular respiration or energy derivation by the oxidation of organic compounds ( Figure 6). In the case of hemi-methylated CHG-sites (hemi-methylation of inner and outer cytosine; Figure  4C), the number of genes shared among all the samples was 19, whilst those specific for Malta 1, Malta 2, Malta 3, Malta 4 and Malta 5 were eleven, six, two, ten and five, respectively. The genes characterized by hemi-methylated CHG-sites (hemi-methylation of inner and outer cytosine) and shared among samples were not significantly enriched for any biological process, but many of them were recognized as tRNA-more than that expected ( Figure S1C).
When the genes specific for each sample were considered, as in the case of Malta 5, they were significantly enriched (p < 0.05) for different biological processes, as shown in Figure 7. In particular, those with low significant p-values were related to photosynthetic electron transport of photosystem II, photosynthesis, light reaction and generation of precursor metabolites and energy. In the case of Malta 5, the biological processes affected by this kind of methylation and with highly significant p-values were related to cellular respiration or energy derivation by the oxidation of organic compounds ( Figure 6). In the case of Malta 5, the biological processes affected by this kind of methylation and with highly significant p-values were related to cellular respiration or energy derivation by the oxidation of organic compounds ( Figure 6). In the case of hemi-methylated CHG-sites (hemi-methylation of inner and outer cytosine; Figure  4C), the number of genes shared among all the samples was 19, whilst those specific for Malta 1, Malta 2, Malta 3, Malta 4 and Malta 5 were eleven, six, two, ten and five, respectively. The genes characterized by hemi-methylated CHG-sites (hemi-methylation of inner and outer cytosine) and shared among samples were not significantly enriched for any biological process, but many of them were recognized as tRNA-more than that expected ( Figure S1C).
When the genes specific for each sample were considered, as in the case of Malta 5, they were significantly enriched (p < 0.05) for different biological processes, as shown in Figure 7. In particular, those with low significant p-values were related to photosynthetic electron transport of photosystem II, photosynthesis, light reaction and generation of precursor metabolites and energy. In the case of hemi-methylated CHG-sites (hemi-methylation of inner and outer cytosine; Figure 4C), the number of genes shared among all the samples was 19, whilst those specific for Malta 1, Malta 2, Malta 3, Malta 4 and Malta 5 were eleven, six, two, ten and five, respectively. The genes characterized by hemi-methylated CHG-sites (hemi-methylation of inner and outer cytosine) and shared among samples were not significantly enriched for any biological process, but many of them were recognized as tRNA-more than that expected ( Figure S1C).
When the genes specific for each sample were considered, as in the case of Malta 5, they were significantly enriched (p < 0.05) for different biological processes, as shown in Figure 7. In particular, those with low significant p-values were related to photosynthetic electron transport of photosystem II, photosynthesis, light reaction and generation of precursor metabolites and energy. Larger dots indicate the more highly significant p-values.

Discussion
As sessile organisms, plants have evolved sophisticated mechanisms of gene regulation that allow them to survive in different environmental conditions, with their extreme fluctuations, as in those related to climate change. Plants must be able to regulate gene expression quickly, in order to adapt to the environmental challenges. This rapid response includes, of course, an efficient sensing of signals, efficient transmission through a cascade of signal transduction pathways and, subsequently, synthesis and mobilization of the relevant transcription factors [10]. At the same time, plant responses to environmental changes and other stresses also involve epigenetic regulation (e.g., modifications of gene expression that occur without changes in the DNA sequence [20]). The main epigenetically driven modifications belong to three types of mechanisms: chromatin restructuring and post-translational modification of histones, DNA methylation changes, and post-transcriptional modifications regulated by non-coding RNAs (microRNA or long non-coding RNAs) [21]. These mechanisms can act synergistically to reinforce the overall chromatin packaging density, which in turn affects the transcriptional state of genes [22]. In plants, DNA methylation is a major epigenetic phenomenon that results in longer-lasting gene expression changes, and the mechanisms controlling DNA methylation inheritance are well established [23,24]. Given the high prevalence and functional implications of 5mC as an epigenetic marker, its detection has been widely investigated. Many experimental approaches have been employed to detect cytosine methylation status in plants [25][26][27]. In general, these methods include the bisulfite conversion, affinity enrichment and restrictionenzyme-mediated filtration, and more recently, next-generation sequencing (NGS). Another commonly employed molecular method for evaluating DNA methylation in the adaptation of nonmodel plants is MSAP, which is an easy, effective and low-cost approach. In plants, MSAP was used for the first time in rice [28]. After that, the method was employed for examining epigenetic variations in natural populations of various plant species, including herbaceous (viola, orchis, Dactylorhiza spp., giant reed, etc.) and woody species such as mangroves and poplars. Recently, the standard MSAP protocol has been enhanced by deep amplicon sequencing using NGS. Baranek et al. (2016) reported for the first time the combination of MSAP standard analysis (digestions of DNA with EcoRI and HpaII or MspI enzymes), followed by the NGS of PCR selective amplicons, to study two stable somatic mutants of wheat [29]. Analysing two wheat genotypes in combination with their somaclones, characterized by a changed heritable phenotype, the authors were able to study epigenetically induced changes and, at the same time, identify over 100 amplicons that were differentially methylated among the samples. Another MSAP-based approach combined with NGS is MSAP-Seq [14], which is also integrated with an automatic pipeline MSEQER. This method used conventional Larger dots indicate the more highly significant p-values.

Discussion
As sessile organisms, plants have evolved sophisticated mechanisms of gene regulation that allow them to survive in different environmental conditions, with their extreme fluctuations, as in those related to climate change. Plants must be able to regulate gene expression quickly, in order to adapt to the environmental challenges. This rapid response includes, of course, an efficient sensing of signals, efficient transmission through a cascade of signal transduction pathways and, subsequently, synthesis and mobilization of the relevant transcription factors [10]. At the same time, plant responses to environmental changes and other stresses also involve epigenetic regulation (e.g., modifications of gene expression that occur without changes in the DNA sequence [20]). The main epigenetically driven modifications belong to three types of mechanisms: chromatin restructuring and post-translational modification of histones, DNA methylation changes, and post-transcriptional modifications regulated by non-coding RNAs (microRNA or long non-coding RNAs) [21]. These mechanisms can act synergistically to reinforce the overall chromatin packaging density, which in turn affects the transcriptional state of genes [22]. In plants, DNA methylation is a major epigenetic phenomenon that results in longer-lasting gene expression changes, and the mechanisms controlling DNA methylation inheritance are well established [23,24]. Given the high prevalence and functional implications of 5mC as an epigenetic marker, its detection has been widely investigated. Many experimental approaches have been employed to detect cytosine methylation status in plants [25][26][27]. In general, these methods include the bisulfite conversion, affinity enrichment and restriction-enzyme-mediated filtration, and more recently, next-generation sequencing (NGS). Another commonly employed molecular method for evaluating DNA methylation in the adaptation of non-model plants is MSAP, which is an easy, effective and low-cost approach. In plants, MSAP was used for the first time in rice [28]. After that, the method was employed for examining epigenetic variations in natural populations of various plant species, including herbaceous (viola, orchis, Dactylorhiza spp., giant reed, etc.) and woody species such as mangroves and poplars. Recently, the standard MSAP protocol has been enhanced by deep amplicon sequencing using NGS. Baranek et al. (2016) reported for the first time the combination of MSAP standard analysis (digestions of DNA with EcoRI and HpaII or MspI enzymes), followed by the NGS of PCR selective amplicons, to study two stable somatic mutants of wheat [29]. Analysing two wheat genotypes in combination with their somaclones, characterized by a changed heritable phenotype, the authors were able to study epigenetically induced changes and, at the same time, identify over 100 amplicons that were differentially methylated among the samples. Another MSAP-based approach combined with NGS is MSAP-Seq [14], which is also integrated with an automatic pipeline MSEQER. This method used conventional MSAP analysis (digestion of DNA with EcoRI and HpaII enzymes) and replaced the gel separation of amplicons (from selective PCR) with NGS. To validate the procedure, the authors provided two case studies that measured the effects of different stresses on DNA methylation in barley. One of these studies was focused on the modulation of the leaf methylome in plants grown under water deficiency, followed by drought recovery. The new method allowed the authors to identify around 3000 different DNA sites that underwent methylation changes under water stress-many of which were located within genes or in repetitive elements. The second case study compared barley methylomes of different plant organs (leaf vs. root) during drought and the subsequent re-watering step. The analysis was a validation of the MSAP-Seq method for this kind of analysis, and interestingly revealed that, under stress conditions, some gene regions were subject to transient and reversible methylome modifications, while many repetitive elements underwent irreversible methylation or completely reversible demethylation. Recently, an adaptation of MSAP-Seq has been employed to probe the DNA methylation status in different tissues of Eucalyptus grandis W. Hill. and generate a reference genome of the species [30].
In our study, we combined MSAP with NGS to study the DNA methylation status of white poplar monoclonal stands from the island of Malta. We modified the MSAP method by reducing the PCR amplification steps (eliminating the second PCR step) and applying appropriate biostatistical analysis of NGS data. We employed the original MSAP protocol, where genomic DNA is divided into two aliquots and digested with the enzyme EcoRI, which has a restriction site that is only negligibly affected by DNA cytosine methylation, and one of two isoschizomers (MspI or HpaII) exhibiting differential sensitivity to cytosine methylation. According to the genome size and abundance of the restriction sites in the DNA, after ligation to two specific DNA adapters, compatible with EcoRIor MspI/HpaII-generated DNA ends, the DNA fragments were PCR amplified. By performing only one round of PCR amplification using pre-selective primers complementary to the adapter sequences, we were able to obtain a very large subset of fragments for the NGS step, whereas with conventional MSAP, only a few hundred (or, in a few cases, thousands) of amplicons are potentially scorable after capillary gel electrophoresis. In addition, performing NGS with MSAP-Seq avoids the formation of homoplastic bands, which is a common problem in gel-based MSAP studies [31]. As a result of these modifications, we were able to survey the entire white poplar genome. Moreover, the method described here is readily adaptable to other non-model organisms.
In our survey, all reads obtained were used to assemble a de novo reference genome for P. alba, and we evaluated the quality of the assembly in terms of read representation by mapping reads back to the assembled genome. The quality of the assembly, assessed by the number of contigs generated for both data sets (EcoRI-MspI and EcoRI-HpaII) was high, and confirmed the reproducibility of the MSAP method, as demonstrated by the good correlation among different clonal samples (e.g., biological replicates, or ramets). When the genome sequence of the species of interest is unavailable, genomic resources such as protein sequences and/or genomic scaffolds from a closely related species serve as reference scaffolds to align and realign de novo assembled contigs. For our analysis, each contig was identified/annotated by alignment against the Arabidopsis thaliana genome [32], because this species forms the most prominent basis for annotations for other plant species, more closely related genomes that have been published more recently [33].
The interpretation of MSAP data is based on the recognition of MspI or HpaII restriction sites, which can be modified by methylation, often due to pedoclimatic variations. We performed the comparison of the fragments derived by both digestion patterns, within each single sample and among all the analysed ones. This comparison allowed us to identify the number of unaffected genes (shared genes between MspI and HpaII datasets-M1-H1), and those affected by the DNA methylation (by double-strand methylation of inner cytosine or the hemi-methylation of inner cytosine (M1-H0); or by the hemi-methylation of CHG-sites (M0-H1)). The results obtained suggest that similar methylation polymorphisms were present in the clonal white poplar populations. Regarding the functional annotation, a pathway analysis was performed for each sample and for each DNA methylation status, in order to identify those pathways enriched for genes showing different DNA methylation status, and to discover the potential role of specific DNA sequences in epigenetically-driven processes in white poplar. When a single poplar population/stand was considered, a similar trend in the pathway analysis was observed, revealing a similar influence of the environment on gene regulation, which were mainly involved in chloroplast and mitochondria (essential in fully expanded leaves), photosynthesis, generation of precursor metabolites and energy, and oxidation-reduction processes.
It is not surprising that several genes involved in photosynthesis and electron transport were affected by methylation. These essential metabolic processes [34] require fine regulation in order to respond to natural environments with highly variable and dynamic conditions. The regulation of photosynthesis-related genes is aimed at satisfying plant metabolic demands and, at the same time, avoiding an over-stimulation of the electron transport chain and the generation of harmful reactive oxygen species. Moreover, photosynthetic processes strongly influence plant fitness. Photosynthesis-related processes must be regulated at the level of the individual leaves, and indeed, individual cells and organelles (chloroplasts and mitochondria). Our analyses were performed on fully expanded and photosynthetically active leaves. It is therefore not surprising to find these genes among our results, although their high prevalence was unexpected. It would be interesting to expand our analysis to leaves at different positions in a single tree, in order to investigate these regulatory mechanisms more deeply.
The category "generation of precursor metabolites and energy" includes proteins related to primary metabolism, such as carbon fixation, glycolysis, the citric acid cycle and the electron transport chain. Our results confirmed, once more, that primary metabolism can be generally defined as essential for growth, survival [35] and plant adaptation. Some of the detected DNA fragments were linked to the oxidation-reduction process, indicating that modulation of the cellular redox status is important in changing environmental conditions. Again, our observations highlight the key role of the fine-regulation of gene expression in leaves as paramount for the energy-acquiring photosynthetic process.
Comparisons made between all the samples identified genes with the same DNA methylation status that were specific to individual samples or shared between the five white poplar populations/stands. Among the genes unaffected by methylation, the shared ones (28) within samples were not significantly enriched for any biological process. Meanwhile, among the specific genes per sample, some expression pathways were enriched-for example, biological processes involved in response to viruses and to heat (in the case of Malta 1 population/stand) and those related to mitochondrial translation and mitochondrial gene expression (for Malta 5 population/stand). Mitochondria play a pivotal role in cellular energy metabolism. Specific changes in the rates of respiration are required to meet alterations in energy demands at specific stages in plant growth and development, or under different environmental conditions. The regulation of organelle biogenesis and activity requires numerous nuclear proteins to regulate/modulate transcription, splicing, trimming, editing and translation of organellar RNAs, which requires nucleus-to-organelle (anterograde) communication. Furthermore, all of these processes are mediated by numerous nuclear-encoded cofactors, which are only partially identified and deciphered.
Some genes affected by double-strand methylation of the inner cytosine or hemi-methylation of the inner cytosine were specifically identified in a single poplar population/stand, and were significantly enriched in some biological processes (e.g., those related to metabolic processes (Malta 3 and 5 populations/stands) or to cellular respiration and energy derivation by the oxidation of organic compounds (Malta 5 population/stand) confirming the relevance of epigenetic modulation for metabolism and energy pathways in white poplar.
In the case of hemi-methylated CHG-sites, the genes shared among all the samples were more abundant (19) than the specific ones (for Malta 1, Malta 2, Malta 3, Malta 4 and Malta 5 populations/stands were thirteen, six, two, ten and six, respectively). However, they were not significantly enriched for any biological process, but they included some genes encoding tRNAs. The tRNA molecules contain the most abundant post-transcriptional modifications, and are crucial for proper gene expression and protein biosynthesis. Moradi et al. (2019) found that cucumber plants resistant to the biotic stress caused by the powdery mildew pathogen showed demethylation in those DNA regions located within or near tRNA coding genes [36]. The authors hypothesized that the demethylation of tRNA genes might suppress the expression of genes that are essential for increasing the availability of certain amino acids present in defence proteins.
When the genes specific for each sample were considered, only in the case of the Malta 5 population/stand, pathways related with photosynthesis, light reaction, and the generation of precursor metabolites and energy were enriched. Malta 5 was collected in a more remote site than the others; moreover, from a geographical analysis it emerged that this stand lives in an area characterized by a different lithology ( Figure S1D). This is not sufficient to demonstrate the correlation between habitat and epigenetic response, but it suggests that many other features, both ecological and pedo-climatic, should be taken into account.
In summary, this study allowed us to further elaborate the nature of DNA methylation changes for populations and stands of white poplar, the existence of which we observed previously in two different species [8,9,37,38]. To extend these studies, it would be necessary to evaluate the extent and nature of DNA methylation changes not just among individuals of a clonal population in different environments, but also in organs at different positions on the same plant (e.g., leaves). Those differences sustain the hypothesis that DNA methylation modifications can be considered as an epigenetic marker that is strictly related to the habitat where plants live, and to the position of specific organs within the plant.

Plant Material
Well-expanded mature leaves from five white poplar trees were collected from natural populations/stands on the island of Malta ( Figure S1A). These samples, which were previously characterized by means of microsatellite markers [15], showed a very limited genetic biodiversity.

DNA Isolation and MSAP Analysis
Total genomic DNA was isolated from leaves (100 mg) using the DNeasy Plant Mini Kit (Qiagen, Hilden, DE). Purified DNA solutions were then processed by means of the MSAP method [12], as described in [8]. Briefly, we used the isoschizomers HpaII and MspI (methylation-sensitive restriction enzymes) as "frequent cutters" and EcoRI as a "rare cutter" enzyme (restriction enzyme source: Fermentas, Milano-IT). The different sensitivity of the two isoschizomers HpaII and MspI towards methylation is reported in Table 1, where 1 indicates the presence of fragment and 0 its absence. The fragments were ligated to adapters in order to allow a first PCR amplification (pre-selective). Usually, at this step, the MSAP protocol foresees a second PCR (selective PCR) that is more selective than the first one, due to the use of the same primers but with a longer tail (two/three DNA bp added at the 3 -end tail). The effect of this step is to reduce the number of amplicons, compared to those produced by the first PCR step. Potentially, this leads to a loss of information, but it permits the separation of the DNA fragments in a common automated DNA sequencer. In order to avoid reducing the number of amplicons, in our modified MSAP protocol we sequenced the amplicons that were obtained from the first, pre-selective PCR amplification.

Library Preparation for NGS Analysis of Amplicons
In order to sequence the large number of amplicons produced with the first MSAP-PCR, the following steps were performed. An end-repair step is needed when PCR amplicons contain sticky ends, as in the case of the EcoRI restriction sites. Briefly, 32 µL of the fragmented amplicon solution were added to 15 µL of end repair and adenylation buffer and 3 µL of end repair and adenylation enzyme mix (NEXTFLEX, PerkinElmer, UK). This mixture was incubated in a thermocycler under the following conditions: 22 • C for 20 min, 72 • C for 20 min and then 4 • C on hold. Next, five barcoded adapters were ligated to each sample by adding 47.5 µL of ligase enzyme mix (NEXTFLEX, PerkinElmer, Beaconsfield, UK) and 25 ng of DNA barcode adapter to 50 µL of end-repaired and adenylated DNA amplicons. Reactions were incubated in a thermocycler for 15 min at 22 • C. Ligated DNA amplicons were precipitated by adding 0.1 volumes of cold (4 • C) 3 M Sodium Acetate (NaAc) solution and two volumes 96% ethanol (stored at −20 • C) and left at −20 • C overnight. The precipitated DNA amplicons were centrifuged at 14 kRPM for 15 min. DNA pellets were washed with 70% ethanol and mixed in 50 µL of resuspension buffer. The ligation mixtures were used as template for PCR amplification. The PCR amplifications were performed in a final volume of 50 µL, containing 5 µL of ligation mixture and 12 µL of dedicated PCR master mix (NEXTFLEX, PerkinElmer-UK) under the following conditions: initial denaturation at 98 • C, followed by 6 cycles of 98 • C for 30 s, 65 • C for 30 s and 72 • C for 60 s, with a final elongation step at 72 • C for 4 min. Finally, the amplicons were precipitated with 0.1 volumes of 3 M NaAc and two volumes of absolute EtOH, centrifuged at 14 kRPM, and the DNA pellets were mixed in 21 µL of resuspension buffer. For the cluster generation, the libraries were sequenced using the Illumina HiSeq 1500 system (Illumina, San Diego, CA, USA).

NGS Data Analysis
Sequence quality was verified with the UNIX-based package fastQC-0.10.1 software. Reads were processed by CLC suite (CLC/Qiagen, Aarhus, Denmark) in order to filter, remove the unused adapters and polyA, and trim the sequences. Since the sequence of the entire genome of P. alba was not available at the time of the study, the reads obtained were analysed through de novo assembly and pooling all the reads relative to each sample (both MspI and HpaII fragments). In this way, a list of contigs was produced. Then, the fragments obtained from each double digestion (EcoRI-MspI and EcoRI-HpaII) were mapped using these contig lists. The sequences with less than two counts of reads were discarded. In order to check the robustness of the analyses, a Venn diagram was built separately for either MspI or HpaII profiles, considering the ID of the contigs found in each sample.
Contigs of each sample and for both MspI and HpaII fragments were analysed for similarity by BLASTN using as reference genomes those of Populus trichocarpa Torr. & A.Gray ex. Hook., Zea mays L. (results not shown for both) and Arabidopsis thaliana (L.) Heynh., which is also the basis of annotations of the newly published P. alba genomes [38].
Subsequently, in order to apply the data interpretation foreseen in the case of the MSAP protocol, consisting of the comparison between both MSAP profiles (MspI and HpaII) of the same analysed sample, a Venn diagram for each sample was elaborated including the genes obtained by MspI digestion vs. those obtained by HpaII digestion. On the basis of the presence or absence of the genes in one category or both categories (MspI and HpaII), the DNA methylation status of the identified genes was estimated as reported in Table 1. In order to identify pathways enriched for genes showing different DNA methylation, gene ontology analyses were performed separately for each sample and for each DNA methylation status, using the Panther classification system [39,40]. Only the results with a false discovery rate (FDR) < 0.05 were considered further.
Furthermore, in order to identify genes showing the same DNA methylation status that were specific for each sample or shared among the five samples, three Venn analyses were generated. The shared genes and the specific ones were also analysed using the ShinyGo [41] web-based software on gene ontology (GO) for annotation and gene ID, mapping plant genomes in Ensembl BioMart release 96.
Network analyses were performed using the GeneMania software [42], which identifies genes related to a set of input genes using a very large set of functional association data. Association data include protein and genetic interactions, pathways, co-expression, co-localization and protein domain similarity. The overall experimental pipeline and approach used for extraction of the final set of sequences are summarized in Figure 8. include protein and genetic interactions, pathways, co-expression, co-localization and protein domain similarity. The overall experimental pipeline and approach used for extraction of the final set of sequences are summarized in Figure 8. Venn analysis comparing the genes in the same DNA methylation status among all samples; to identify the shared genes and the private ones Gene ontology analyses were performed to identify those pathways enriched for genes with the same DNA methylation status.
Network analyses were performed through GeneMania software

Conclusions
This study on clonal white poplar populations/stands of the Maltese archipelago focused on the integration of two reliable and powerful analytical tools-namely, the cost-effective MSAP analysis and NGS-to identify genes and pathways affected by changes in DNA methylation. This approach is particularly well-suited to plant species for which the genome has not yet been sequenced. Using this combined approach, we showed that it is possible to exploit the NGS technology to sequence the amplicons obtained by the MSAP protocol and to utilize the same biostatistical methods as those employed for the analysis of DNA methylation status revealed by MSAP analysis. Furthermore, it allowed us to identify DNA sequences/genes in which the DNA methylation status was most likely determined by pedoclimatic changes and in response to external stimuli. Our results confirm that this combined approach is reliable and informative and is especially suitable for unannotated genomes.