A Comparison of Three Circular Mitochondrial Genomes of Fagus sylvatica from Germany and Poland Reveals Low Variation and Complete Identity of the Gene Space

Similar to chloroplast loci, mitochondrial markers are frequently used for genotyping, phylogenetic studies, and population genetics, as they are easily amplified due to their multiple copies per cell. A recent study revealed that the chloroplast offers little variation for this purpose in central European populations of beech. Thus, the aim of this study was to elucidate whether mitochondrial sequences might offer an alternative or whether they are similarly conserved in central Europe. For this purpose, a circular mitochondrial genome sequence from the more than 300-year-old beech reference individual Bhaga from the German National Park Kellerwald-Edersee was assembled using long and short reads and compared to an individual from the Jamy Nature Reserve in Poland and a recently published mitochondrial genome from eastern Germany. The mitochondrial genome of Bhaga was 504,730 bp, while the mitochondrial genomes of the other two individuals were 15 bases shorter, due to seven indel locations, with four having more bases in Bhaga and three having one base less in Bhaga. In addition, 19 SNP locations were found, none of which were inside genes. In 17 of these SNP locations, the base in Bhaga differed from that in the other two genomes, while in 2 SNP locations Bhaga and the Polish individual shared the same base. While these figures are slightly higher than for the chloroplast genome, the comparison confirms the low degree of genetic divergence in organelle DNA of beech in central Europe, suggesting colonisation from a common gene pool after the Weichsel Glaciation. The mitochondrial genome might have limited use for population studies in central Europe, but once mitochondrial genomes from glacial refugia become available, it might be suitable to pinpoint the origin of migration for the re-colonising beech population.


Importance of European Beech and Its Population Structure
European beech is the climax tree species in most regions of Germany and adjacent areas [1]. Due to its valuable hardwood, it is also widely planted. Climate change has impacted some beech stands in drier regions of Germany, especially during the past few years, which were characterised by record heat and drought events [2,3], and might eventually lead to a shift to oak forests in those areas [4]. However, at the northern edge of its distribution, beech is also gaining ground, replacing oaks in north-eastern habitats [5]. The ecological importance and dominance of beech in central Europe and its peculiar post-glacial migration pattern, with an arrival in northern Europe well after the post-glacial optimum [6], have triggered several population genetics studies. Most of these relied on nuclear markers [7][8][9][10][11]. However, nuclear markers are inherited both maternally and paternally, obscuring local migration patterns due to pollen influx from more distant populations.

Maternally Inherited Markers Are Important for Tracking Migration Routes
As seeds of beech are rarely dispersed over long distances, markers derived from the maternally inherited organelle genomes of chloroplasts and mitochondria are likely to resolve fine-scale migrations better than nuclear markers [12]. Some studies have considered chloroplast markers for tracing migration [13] and beech population genetics [14][15][16][17][18]. However, it was recently shown that chloroplast genomes hardly vary in central Europe, providing limited resolution for investigation [18]. Mitochondrial genomes are also maternally inherited [19] but are larger in plants [20], thus potentially offering more loci for analysis, since mutation rates in mitochondria are also higher than in chloroplasts [21]. Recently, a mitochondrial genome was reported for European Beech [22] for the same individual from which a chloroplast sequence had been generated before.
Thus, it was the aim of this study to provide additional complete mitochondrial sequences from the two other individuals from which chloroplast sequences had been obtained previously [23]. These were the more than 300-year-old reference individual Bhaga from the National Park Kellerwald-Edersee in Germany [24], which corresponds to the centre of the current distribution range, and a younger individual, Jamy, sampled from the Jamy Nature Reserve in Poland, corresponding to the north-eastern area of the distribution range. The sequences from all three individuals were then compared to each other for detecting variable positions in the mitochondrial genomes, as these positions might be suitable for future population-scale migration studies focusing on maternally inherited mutations.

Data Acquisition and Initial Draft Assembly of the Mitochondrial Genome of Bhaga
DNA was obtained from the European beech reference individual Bhaga from the National Park Kellerwald-Edersee (51°10′09″ N, 8°57′47″ E) in a previous study [24] and sequenced as outlined therein.
The mitochondrial genome was assembled using both short (Illumina) and long (PacBio) reads obtained previously [24]. The PacBio reads were double corrected by first performing a correction with Illumina reads using proovread [25], followed by self-correction using Canu [26]. Double-corrected PacBio reads were aligned with five different plant mitochondrial genomes using BLAST, as described previously [23]. All PacBio reads with a match over more than 50% of their length to any of the targets were used for the construction of an assembly using Canu. The SSpaceLong scaffolder [27] was used to link contigs using all initially used PacBio reads. Subsequently, reads matching mitochondrial sequences over at least 10% of their length were added. The subsequent reassembly using Canu resulted in a circular contig of 531,393 bp without ambiguous nucleotide positions. This contig was considered to be the initial mitochondrial genome assembly and was subjected to further refinements, as detailed below.
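The read-selection step can be illustrated with a minimal sketch. It assumes that BLAST hits are available as tuples of read ID, target ID, and query start and end, that overlapping hits of one read on one target are merged before the aligned fraction is computed, and that a read is kept if any single target exceeds the threshold. The function names `merged_length` and `select_reads` are hypothetical, and the merging rule is an assumption rather than a description of the procedure actually used.

```python
from collections import defaultdict


def merged_length(intervals):
    """Total number of bases covered by 1-based inclusive intervals."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end + 1:
            # Disjoint from the running interval: close it, start a new one.
            if current_end is not None:
                total += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent: extend the running interval.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start + 1
    return total


def select_reads(hits, read_lengths, min_fraction):
    """Return IDs of reads aligned over more than min_fraction of their length.

    hits: iterable of (read_id, target_id, qstart, qend), 1-based inclusive
    read_lengths: dict mapping read_id to read length in bases
    A read is kept if its hits on any single target pass the threshold
    (assumption made for this sketch).
    """
    per_pair = defaultdict(list)
    for read_id, target_id, qstart, qend in hits:
        # Normalise coordinate order defensively.
        per_pair[(read_id, target_id)].append((min(qstart, qend), max(qstart, qend)))
    selected = set()
    for (read_id, _target), intervals in per_pair.items():
        if merged_length(intervals) / read_lengths[read_id] > min_fraction:
            selected.add(read_id)
    return selected
```

Calling `select_reads(hits, read_lengths, 0.5)` corresponds to the first selection round; a lower threshold such as 0.1 approximates the later, more permissive round, although the text specifies "at least" 10%, which would require `>=` instead of `>`.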

Manual Curation of the Assembly
Illumina and PacBio reads were aligned over the draft assembly to check for coverage continuity, and a stretch was found with half of the average coverage but otherwise complete identity, revealing a misassembly. Consequently, the duplication was removed, and the continuity of the resulting mitochondrial genome was verified using read mapping, which now revealed an even coverage without breaks. Pilon [28] was subsequently used for refinement of the final circular mitochondrial genome assembly with the raw data initially used.
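A coverage check of this kind can be sketched as follows, assuming per-base read depth has already been extracted from the read alignments into a list. The window size, the tolerance band of 0.4 to 0.6 times the median depth, and the function name `half_coverage_windows` are illustrative assumptions, not parameters taken from the study.

```python
from statistics import median


def half_coverage_windows(depth, window, low=0.4, high=0.6):
    """Flag windows whose mean depth is roughly half of the median depth.

    depth: list of per-base read depths along the assembly
    window: window size in bases (the last window may be shorter)
    Returns a list of (start, end) pairs, 0-based and half-open.
    """
    baseline = median(depth)
    if baseline == 0:
        return []  # no usable baseline to compare against
    flagged = []
    for start in range(0, len(depth), window):
        chunk = depth[start:start + window]
        ratio = (sum(chunk) / len(chunk)) / baseline
        if low <= ratio <= high:
            flagged.append((start, start + len(chunk)))
    return flagged
```

Windows flagged in this way are only candidates; whether a region is a falsely duplicated stretch still has to be confirmed by inspecting the alignments, as was done manually here.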

Assembly of the Mitochondrial Genome of Jamy
The mitochondrial genome of the individual Jamy originating from the Jamy Nature Reserve, Poland (18°56′6.07″ E, 53°35′9.67″ N) was assembled on the basis of 18,306,393 (paired end, 150 bp) Illumina reads, using the de novo assembler NOVOPlasty v4.2 [29] with AY453092.1 as seed sequence [30] and the Bhaga mitochondrial genome as reference for guiding the assembler in repeat regions.

Functional Annotation and Circular Representation
The sequence of the mitochondrial genome was annotated as described previously for the chloroplast genome of Bhaga [23], using GeSeq ChloroBox [31], but with the sequence MT446430.1 [22] as reference for CDS + rRNA prediction and with manual corrections performed using Unipro UGENE [32] assisted by bwa-mem [33]. Duplicate annotations were removed from the GenBank file manually, and the resulting file was used to obtain a graphical representation using OGDraw of the GeSeq ChloroBox package. The mitochondrial genomes of Bhaga and Jamy were deposited in GenBank (https://www.ncbi.nlm.nih.gov/genbank/) under the accession numbers MW771358 and MW582695, respectively.

General Features of the Mitochondrial Genome Assemblies
The circular mitochondrial genome of the Fagus sylvatica reference individual, Bhaga, was 504,730 bp long (Figure 1), with an average coverage of 1145×, while that of Jamy was 504,715 bp long, with an average coverage of 160×. The GC content was 45.8% in both cases. In both Bhaga and Jamy, 40.65 kb of the mitochondrial genome consisted of repeat elements. In total, 49 protein-coding genes, 17 tRNAs, and 5 rRNAs were annotated in the mitochondrial genomes of Bhaga and Jamy, while 35 protein-coding genes, 20 tRNA genes, and 3 rRNA genes had been reported in the recently published mitochondrial genome of Fagus sylvatica [22]. However, when the previously reported mitochondrial genome [22] was re-annotated using the approach described above, we found the same number of genes as in Bhaga and Jamy.
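Summary figures such as genome length and GC content can be recomputed from a deposited sequence with a few lines of code. The sketch below assumes a single-record FASTA string and counts GC content over unambiguous A, C, G, and T bases only; the helper names `fasta_sequence` and `gc_percent` are hypothetical.

```python
def fasta_sequence(text):
    """Concatenate the sequence lines of a single-record FASTA string."""
    return "".join(
        line.strip()
        for line in text.splitlines()
        if line and not line.startswith(">")
    ).upper()


def gc_percent(sequence):
    """GC content in percent, counted over unambiguous A/C/G/T bases only."""
    gc = sum(sequence.count(base) for base in "GC")
    acgt = gc + sum(sequence.count(base) for base in "AT")
    return 100.0 * gc / acgt if acgt else 0.0


# Length difference between the two assemblies reported in the text.
LENGTH_BHAGA = 504_730
LENGTH_JAMY = 504_715
LENGTH_DIFFERENCE = LENGTH_BHAGA - LENGTH_JAMY  # 15 bases
```

Excluding ambiguous bases from the denominator is a design choice; for assemblies without ambiguous positions, as reported here, it makes no difference.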

Comparison of Mitochondrial Genome Assemblies
Very few variable positions were found when comparing the mitochondrial genomes of Bhaga, Jamy, and a recently published genome from eastern Germany (GenBank accession MT446430) [22]. None of the variable positions were located within genes (Table 1). Interestingly, the mitochondrial genome of Jamy from north-central Poland and the previously published genome from eastern Germany were highly similar, differing in only two SNPs and no indel positions. In both of these SNPs, the alternate base in Jamy was identical to the one called in Bhaga. Bhaga differed from Jamy and the previously published mitochondrial genome in 17 and 19 SNPs, respectively. In addition, Bhaga differed in seven indel positions from Jamy and the individual from eastern Germany. The longest of these indels was a stretch of 7 bp present only in Bhaga (Table 1).
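How SNP and indel positions are tallied from a pairwise comparison can be sketched as follows. The sketch assumes that the two circular genomes have been rotated to the same start coordinate and aligned into equal-length uppercase strings with `-` as the gap character, and that a run of consecutive gap columns in one sequence counts as a single indel position. The function name `compare_aligned` is hypothetical, and this is not the tool used to produce Table 1.

```python
def compare_aligned(ref, other):
    """Count SNPs and indel events between two aligned sequences.

    Returns (snp_count, indel_events, net_length_difference), where the
    net difference is the ungapped length of ref minus that of other.
    """
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")
    snps = 0
    indel_events = 0
    previous_state = "match"
    for a, b in zip(ref, other):
        if a == "-" and b == "-":
            continue  # column carries no information for this pair
        if a == "-":
            state = "gap_in_ref"
        elif b == "-":
            state = "gap_in_other"
        else:
            state = "match"
            if a != b:
                snps += 1
        # A new indel event starts whenever a gap run begins or switches side.
        if state != "match" and state != previous_state:
            indel_events += 1
        previous_state = state
    net = (len(ref) - ref.count("-")) - (len(other) - other.count("-"))
    return snps, indel_events, net
```

Under these assumptions, a whole-genome alignment of Bhaga against Jamy would be expected to yield 17 SNPs, seven indel events, and a net length difference of 15 bases, in line with the figures reported in the text.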

Why Variable Regions in Maternally Inherited Genomes Are Important to Identify in Beech
European Beech has received much research attention over the past decades since it has developed into one of the most important forestry trees in central Europe [34]. While European Beech is able to cope with significant climatic and edaphic differences, e.g., from the semi-arid Gargano area in southern Italy [35] to the perhumid areas of eastern Sweden [36], its migration by seed dispersal is slow [37], despite occasional distribution by seed predators [38], and is probably a limiting factor in its distribution range expansion [39]. As a consequence, it was the last dominant broadleaf tree to colonise northern Europe after the Weichsel glaciation had ended, arriving in Sweden when Mycenaean Greece flourished [40]. This also means that the migration of genotypes adapted to a specific climatic niche might not keep pace with predicted future warming, leading to a reduction in the distribution area of beech [41] and probably ending its dominance in large parts of its current distribution range. Thus, for assessing the risk of decline of beech stands throughout the 21st century, it is important to understand its past migration and the contribution of pollen flow and seed dispersal in shaping current beech populations.
To disentangle these processes, both maternally inherited markers that travel solely by seed dispersal and markers not bound to the female line need to be considered. While nuclear genomic diversity can disperse via pollen, organelle DNA is, apart from rare events [42], bound to the female gametophyte and thus can be used to trace migration via seed dispersal. However, chloroplast genomes have only very low diversity in the central to northern current range of European Beech [18] and thus likely offer insufficient resolution for tracing fine-scale population dynamics. Thus, the current study focused on the much larger mitochondrial genome, which is also maternally inherited, to identify potential additional markers for tracing migration by seed dispersal, including anthropogenic dispersal in the Holocene.

Utility of the Genomic Diversity of Organelle Genomes in Beech for Tracing Migration
The mitochondrial genomes of the reference individual Bhaga from the National Park Kellerwald-Edersee in central Germany and the individual Jamy from the Jamy Nature Reserve in northern Poland were assembled and compared with the mitochondrial genome previously published for an individual from eastern Germany [22]. The same three individuals had previously been used for assessing chloroplast genome diversity in northern central Europe [23]. In line with the low diversity found among the three chloroplast genomes [18,23], the mitochondrial genome also showed little variation among the three individuals. Apart from seven indel positions, 19 SNP positions were found, but none were located within a gene. This is noteworthy since it suggests a higher degree of conservation of the gene space than of the non-coding mitochondrial genome content. Interestingly, the mitochondrial genome previously reported from Brandenburg in eastern Germany [22] and that of Jamy from northern Poland are highly similar and distinguished by only two SNPs, suggesting a common ancestry for these populations. However, they differ from Bhaga in central Germany by 19 and 17 SNPs, respectively (Table 1). Whether this is best explained by Bhaga being a relic genotype in an extreme habitat or by migration history needs to be clarified in future studies. Although the number of SNP positions found in this study is not high, it might be sufficient to resolve the routes of migration to central Europe.
Based on population genetics studies conducted so far, including those based on organelle DNA (e.g., [17]), it can be expected that a much higher diversity of mitochondrial genomes will exist in Pleistocene refugia of Fagus sylvatica, since the diversity in organelle genomes is generally reflected by the diversity in nuclear markers (e.g., [43]). It thus seems likely that mitochondrial genomes will be useful in pinpointing the populations of European Beech that migrated to the previously glaciated areas of Europe. In addition, such studies might also help to estimate potential future migration and identify areas in which assisted migration of genotypes [44] could be used as a strategy to save beech forests for future generations, if identified limitations, such as potential maladaptations, e.g., with respect to prevailing pathogens, can be resolved [45,46].