Sequencing of Organellar Genomes of Nowellia curvifolia (Cephaloziaceae Jungermanniales) Revealed the Smallest Plastome with Complete Gene Set and High Intraspeciﬁc Variation Suggesting Cryptic Speciation

: The leafy liverwort Nowellia curvifolia is a widespread Holarctic species belonging to the family Cephaloziaceae. It is made up of a newly sequenced, assembled and annotated organellar genomes of two European specimens, which revealed the structure typical for liverworts, but also provided new insights into its microevolution. The plastome of N. curvifolia is the second smallest among photosynthetic liverworts, with the shortest known inverted repeats. Moreover, it is the smallest liverwort genome with a complete gene set, since two smaller genomes of Aneura mirabilis and Cololejeunea lanciloba are missing six and four protein-coding genes respectively. The reduction of plastome size in leafy liverworts seems to be mainly impacted by deletion within speciﬁc region between psb A and psb D genes. The comparative intraspeciﬁc analysis revealed single SNPs difference among European individuals and a low number of 35 mutations differentiating European and North American specimens. However, the genetic resources of Asian specimen enabled to identify 1335 SNPs in plastic protein-coding genes suggesting an advanced cryptic speciation within N. curvifolia or the presence of undescribed morphospecies in Asia. Newly sequenced mitogenomes from European specimens revealed identical gene content and structure to previously published and low intercontinental differentiation limited to one substitution and three indels. The RNA-seq based RNA editing analysis revealed 17 and 127 edited sites in plastome and mitogenome respectively including one non-canonical editing event in plastid chi L gene. The U to C editing is common in non-seed plants, but in liverwort plastome is reported for the ﬁrst time.


Introduction
Organellar genomes of early land plants are known from their stable structure and almost identical gene order present in all main evolutionary lineages [1][2][3][4][5], which include complex thalloids, simply thalloids, leafy and early divergent leafy liverworts from the order Treubiales and Haplomitriales. Despite the fact that the organellar genomes of model liverwort Marchantia polymorpha were among first sequenced plant plastomes [6] and mitogenomes [7], the number of complete genomes sequences was very limited, till the end of the second decade of XXI century, when several paper were published significantly expanding our knowledge about evolutionary dynamics of these molecules [2][3][4][5]. The current genomic resources comprise 61 mitogenomes and 59 plastomes complete sequences, but many liverwort families do not have their representatives.
Among all known liverwort chloroplast genomes, the typical structure is present including large single copy unit (LSC) and small single copy unit (SSC), which are sep-arated by two inverted repeats (IRs), however the plastome size and GC content differ among lineages.
The smallest plastome (108,007 bp) was found in mycoheterotrophic Aneura mirabilis [8], while the largest one (128,728 bp) in the early divergent Haplomitrium blumei [4]. The plastome of the latter species also shows the highest GC-content, with 44.4%, whereas the lowest GC-content (28.2%) was found in the plastid genome of the complex thalloid Conocephalum salebrosum [9].
The structural changes of plastomes are mainly limited to small rearrangements at borders of inverted repeats and single copy regions, which lead to IR expansion [4,7] or losing of six genes by a pseudogenization in mycoheterotrophic species [6]. The gene content of plastome units is conserved in almost all known liverworts, with exception of Conocephalum conicum where two genes, rps7 and rps12 were transferred from LSC to IR [9].
Apart from vascular plants the liverwort mitogenomes are structurally stable, with only one known exception of Gymnomitrion concinnatum, where two recombination events appear [10]. This stability seems to be associated with the lack of a repetitive sequence in the mitogenomes of early land plants, which are common in seed plants [1]. However, in the case of the species mentioned above, the number of repeats is lower than in other liverworts. The other differences in liverwort mitogenome structures include changes of an intron content in leafy liverwort lineage [3,11] and presence of the nad7 gene in early divergent Treubiales and Haplomitriales [3,12].
The leafy liverwort Nowellia curvifolia (Dicks.) Mitt. belongs to the family Cephaloziaceae, which includes 17 genera. The genus Nowellia has been resolved by phylogenetic studies as a sister to Cephalozia [13]. Nowellia is a small genus, which includes only two species: N. curvifolia widespread in Holarctic and N. wrightii endemic to Cuba. N. curvifolia grows on rotting logs in forests and produces up to 20 mm long shoots, which are green in the shade and red in the sunny condition. The morphology of N. curvifolia is very charateric by its inflated leaves with two long, narrow, incurved lobes ended with long, slender teeth [14]. The well-developed morphological features are one of the reason of overlooked cryptic diversity, which often lead to description of new species [15][16][17]. Despite recent publication of several papers dealing with liverwort plastomes [4,5,9,10,18], the GenBank entries didn't include any representative of the Cephaloziaceae family.
In the present study, we sequenced, assembled and characterized organellar genomes of two N. curvifolia specimens from Europe. Obtained genomes were compared to close relatives of Nowellia and to shed some light on intraspecific variation of these molecules. We also report the first complete plastome sequences of a member of the Cephaloziaceae family.

Materials and Methods
The specimen details, including herbarium number and locality are given in Supplementary Table S1. The DNA was extracted using the Qiagen Plant Mini-kit (Qiagen, Hilden, Germany) from a fresh material from two European locality collected in silica gel bags. One branched individual was ground with silica beds in a MiniBead-Beater homogenizer for 50 s and subsequently processed according to the manufacturer protocol. DNA quantity was estimated using the Qubit fluorometer and Qubit™ dsDNA BR Assay Kit (Invitrogen, Carsbad, NM, USA). DNA quality was checked by electrophoresis in 0.5% agarose gel stained with Euryx Simple Safe (Eurx, Gdańsk, Poland).
The genomic libraries were constructed with Qiagen FX DNA kit (Qiagen) and were sequenced using HiSeqX (Illumina) to generate 150 bp paired-end reads at Macrogen Inc. (Seoul, Korea) with 350 bp insert in size between paired-ends. Afterwards, sequencing reads were cleaned by removing the adaptor sequences and low quality reads with Trimmomatic v0.36 [19]. Trimmed reads were assembled using NOVOplasty 4.2 [20] with previously published Nowellia curvifolia (MK230952) mitochondrial, and Calypogeia integristipula chloroplast (MK293997) reference genomes, respectively. The GeSeq software was used for annotation of organellar genomes [21] and genome maps were drawn using OG-DRaw web server [22]. The junctions between chloroplast inverted repeats and single copy regions were visualized using IRscope software [23]. To assemble the third plastome of the individual used for MK230952 assembly, SSR8246023 archive was used. The specimens details, including GenBank accession numbers of newly assembled genomes are given in Supplementary Table S1.
As the backbone of phylogenetic analysis we used plastid CDS alignment from studies on Ptilidiales [24]. Existing dataset, which includes a set of plastid Nowellia curvifolia genes from Asian specimens, was expanded by adding CDS from three newly assembled plastomes (two from Europe and one form North America). To keep a congruence with a previous study, we applied Maximum Likelihood and Bayesian inference methods using RAxML [25] and MrBayes [26] with parameters used during studies on Ptilidiales [19].
The RNA-seq library of Nowelia curvifolia (Accession number SRR8202183) was mapped onto organellar genomes using Geneious 2020 software [27] and the alignment mismatches with variant frequency >0.50 were subsequently compared with the RNA editing sites predicted by PREPACT 3.12.0 [28].

The Characteristics of Nowellia curvifolia Plastome
Newly sequenced and assembled plastomes are 114,423-114,469 bp long and their structure is typical for other known liverwort genomes ( Figure 1). The overall length of Nowellia curvifolia plastome is the shortest among its closest relatives with known complete genome sequences: Scapania ampliata (NC_051002.1: 118,026 bp) and Douinia plicata (NC_051003.1: 118,797 bp).  The reduction of plastome length is splitted among all four regions (Supplementary Figure S1): detected IRs were 8086 bp (8870 bp and 9022 bp in Scapania, and Douinia, respectively), LSC is 79,341-79,293 bp (80,850 bp and 81,142 bp in Scapania and Douinia respectively) and SSC is 18,956 (19,436 bp and 19,611 bp in Scapania, and Douinia, respectively). Considering all known complete liverwort plastome sequences, Nowellia chloroplast genome is the third smallest one, with Cololejeunea lanciloba second (108,674 bp) and mycoheterotrophic Aneura mirabilis with the smallest genome among liverworts (108,007 bp). The biggest difference among these three species is the length of SSC, which is 13,974 bp and 14,268 bp in A. mirabilis and C. lanciloba respectively, in the latter caused by the deletion of the ndhF-ccsA region. The differences in LSC length are much smaller (77,553 bp in A. mirabilis and 78,204 bp in C. lanciloba), but the IRs of N. curvifolia plastome are the shortest among known liverwort chloroplast sequences, passing C. lanciloba by 15 bp.
The reduction of plastome length in leafy liverworts is often caused by deletions around the ycf 2 gene [10]. The region between psbA and psbD genes contains, besides the previously mentioned ycf 2 gene, four tRNAs: trnH-GUG, trnD-GUC, trnY-GUA and trnE-UCC, and in the case of Nowellia and Scapania this region is 1797 bp shorter in the former. Keeping in mind that the total difference in LSC length between these species is 1509 bp, the role of this region in plastome length reduction is leading.
As in the closest known relatives with a plastome sequence mentioned above, the plastome of N. curvifolia consists of 121 unique genes, including 81 protein-coding genes, 6 genes of unknown function (ycf genes), 4 ribosomal RNAs and 30 transfer RNAs (Supplementary Table S2). The gene order and content seem to be stable in the most of leafy and simple thalloid liverworts [4,5,10,18,29,30], with some exceptional gene deletions as in the mentioned above Cololejeunea lanciloba [4] or mycoheterotrophic A. mirabilis [8].

The Characteristics of Nowellia curvifolia Mitogenome
The newly sequenced Nowelia curvifolia mitogenomes are 148,199 bp long ( Figure 2) and are slightly longer than the closest related species with a known mitogenome structure: Scapania ampliata (143,664 bp) and Scapania ornithopodioides (142,992 bp). Both mitogenomes contain a typical liverwort set of genes, including 42 protein-coding genes, 25 tRNA and three rRNA genes. The gene order and intron content is the same as in Calypogeia species, but the closest relatives with known mitogenome sequences -representatives of the genus Scapania (S. ampliata and S. ornithopodioides) lack intron 8 (cox1i1116g1) of cox1 gene ( Figure 3). The losses of introns in leafy liverworts lineage seem to be common and it is probably done via retroprocessing [3,11].
However, the loss of an intron alone does not explain the almost 5k bp-difference in mitogenome length between Nowellia and Scapania, since the size of the intron cox1i1116g1 ranges 1.6-1.8 kb in Jungermanniales. The reduction or increase of mitogenome length in leafy liverworts is often caused by deletions between genes nad3 and rpl10. This region contains the pseudogenized nad7 gene and four tRNAs: trnV-UAC, trnD-GUC, trnS-GCU and is a main source of mitogenome size variation among in Nowellia and its relatives. In Nowellia mitogenome the distance between nad3 and rpl10 gene is 17,620 bp and in Scapania species ranged from 8350 bp to 8364 bp. Moreover, the length of this region in Nowellia is even longer than in the genus Calypogeia (ca. 14.5 kb), which mitogenomes are significantly bigger with ca. 163 kb [11]. The described patterns are similar to those found in results of comparative analysis of Nowellia and its close relatives.  ornithopodioides (142,992 bp). Both mitogenomes contain a typical liverwort set of genes, including 42 protein-coding genes, 25 tRNA and three rRNA genes. The gene order and intron content is the same as in Calypogeia species, but the closest relatives with known mitogenome sequences -representatives of the genus Scapania (S. ampliata and S. ornithopodioides) lack intron 8 (cox1i1116g1) of cox1 gene (Figure 3.). The losses of introns in leafy liverworts lineage seem to be common and it is probably done via retroprocessing [3,11].  However, the loss of an intron alone does not explain the almost 5k bp-difference in mitogenome length between Nowellia and Scapania, since the size of the intron cox1i1116g1 ranges 1.6-1.8 kb in Jungermanniales. The reduction or increase of mitogenome length in leafy liverworts is often caused by deletions between genes nad3 and rpl10. This region contains the pseudogenized nad7 gene and four tRNAs: trnV-UAC, trnD-GUC, trnS-GCU and is a main source of mitogenome size variation among in Nowellia and its relatives. In

Intraspecific Variation of Nowellia curvifolia Organellar Genomes
The variation of organellar genomes among tested European samples was low. The mitogenomes and plastomes of both sequenced samples were almost identical and differed in only one SNP in the intergenic spacer atp1-cox1 (mitochondrial) and rps4-trnM-CAU (plastid). Newly sequenced mitogenomes were longer than mitogenome of North American sample available in Genbank (accession number MK230952) due to three indels: 2 bp long in trnN-GUU and trnP-UGG spacer, 42 bp long in spacer between rps14 and rps8, 1 bp long in intron 2 of cob gene. Only one substitution was found in atp1-cox1 spacer in the case of the NCU specimen.
Comparative analyses of European and North American plastomes revealed presence of 35 SNPs and one 41bp long indel between ndhJ and ndhK genes. Almost half of detected SNPs were found in 12 protein-coding genes (Supplementary Table S2), but with exception of ycf 2 (4 SNPs), rps7 (2 SNPs) and chiL (2 SNPs) remaining genes differed by single substitution. Since no Asian complete plastome was available, the comparative analysis of CDS revealed presence of 1335 SNPs unique for this sample (Supplementary Table S2). Top-five most variable genes include rpoB (98 SNPs), ycf 2 (81 SNPs), ycf 1 (51), ndhD (51), ndhF (46) and ndhB (46). Four of these genes (ycf 1, ycf 2, ndhB, ndhD) were among most variable plastid genes of the leafy liverwort genus Calypogeia [5] and ycf 1 and ycf 2 were the most variable genes of simple thalloid Aneura pinguis complex [18]. Both ycf genes are often considered as promising barcodes in many taxa of vascular plants [31][32][33][34]. The phylogenetic position of Asian specimens reflects their divergence from European and North American specimens (Figure 4), but it does not impact a general tree layout as it is congruent with previously published one [24].
The comparative data for genus or species levels are very limited, since most sequenced genera consist of genome sequence of single species [4,10,29,30]. The number of SNPs between closely related Calypogeia species (C. muelleriana and C. integristipula) detected for the whole genome was one third of the amount detected for CDS for Nowellia specimens, and the intraspecific differentiation was limited to few SNPs, however all the samples had European origin [5]. Lower genetic differentiation was also detected at interand intraspecific level of complex thalloid liverwort genera Marchantia and Conocephalum [9], but this lineage is known from low genetic differentiation and slow evolutionary rate of organellar genomes [35]. The level of CDS polymorphism found among remaining Asian specimens is similar to those found among cryptic lineages of A. pinguis complex [18,36,37].
The numbers of detected SNPs reflect the genetic identity which amounted to 99.98% among European and North American samples, and 97.49% among Asian and European/North American ones. This value is rather characteristic for interspecific values, as mean genetic identity among ten Calypogeia species was 97.50% and 99.81% at infraspecific level [5]. Similar genetic identity were found among complex thalloid Conocephalum species with 97.67% at interspecific and 99.98% at infraspecific levels [9]. The high genetic differentiation could be explained by population origin or could suggest a cryptic speciation. The morphological differentiation in easily delimited liverwort taxa is often overlooked, especially if well-established features, like characteristic macromorphology or unique micromorphological characters, are present. With very characteristic morphology, N. curvifolia falls within this group of species. Detailed studies based on wide material collection often lead to description of new species [16,38].
Intraspecific differentiation of N. curvifolia was not studied using morphological or molecular methods, but genetic distances between European and Asian samples suggest presence of geographical races or cryptic species, which are known in liverworts [17,36,39]. Application of integrative taxonomy approach enabled description of new species within genera like Conocephalum [15], Calypogeia [16] and Marsupella [40]. However in some cryptic species morphological characters are too blurry despite strong molecular, physiological and environmental differentiation, but this concerns mainly simple thalloids [37,41,42]. The phylogenetic position of Asian specimens reflects their divergence from European and North American specimens (Figure 4), but it does not impact a general tree layout as it is congruent with previously published one [24]. The comparative data for genus or species levels are very limited, since most sequenced genera consist of genome sequence of single species [4,10,29,30]. The number of SNPs between closely related Calypogeia species (C. muelleriana and C. integristipula) detected for the whole genome was one third of the amount detected for CDS for Nowellia specimens, and the intraspecific differentiation was limited to few SNPs, however all the samples had European origin [5]. Lower genetic differentiation was also detected at interand intraspecific level of complex thalloid liverwort genera Marchantia and Conocephalum [9], but this lineage is known from low genetic differentiation and slow evolutionary rate of organellar genomes [35]. The level of CDS polymorphism found among remaining

RNA Editing in Organellar Genomes
RNA editing alters the identity of nucleotides in an RNA sequence so that the mature transcript differs from the template defined in the genome. This process has been observed in chloroplasts and mitochondria of both seed and early land plants, including liverworts. In plant organellar genomes two types of RNA edition are known: typical C to U and reverse U to C [43].
The RNA editing in liverwort plastomes is poorly explored, especially with direct use of RNA-seq data. In plastome of Nowellia curvifolia 17 RNA editing sites were confirmed by RNA sequencing (Table S3). Single nucleotide edition was confirmed for dehydrogenases (ndhB, ndhC, ndhF, ndhG, ndhI), genes of photosystem I and II (psaI, psbH, psbZ) and other like: rpoB, petB, ycf 4, ycf 66. Two edited nucleotides were found in genes petD and atpI, and one edition in the latter is obligatory to create a start codon. Among 16 typical C-U editing events, that are common in plant plastomes and mitogenomes, the reverse edition Diversity 2021, 13, 81 8 of 10 U-C was found in the plastid chiL gene. The reverse U-to-C RNA editing appears to be restricted to hornworts, some lycophytes, and ferns [44], but has not been confirmed in liverwort plastomes up to the date. The robust reverse editing was predicted for plastome of Gymnomitrion concinnatum [10], but in silico prediction tools seem to be incorrect in the case of U to C ending in the early land plants [2].
The RNA editing within mitochondrial transcripts was more frequent with 127 edited sites along 23 genes, but all of them were canonical C-U events (Supplementary Table S3). Among most edited were genes ccmC (15), cox1 (10), atp6, ccmB and ccmFN (9). The disproportion between editing events in plastome and mitogenome is bigger than an average for liverworts, where usually RNA edition is three times more frequent in mitogenome than plastome, but in few species from genera Bazzania, Lepidozia and Odonthoschizma edition in plastid transcript is more frequent than in mitochondrial transcript [45]. Overall RNA editing frequency of Nowellia curvifolia organellar genes is lower than the average for Jungermanniales (210).

Conclusions
Recently published papers significantly expand our knowledge of liverworts organellar genomes, but their infra-specific variation remained almost unexplored. The newly assembled plastomes of Nowellia curvifolia revealed the smallest size among species with a complete set of genes and the shortest IRs among all liverworts. However, the size reduction was not equally distributed along the genome, but mostly focused in the region between psbA and psbD genes. The comparative analysis of chloroplast genes enables identification of significant amounts of SNPs between American, European and Asian specimens suggesting advanced cryptic speciation in this morphologically defined species. However, further studies are required to better understand this process in Nowellia, including wider sampling and morphological studies with special focus on Asian populations. Detailed analysis of RNA-editing revealed one example of non-canonical, reverse editing in chloroplast chiL gene, which was not reported in liverworts to date. The previous studies reported the possibility of reverse editing, but RNA-seq datasets didn't confirm U to C edition in plastid genomes of liverworts, which is quite common in other non-seed plants lineages.