Comparative Analysis of Apopellia endiviifolia Plastomes Reveals a Strikingly High Level of Differentiation between Its Terrestrial and Water Form

Comparative Analysis of Apopellia endiviifolia Reveals a Strikingly High Level Abstract: The simple thalloid liverwort Apopellia endiviifolia is a widespread Holarctic species belonging to the family Pelliaceae. European populations of this species comprise two distinct evolutionary lineages named “species A”, known also as water form, and typical, mainly terrestrial forms named “species B”. Newly sequenced, assembled and annotated chloroplast genomes of six European specimens belonging to the two cryptic lineages occupying different microhabitats, revealed the structure typical for liverworts and previously sequenced reference. The plastomes of A. endiviifolia are 120,537–120,947 bp long with a structure typical for most plants, including a pair of IR regions (each of 9092–9207 bp) separated by LSC (82,506–82,609 bp) and SSC (19,854–19,924 bp) regions and consist of 121 unique genes, including 81 protein-coding genes, 6 genes of unknown function ( ycf genes), 4 ribosomal RNAs and 30 transfer RNAs. Comparative analysis of typical, terrestrial and water forms revealed 4971 molecular diagnostic characters (MDCs), which exceeds numbers found in many well recognized liverworts taxa. Moreover, beside the presence of evolutionary hotspots like ycf 1 and ycf 2 genes and several intergenic spacer like ndh B -psb M, rps 4 -ndh J and ndh C -atp E, the molecular identiﬁcation of Apopellia cryptic species was possible by almost 98% of 500 bp long frames simulating mini barcodes. The different ecological niches can be driven by different pressures of positive selection, which was detected in nine genes including ccs A, ndh D, ndh F, pet A, psb B, psb C, rpo B, ycf 1 and ycf 2. Despite clearly genetic differences and ecological preferences, the current obser-vation of morphological differentiation does not no allow to separate terrestrial and water forms into taxonomic species.


Introduction
The cryptic speciation is well known to occur in all major lineages of liverworts [1][2][3][4][5][6]. In recent years integrative taxonomy approach led to description of several species of complex thalloids and leafy liverworts, previously considered as cryptic lineages distinguishable only on the basis of molecular markers [3,7].
The simple thalloid liverwort genus Pellia Raddi sensu lato is a widespread Holarctic taxon belonging to the family Pelliaceae, defined by several morphological characters. Thallus of Pellia s.l. is up to 1 cm wide, dichotomously branched with a poorly defined midrib. The study of cryptic speciation of genus Pellia s.l. has a long history, with the first report published at the beginning of the 20th century [8]. Popularization of the isoenzyme electrophoresis methods enabled characterisation of cryptic evolutionary lineages within Pellia epiphylla (L.) Corda. and P. endiviifolia (Dicks.) Dumort. species [9,10], which were later confirmed by various markers based on direct DNA analyses [11][12][13]. Recently, an integrative approach based on morphology and molecular dataset enabled the split of the genus Pellia into Apopellia and Pellia sensu stricto [14]. The former comprises three taxonomic species, A. apicola, A. megaspora and A. endiviifolia, with different patterns of distribution. The European population of Apopellia is diverging into two evolutionary lineages, which, besides molecular characters, differ in types of microhabitats [8,9,12].
The plastid genomes are the most frequently used molecules in evolutionary studies on plants, including phylogenomics, population genetics and phylogeography [15,16]. Complete plastome sequences serve as superbarcodes enabling identification even recently divergent species [17,18] where typical single and multilocus barcodes fail. In opposition to vascular plants, infraspecific plastome diversity of liverworts are yet poorly explored.
Liverworts plastomes, as well as well-studied angiosperm plastomes, exhibit two copies of inverted repeats (IRa and IRb) separated by two single copy regions: the large single copy (LSC) and the small single copy (SSC). In contrast to vascular plants, where plastome structure variation is observed at the family or even at genus level [19,20], in liverworts the gene content and gene order in the plastome is conserved in all main liverwort lineages, including complex thalloid, simple thalloid, and leafy species [21][22][23], despite it being millions of years since their divergence [24].
Chloroplast genome of Apopellia endiviifolia was among the first sequenced liverworts plastomes [25]; however, up to date, detailed comparative study of plastomes of cryptic lineages within it was not performed. The recent advances in the next generation sequencing provided better datasets to study cryptic speciation of liverworts [26][27][28] and enabled detailed view into plastomes divergence between cryptic evolutionary lineages.
In this study we sequenced and analyzed six complete chloroplast genomes of two cryptic species of Apopellia endiviifolia with contrasting habitat preferences to determine infra and interspecific differentiation of plastomes. The potential application of chloroplast genomes as superbarcodes to identification of cryptic species within the A. endiviifolia complex was analyzed using different approaches allowing reliable delimitation of water and terrestrial forms.

Materials and Methods
DNA was isolated by the modified CTAB procedure. The liquid nitrogen-ground thalii were thoroughly mixed with 3 mL preheated CTAB isolation buffer (2% CTAB, 100 mM Tris-HCl, pH 8.0, 20 mM EDTA, 1.4 M NaCl and 2% β-mercaptoethanol) and incubated at 55 • C for 1 h. After three chloroform extractions, the DNA was precipitated and dissolved in sterile, deionized H 2 O. The purity of DNA samples was assessed spectrophotometrically and it reached 90-95%. DNA quantity was estimated using the Qubit fluorometer and Qubit™ dsDNA BR Assay Kit (Invitrogen, Carsbad, NM, USA). DNA quality was checked by electrophoresis in 0.5% agarose gel stained with Euryx Simple Safe (Eurx, Gdańsk, Poland). The extracted DNA prior to long-read sequencing was carefully examined and additionally cleaned using the Genomic DNA Clean and Concentrator kit (Zymo, Irvine, CA, USA).
The preliminary delimitation based on ecological preferences and macromorphological features was confirmed using ISJ2 (intron-exon splice junction) markers, which clearly separated both cryptic lineages [11]. The PCR reaction details and primer sequences were identical to previously published [29].
The genomic libraries were constructed with Qiagen FX DNA kit (Qiagen, Hilden, Germany) and were sequenced using HiSeqX (Illumina, San Diego, CA, USA) to generate 150 bp paired-end reads at Macrogen Inc. (Seoul, Korea) with 350 bp insert in size between paired-ends. Afterwards, sequencing reads were cleaned by removing the adaptor sequences and low-quality reads with Trimmomatic v0.36 [30]. The long-read libraries were constructed using Ligation Sequencing Kit SQK-LSK109 (Oxford Nanopore Technologies, Oxford, UK) and NEBNext ® Companion Module for Oxford Nanopore Technologies ® Ligation Sequencing (New England Biolabs, Ipswich, MA, USA) according to manufac-turer's protocol and sequenced using MinION MK1B portable device (ONT, Oxford, UK) and R.9.4.1 Flow Cell (ONT). The Flow Cell was prepared for sequencing with the Flow Cell Priming Kit EXP-FLP002 (ONT). Sequence reads were basecalled using high-accuracy guppy basecalling on the MinKNOW platform. The Apopellia chloroplast genomes were assembled using NOVOplasty [31] using previously sequenced A. endiviifolia chloroplast genome (NC_019628) as reference. The GeSeq 2.03 software was used for annotation [32] and genome maps were drawn using OGDRaw v1.1.1 web server [33]. The specimen details, including GenBank accession numbers of newly assembled genomes are given in Supplementary Table S1. The raw reads of all used libraries were submitted to the National Center for Biotechnology Information (NCBI, Bethesda, MD, USA) under Sequence Read Archive (SRA) number PRJNA785811.
The complete chloroplast genomes of Apopellia endiviifolia from this study and previously sequenced plastome from GenBank were compared using mVISTA 1.4 [34] with the LAGAN 2.0 alignment program [35]. NC_019628 accession (typical form) used as a reference.
Comparative analysis of chloroplast genomes was carried out in the Spider 1.1-5 program [36] based on inter-and intraspecific distances that were calculated using the Kimura 2-parameter model (K2P) of nucleotide substitution. Barcoding analyzes of entire Apopellia chloroplast genomes and their 500 bp-long fragments generated by sliding window were made in Spider. The sliding window nucleotide diversity analysis was performed using TASSEL 5.0 software [37]. Ten most variable regions were extracted, realigned using MUSCLE v5 [38] and analyzed using TASSEL 5.0.
The discrete molecular diagnostic characters (MDCs) for each species were calculated according to the Jörger and Schrödl [39] approach using FASTACHAR 0.2.3 software [40].
The Poisson Tree Processes (PTP) method was applied to delimitate species boundaries [41]. The analysis was performed using rooted ML (Maximum likelihood) and BI (Bayesian inference) trees, the MCMC algorithm was run for 1,000,000 generations, with 100 thinning and 0.2 burn-in.
As the backbone of phylogenetic analysis, we used plastid CDS (coding DNA sequence) alignment from previous study [23]. Existing dataset, which includes a set of plastid genes, was expanded by adding CDS from six newly assembled plastomes of Apopellia. To keep a congruence with a previous study, we applied Maximum Likelihood and Bayesian inference methods using RAxML 8.0 [42] and MrBayes 3.2 [43] with parameters used by Yu et al. [23].
Mixed Effects Model of Evolution (MEME) analysis was used to detect individual sites evolving under positive selection in a proportion of branches [44]. The BUSTED algorithm [45] was used to identify episodic positive selection in cryptic lineages. To test branch-specific positive selection, we used the aBSREL algorithm [46] which uses a model selection procedure for each branch to determine the proportion of sites per branch that have undergone positive selection. All the three tests were performed using Datamonkey server [47] with default settings.

Characteristics of Newly Sequenced Apopellia Chloroplast Genomes
Liverwort plastid genomes are known for their conserved structure and results obtained in this study confirm this observation. The Apopellia plastome was among the first sequenced liverwort plastomes [25] and the newly sequenced Apopellia specimens have identical gene order and content. The plastomes of A. endiviifolia are 120,537-120,947 bp long with a structure typical for most plants, including a pair of IR regions (each of 9092-9207 bp) separated by LSC (82,506-82,609 bp) and SSC (19,,924 bp) regions ( Figure 1). As in the most known liverworts plastome sequences [23], plastomes of Apopellia consist of 121 unique genes, including 81 protein-coding genes, 6 genes of unknown function (ycf genes), 4 ribosomal RNAs and 30 transfer RNAs. The most significant change in the water form of Apopellia is a length reduction of cysT and ycf 66 genes by 37 and 6 amino acids, respectively, due to the presence of termination codons. The total plastome size of a typical A. endiviifolia form ranged from 120,537 to 120,546 bp, while water forms fall within 120,944-120,946 bp range. The gene order and -content seem to be stable in liverworts, but some gene losses are scattered along different evolutionary lineages including loss of cysA and cysT genes in Ptilidium pulcherrimum and Cheilolejeunea xanthocarpa [21,23] and additional four genes (ndhF, rpl21, rps32 and ccsA) in Cololejeunea lanciloba [23]. Single genes are also missing in Aneura pinguis (ycf 66), Haplomitrium blumei (psaM) and Schistochila macrodonta (ndhF) [23,26]. Heterotrophic Aneura mirabilis has a reduced plastome due to the loss of six genes (ndhA-E-G-K-I and ycf 66) or pseudogenization (19 genes) of genes involved in the process of photosynthesis [48]. Two liverworts species increase the number of plastid genes by transferring them to IRs: Haplomitrium blumei transfered ndhF gene from SSC [23], while Conocephalum salebrosum transfered rps7 and rps12 from LSC [27].

Barcoding and Molecular Identification of Apopellia Cryptic Species
Application of organellar genomes as superbarcodes enabled identification of both forms of A. endiviifolia. Using the barcoding gap method, molecular delimitation was possible using 2375 out of 2427 generated 500-bp long frames. The mean number of diagnostic nucleotides was 20.7 per 500 bp, which is 4.41% of nucleotides. Number of diagnostic nucleotides per frame ranged from 0 to 51 in IR regions and from 1 to 144 in hotspots of single copy regions (Figure 2). Comparative analysis using the mVista program revealed that most of the variation among A. endiviifolia cryptic species was found in intergenic regions of LSC and SSC parts of the plastome (Supplementary Figure S1). Several intergenic spacers like ndhB-psbM, rps4-ndhJ and ndhC-atpE could be identified as hotspots of interspecific variation (Figure 3, Supplementarty Table S3). Among coding regions, ycf 1 and ycf 2 were revealed as the most variable which is congruent with data obtained from other liverworts genera like Aneura [26] and Calypogeia [49]. Ycf 1 and ycf 2 are also considered as promising barcodes in vascular plants [50,51].   The biggest interspecific variation within IRs was found in the fragment flanking SSC region.
Whole plastome analysis (excluding one copy of IR) revealed 4971 molecular diagnostic characters (MDCs) enabling molecular delimitation of typical and water forms, which comprised 4.4% of the plastome. The presence of significant numbers of MDCs is crucial for splitting and description of new species [52]. Low variation at the infraspecific level resulted in only six potential MDCs as defined in CAOS [53] and QUIDDICH [54]. The number of MDCs that differentiate cryptic species of A. endiviifolia is much higher than identified for good taxonomic species of leafy liverwort genus Calypogeia where chloroplast MDCs ranged from 159 to 1369 [49]. The lower number of MDCs was also identified for species of complex thalloid liverwort genera Marchantia and Conocephalum [27]. The liverwort model species Marchantia polymorpha and M. palaceae revealed 4076 MDCs, while recently described Conocephalum salebrosum [3] and C. conicum were characterized by 5878 MDCs.
The plastid superbarcoding approach is getting more attention as an effective tool for molecular delimitation of recently evolved species [17,19,[55][56][57]. With the genomics resources presented in this study, the researchers can identify cryptic species within Apopellia endiviifolia using any primers targeting LSC or SSC region, without need ordering special primes for this purpose.
Application of superbarcodes in bryophytes is poorly explored. Recent studies apply plastid superbarcodes for species identification of genus Calypogeia [49] and Conocephalum [27] and prove their usefulness in effective distinguishing even problematic, allopolyploid taxa.
Genetic distinctiveness of water and typical forms was also confirmed by PTP analysis. The fundamental assumption of PTP analysis is that the number of substitutions between species is significantly higher than the number of substitutions within species. The results obtained using this approach clearly support both forms as separated species with support varying from 0.509 to 0.522 for typical and water forms respectively. In comparison, species support in complex thalloid liverworts clade was often twice lower in the case of Riccia (0.188), Dumortiera (0.207), Reboulia (0.211) or Conocephalum (0.229) and similar to those found in the leafy liverworts like Nowellia (0.588), Delavayella (0.570) or Herbertus (0.535).
Obtained results based on complete plastome sequences are congruent with previous studies. The identification of cryptic species within Apopellia endiviifolia beginning in the 1980s using isoenzyme markers [10]. The isoenzymes of peroxidases, esterases and mhd enable molecular delimitation of A. endiviifolia cryptic lineages. The presence of cryptic speciation was also confirmed using various genome scanning DNA markers, like RAPD and ISJ [11,12]. These studies, based on a genetic similarity approach, revealed values between 0.2 and 0.5, which characterize separate biological species [29,58]. Later studies using rps4, rpl16 and trnL-trnF [14] confirmed monophyly of these Apopelia lineages.

Ecological, Geographical and Morphological Differentiation
The low complexity of simple thalloid liverworts reduces the potential number of morphological diagnostic characters, which are essential for taxonomic description of novel species, including separation of cryptic species lineages. Thalli of typical form usually show characteristic dichotomically branched outgrows on the thallus apex, which are absent in water forms [59]. However, these characteristics outgrows can appear only periodically, and do not form in in vitro conditions [11]. Morphological studies on A. endiviifolia revealed significant differences between land and water forms in quantitative treats like length of basal and marginal cells, even after six months of in vitro culture [11], but those quantitative characters are not preferred as main diagnostics features. Described as Apopellia megaspora in the early 1980s, one of the North American A. endiviifolia cryptic species beside quantitative differences of spore diameters differs from the remaining Apopellia species by lacking thallus proliferation in autumn [60]. The better-studied cryptic species of another simple thalloid liverwort, Aneura pinguis, despite the presence of various lineage-specific DNA markers [4,26,61] and many biochemical compounds [62], still lacks significant morpho-anatomical differentiation. Moreover, the cryptic lineages Apopellia endiviifolia and some of Aneura pinguis differ in the type of preferred microhabitat, which could be defined not only by general description [10,14,61], but also by specific parameters, like pH and concentration of Ca, Mg, K, and Na [63]. The earlier studies suggest that water form (named species A) is known mainly from mountains, with very restricted occurrence in the lowlands, which correspond to preferred microhabitats distribution [59]. The sites with clear and fast running shallow water, which this lineage depends on, are more common in the mountains. The geographic distribution of the typical form (named species B) is concentrated in lowlands, with rare occurrences in mountains. Typical form is independent of running water and prefers denuded soil and rock detritus [59].

Apopellia Chloroplast Genomes Variation at Infraspecific Level
The comparative analysis of plastomes revealed very low variation at cryptic species level (Supplementary Table S2). Three sequenced individuals of water form differ only by two substitutions and three indels found in intergenic regions (Supplementary Table S2). Among four accessions of typical form 21 substitutions and eight indels were found, including 15 in CDSs (rpoA, petA, psbJ, psbB, ndhF, ccsA and ycf 1). Most of them (10) were concentrated in the ccsA-ndhD region of SSC and had an impact on the length of both genes. Earlier studies on genus Apopellia revealed infraspecific variation in rps4, rpl16 and trnL-trnF plastid regions; however, both cryptic species (as genetic type EU and genetic type A1, for typical and water form respectively) were analyzed as one [14]. Considering water and typical forms as biological species [10] the potential of using plastome sequences for infraspecific studies like population genetics and phylogeography is very limited. In the studied dataset only the ccsA-ndhD region revealed variation big enough to identify every specimen of the typical form; however, this region was identical in all the three individuals of water form (Supplementary Figure S1).
Despite recent advances in sequencing and assembly liverwort plastomes [22,23,26,49,64,65] the information about intraspecific variation of these molecules is still limited and only few species, excluding cryptic species of Aneura pinguis [26] and Nowellia curvifolia [28], have two or more complete chloroplast genome sequences available. However, based on the results of this and previous studies, the liverworts plastome variation at intraspecific level seem to be rather low [27,49,66] in comparison to vascular plants [19,67,68]. The specimens of model species M. polymorpha ssp. ruderalis from Korea and Japan differed by only four SNPs [66] and the Riccia fluitans from Korea and Poland differed by 59 SNPs [27]. The variation of plastomes seems to be inconsistent even at the generic level as in Conocephalum, where samples of C. salebrosum were differentiated by a single SNP, while in C. conicum number of detected SNPs was 32 in samples from Central Europe [27].
Since these four species are complex thalloid liverworts, which are considered a slowly evolving group [24], the infraspecific variation of simple thalloid and leafy was expected to be higher. However, the comparative analyses of European Calypogeia species revealed a low level of intraspecific variation, limited to few SNPs [49]. Analysis of the complete chloroplast genomes of leafy Nowellia curvifolia from Europe revealed only one SNP, but the European and North American specimens differed by 35 SNPs [28]. Considering currently available complete genome sequences of liverworts, low interspecific differentiation of complex thalloid plastomes does not reflect infraspecific variation, which seems to be comparable to simple thalloid and leafy liverworts plastomes.

Genes Undergoing Positive Selection
Applied tests revealed positive selection in nine genes (ccsA, ndhD, ndhF, petA, psbB, psbC, rpoB, ycf 1 and ycf 2). For most of these genes only a single site under episodic diversifying selection was revealed, except for rpoB where two sites were identified. Among these genes, branch-site analysis (aBSREL) confirmed selection only in ccsA and ycf 2. The former was the only gene under selection according to the gene-wide test (BUSTED). The positive selection at genus level is poorly explored in liverworts. The ccsA, ndhF, petA and ycf 1 genes revealed signatures of positive selection in Aneura pinguis [26]. More evidence of positive selection at genus level could be found in wider studied vascular plants. Both ycf genes were detected under positive selection in genus Fagopyrum [69], Holcoglossum [70] and Salix [71], while ycf 1 in Caragana [51]. The ccsA gene, which was pointed out as undergoing positive selection by all three tests, plays an important role in cytochrome biogenesis [72]. Earlier studies revealed its potential role in adaptation to different habitats in the genus Cardamine [73].

Phylogeny
Phylogenetic position of Pellideae is congruent with previous studies based on plastid datasets [23,27]. Both forms of A. endiviifolia form monophyletic, well supported clade sister to Pallavicinia longispina (Figure 4, Supplementary Figure S2). This clade contains two, well supported (1.00 PP and 100% BS) subclades for each of the forms. Additionally, inside the clade ML analysis grouped typical forms specimens from Beskydy Mts and Germany were distinguished with a statistically supported clade (96% BS), which is congruent with the variation pattern revealed within this group (Supplementary Table S2). However, Bayesian Inference analysis resolved this clade as not statistically supported (0.77 PP). ML analysis also clustered together typical forms of A. endiviifolia from Tatra Mts and Pomeranian (Supplementary Figure S2) with moderate support (86% BS), but BI does not distinguish this clade (Figure 4).

Conclusions
Application of plastid superbarcodes for molecular delimitation of Apopelia cryptic species clearly separates two evolutionary lineages. The number of detected MDCs revealed genetic divergence at a level which in many cases exceeds values known from well-established taxonomic liverworts species. However, the base to comparison is rather low, since most complete plastome sequences of liverworts concern only single representatives of the species. Besides detected hotspots of variation, almost every 500 bp-long frame enabled identification of water and terrestrial forms. Recently published studies significantly expand our knowledge of liverworts chloroplast genomes, but their infraspecific variation remained still unexplored and was limited to five genera. Two cryptic species of Apopellia are confirmed to have low intraspecific plastome variation, which was especially low in the water form. Performed analysis revealed positive selection in nine out of 87 protein coding genes, but all the three tests were congruent only in the case of ccsA, which seems to play an important role in adaptation to different habitats also in other species.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10 .3390/d13120674/s1, Figure S1: Comparative analysis of Apopellia plastomes using mVista, Figure S2: Phylogenetics relationships of liverworts based on protein-coding sequences obtained using ML analysis. Bootstrap values lower than 100 are given at the nodes. Table S1: Specimens and chloroplast genomes details, Table S2: Infraspecific variation of typical and water forms of A. endiviifolia, Table S3: Ten most variable loci based on comparative plastome analysis of Apopellia endiviifolia forms.