Chemical Fingerprinting of Cryptic Species and Genetic Lineages of Aneura pinguis (L.) Dumort. (Marchantiophyta, Metzgeriidae)

Aneura pinguis (L.) Dumort. is a representative of the simple thalloid liverworts, one of the three main types of liverwort gametophytes. According to classical taxonomy, A. pinguis represents one morphologically variable species; however, genetic data reveal that this species is a complex consisting of 10 cryptic species (named by letters from A to J), of which four are further subdivided into two or three evolutionary lineages. The objective of this work was to develop an efficient method for the characterisation of plant material using marker compounds. The volatile chemical constituents of cryptic species within the liverwort A. pinguis were analysed by GC-MS. The compounds were isolated from plant material using the HS-SPME technique. Of the 66 compounds examined, 40 were identified. Of these 40 compounds, nine were selected for use as marker compounds of individual cryptic species of A. pinguis. A guide was then developed that clarified how these markers could be used for the rapid identification of the genetic lineages of A. pinguis. Multivariate statistical analyses (principal component and cluster analysis) revealed that the chemical compounds in A. pinguis made it possible to distinguish individual cryptic species (including genetic lineages), with the exception of cryptic species G and H. The classification of samples based on the volatile compounds by cluster analysis reflected phylogenetic relationships between cryptic species and genetic lineages of A. pinguis revealed based on molecular data.


Introduction
The genus Aneura Dumort. (Metzgeriidae) is a representative of the liverworts, a phylum of small terrestrial plants that has played an important role in the evolution of early land plants. Liverworts were among the first plants to colonize land [1,2], and they have been documented in the fossil record as early as 470 million years ago [3,4]. Today, liverworts are an important component of many terrestrial ecosystems [5]. Liverworts are also the source of a large number of biologically active compounds, such as terpenoids and aromatic compounds, synthesized and accumulated in oil bodies [6][7][8]. Many of these compounds occur only in liverworts and exhibit interesting biological activities [9][10][11][12][13][14]. These compounds are also a valuable source of markers for resolving taxonomic problems and identifying individual species [15][16][17][18][19][20]. Revealing differences in the composition of chemical compounds between closely related species is important not only for the identification of species but also for the clarification of evolutionary relationships between them [19,21,22,23].

Chemical Profile
Thirty-seven samples of A. pinguis belonging to seven different genetic lineages were subjected to chemical analyses. Volatile compounds were determined for the following species and genetic lineages recognized based on molecular studies [33]: C1, C2, E1, E2, G, H and I. Details on the locality and collection time of the samples are shown in Table 1. In the multivariate statistical analysis, results obtained in the present study were compared with all remaining genetic lineages from previous research on A. pinguis [34,35]. As in previous studies, the small amount of biological material that could be collected for individual genetic lineages of A. pinguis from a given habitat in different vegetation seasons was analysed using the HS-SPME technique. Results of the chemical analyses are shown in Tables 2-5. The tables also contain compounds no. 9, 11, 57 and 58, reported in a previous study [35]. These data facilitate comparisons of the current results with those previously described.
[Tables 2-5. Volatile compounds detected in the examined samples of A. pinguis, including camphene (no. 8, IR = 953), benzaldehyde (no. 9, IR = 970), cuparene, longicamphenylone, deoxopinguisone (IR = 1563), longiverbenone and dihydrobryopterin.]
To determine whether the chemical compounds can act as chemotaxonomic markers permitting the determination of all cryptic species of A. pinguis, the data obtained for samples in the current study (C1, C2, E1, E2, G, H, I) were compared in multivariate statistical analyses with previously published data on the remaining genetic lineages of A. pinguis [35]. Two classification methods were used: principal component analysis (PCA) and cluster analysis (CA). Both analyses were performed using all 66 volatile compounds detected in the examined A. pinguis samples.
In the cluster analysis, the classification of samples based on the volatile compounds largely reflected the phylogenetic relationships between cryptic species and genetic lineages of A. pinguis proposed by Bączkiewicz et al. [33] using molecular data. The chemical compounds found in A. pinguis make it possible to distinguish the vast majority of cryptic species and genetic lineages of the species (Figure 1). On the dendrogram constructed based on the Euclidean distance and Ward's linkage method, 11 clusters can be recognized based on the Mojena index; these largely match the genetic lineages of A. pinguis, with the exception of two cryptic species, G and H, which form one cluster. The analysed samples of A. pinguis were grouped into two large clusters: the first cluster included species B, C, F, G, H, I and lineage E2, and the second cluster included species A and lineage E1. All genetic units analysed in the present study (C1, C2, E1, E2, G, H and I) differed from the other species of A. pinguis (A, B and F) in the composition of chemical compounds, taking into account that species A and B are divided into the genetic lineages A1, A2, A3 and B1, B2, B3, respectively. Samples of species C form two closely related groups corresponding to the genetic lineages C1 and C2 recognized in molecular studies [33]; however, like B2 and B3, these groups differ to a lesser extent and therefore do not form separate clusters according to the Mojena index. In contrast, the two genetically closely related lineages of species E (E1 and E2) show a high degree of distinctiveness in their chemical compounds [33]. Samples of lineage E1 formed a distinct group that was placed in one cluster with species A, which was similar to the results of genetic studies; however, samples of E2 were placed in the second large cluster, which included species B, C, F, G, H and I. The detected chemical compounds poorly differentiate between species G and H, whose distinction is supported by molecular studies [33].
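The way a stopping rule of this kind selects the number of clusters can be illustrated with a short Python sketch. It assumes the upper-tail form of Mojena's rule, in which the dendrogram is cut before the first merge whose fusion height exceeds the mean plus k standard deviations of all fusion heights. The constant k = 1.25, the function name `mojena_n_clusters` and the example heights are all assumptions made for illustration; the text does not report the constant used or the fusion heights of Figure 1.

```python
from statistics import mean, stdev


def mojena_n_clusters(fusion_heights, k=1.25):
    """Number of clusters suggested by an upper-tail stopping rule.

    fusion_heights: ascending heights of the n-1 merges of a dendrogram
    built from n objects. k is an assumed constant, not taken from the paper.
    """
    threshold = mean(fusion_heights) + k * stdev(fusion_heights)
    n_objects = len(fusion_heights) + 1
    for j, height in enumerate(fusion_heights):
        if height > threshold:
            # j merges were accepted before this one, leaving n - j clusters
            return n_objects - j
    return 1  # no merge exceeds the threshold: a single cluster


# Hypothetical fusion heights for 10 objects (not data from the study)
heights = [1, 1, 1, 1, 1, 1, 1, 10, 12]
print(mojena_n_clusters(heights))  # 3
```

In this toy example the last two merges are much higher than the rest, so the rule stops before them and three clusters remain.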
To verify the patterns in diversity observed in the cluster analysis and express these data more simply, a PCA was conducted on the volatile compound data. PCA on the 66 compounds produced eight significant principal components (PCs) that explained 98.2% of the variation (R2X) and 75.4% of the predicted variation (Q2). Figure 2 shows the three-dimensional scatter plot of the first three principal components, which explained 91.9% of the total variance in the analysed volatile compounds. The PCA of the examined samples of A. pinguis revealed 13 groups of samples corresponding to the genetic lineages identified by DNA markers, with species G and H forming a single group because they could not be distinguished. The PC1 axis primarily divided species B (genotypes B1, B2, B3) and F from G, H, I, A3 and E1 based on the relative content of compound 52, which made the largest contribution to PC1. The PC2 axis separated lineages A1 and A2 from A3 and lineage B3 from B1, B2 and F, while the PC3 axis differentiated lineage A1 from A2 and lineages B1, B3 and F from I and E1, which was the most distinct lineage in the PCA score plot. In contrast, the E2 genotype did not show strong differentiation from A1 and A2 (Figure 2). Compound 40 made the largest contribution to PC2, while compounds 60 and 61 made the largest contributions to PC3; thus, these compounds played the most important role in separating genotypes along PC2 and PC3. Both statistical analyses (CA and PCA) show that most species and genetic lineages of A. pinguis can be distinguished on the basis of their chemical composition. The differentiation and the degree of affinity between them revealed by the chemical studies are almost completely consistent with the results of molecular analyses [33].
An interesting exception is species E: lineages E1 and E2 are closely related in terms of DNA sequences and belong to one clade (genetic divergence of 0.38%), yet they differ greatly in the composition of chemical compounds and form strongly distinct groups. On the other hand, for the pair of species G and H, we observed small differences in the composition of chemical compounds, despite a significant molecular difference (2.47%) between them [33]. The percentage of explained variance (R2X) was 81.4% for PC1, 6.2% for PC2, and 4.3% for PC3, and the predictive ability (Q2) was 36.7%, 16.7%, and 19.9%, respectively. Species in this study were compared with the remaining species of A. pinguis examined in a previous study [35].
Chemical analyses of liverworts within A. pinguis detected 66 volatile compounds, 40 of which were identified. The data collected in this paper and the work published by Wawrzyniak et al. [35] indicated that the composition of volatile compounds was primarily determined by genetics. Season and habitat appeared to have no effect on the chemical composition of compounds produced. The identified metabolites belonged to the groups of oxygen derivatives of sesquiterpenes, sesquiterpenes, monoterpenes, aliphatic hydrocarbons and aromatic hydrocarbons. The observed differences in the composition of metabolites among individual A. pinguis genetic lineages were so salient that they could be used to identify biological material. Similarly, Ludwiczuk et al. [21] demonstrated the use of volatile compounds as chemical markers for the identification of Conocephalum conicum cryptic species. The importance of the composition of volatile compounds as chemical markers is confirmed by the results of numerous studies of different liverwort species [8,18,19,20,21,22,23].
To simplify identification via metabolite composition, the nine compounds that best differentiated the analysed material were selected from the 66 compounds examined. These compounds permit plant material to be assigned to specific species and even genetic lineages. Data on the content of these compounds for individual genetic lineages and species are shown as radar charts in Figure 3.

Guide for the Identification of Cryptic Species and Genetic Lineages
These results also allowed us to prepare a guide describing a procedure through which different genetic lineages could be identified based on the content of marker compounds. The first step in identifying material is to determine the content of pinguisone. The content of this compound can be used to classify the tested material into four groups (below 20%, 20-50%, 50-70% and above 70%). The next step is to determine the content of dihydrobryopterin A, which identifies lineages A3 and E1. Other groups can be identified based on the content of deoxopinguisone (species I) and costunolide (lineages C1, C2, B1, and species G, H). The absence of the compound characterized by IR = 1934 permits the identification of E2. The content of the compound characterized by IR = 1989 permits the differentiation of A1 and A2. The identification of lineages B2, B3, C2, and species F is based on the content of compounds characterized by IR = 1533 and IR = 1633. As seen in Figure 4, species G and H cannot be distinguished based on their chemical composition. The quantitative differences in the detected compounds are not sufficient to identify species G or H (e.g., compound IR = 1533 in Table 5).
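The first step of the guide can be illustrated with a short Python sketch that assigns a sample to one of the four pinguisone content groups. The function name `pinguisone_group` is hypothetical, and the handling of boundary values (exactly 20%, 50% or 70%) is an assumption, since the text does not specify it. Only the pinguisone step is covered, because the text gives no numerical thresholds for the remaining marker compounds.

```python
def pinguisone_group(content_pct: float) -> str:
    """Assign a sample to one of the four pinguisone content groups.

    content_pct is the relative content of pinguisone in percent.
    Boundary handling is an assumption: 20% falls in "20-50%",
    50% in "50-70%", and 70% in "50-70%".
    """
    if not 0.0 <= content_pct <= 100.0:
        raise ValueError("relative content must be between 0 and 100%")
    if content_pct < 20.0:
        return "below 20%"
    if content_pct < 50.0:
        return "20-50%"
    if content_pct <= 70.0:
        return "50-70%"
    return "above 70%"


# Hypothetical contents, for illustration only
for value in (5.0, 35.0, 60.0, 85.0):
    print(value, pinguisone_group(value))
```

Later steps of the guide would then refine the assignment within each group using the contents of the other marker compounds.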

Plant Material
Plant material included 37 samples of Aneura pinguis (L.) Dumort. (Aneuraceae) collected in different regions of Poland (Table 1). Some of the analysed genotypes of A. pinguis have small thallus sizes and often grow in small colonies composed of only a few thalli [32]. Several selected samples were collected repeatedly from the same place in consecutive years and at different times of the year (Table 1). From each sample, a portion was dried and deposited as a voucher in the POZW Herbarium. The remaining portion of the sample was cleaned and used for analyses. Samples for DNA and chemical analyses were stored at −30 °C. Chemical analyses were performed on samples of A. pinguis identified to species and genetic lineage based on six DNA barcodes; sequences were deposited in GenBank [33]. These sequences can be obtained from GenBank using the POZW numbers provided in Table 1.

GC Analysis of Volatile Compounds
The analysis of the volatile compounds was performed using a previously described GC-MS method [34,35]. Conditions of sorption and desorption were optimized through selection of the type of stationary phases, coated fibres, the amount of biological material, time, and temperature. Frozen 10 mg samples of A. pinguis were placed in a 1.7 mL screw-capped vial with a silicone/Teflon membrane. Next, the vial was heated at 50 °C, and headspace solid-phase micro-extraction was carried out for 60 min. SPME fused silica fibres coated with divinylbenzene/carboxen/polydimethylsiloxane (DVB/CAR/PDMS) stationary phases were used (Sigma-Aldrich, St. Louis, MO, USA). Desorption was performed at 250 °C for 10 min. The above operations were carried out by a TriPlus RSH autosampler (Thermo Fisher Scientific, Waltham, MA, USA). Compounds isolated from the biological material were analysed by a Trace 1310 gas chromatograph coupled with an ISQ QD mass spectrometer (Thermo Fisher Scientific). The mass selective detector was operated at 70 eV in the EI mode over the m/z range 30-550, with a transfer line temperature of 250 °C. The GC instrument was equipped with a capillary column coated with a silphenylene phase (Quadex 007-5MS: 30 m × 0.32 mm i.d., film thickness 0.25 µm; Bethany, CT, USA). Helium was used as the carrier gas at a flow rate of 1.0 mL/min. The oven temperature was programmed to change from 60 to 230 °C at 4 °C/min and then remain stable at 230 °C for 40 min. Each sample was analysed three times.
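The length of one chromatographic run follows directly from the oven programme given above. The Python sketch below works through the arithmetic; it assumes that the ramp starts at injection with no initial hold, which the text does not state, and the function name is hypothetical.

```python
def oven_run_time_min(start_c, end_c, rate_c_per_min, hold_min):
    """Total run time of a single-ramp oven programme, in minutes."""
    ramp_min = (end_c - start_c) / rate_c_per_min  # duration of the ramp
    return ramp_min + hold_min


# 60 -> 230 degC at 4 degC/min, then a 40 min hold at 230 degC
total = oven_run_time_min(60.0, 230.0, 4.0, 40.0)
print(total)  # 82.5
```

Under these assumptions the ramp takes 42.5 min and each run lasts 82.5 min, not counting oven cool-down between runs.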
After compound determination, the desorption procedure and GC analysis were repeated one more time. The absence of any peaks in the repeated run indicated that desorption was complete under the selected conditions. The constituents were identified by comparing their MS spectra with those from the literature and of reference compounds, and by computer matching against the NIST 11 library, the NIST Chemistry WebBook, the Mass Finder 4 library, the Adams library and the Pherobase database [36][37][38][39]. The identification of compounds was verified by Kovats retention indices, which were determined relative to a homologous series of n-alkanes (C8-C26) under the same operating conditions. The relative concentrations of the components were obtained by peak area normalization without applying correction factors. The GC-MS analysis was performed in scan mode; compounds were recorded via a TIC (total ion chromatogram). Tables 2-5 show the data for compounds with concentrations greater than 0.01%. Recorded compounds below the 0.01% threshold were not included in the tables.
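The two calculations described above can be sketched in Python. The retention index function assumes the linear interpolation form commonly used for temperature-programmed runs, 100 × [n + (t_x − t_n)/(t_n+1 − t_n)], where t_n and t_n+1 are the retention times of the n-alkanes bracketing the analyte; the text does not state which form was applied. The function names and all numerical values are hypothetical.

```python
from bisect import bisect_right


def linear_retention_index(t_x, alkane_times):
    """Retention index by linear interpolation between bracketing n-alkanes.

    alkane_times maps carbon number -> retention time (min); the times are
    assumed to increase strictly with carbon number.
    """
    carbons = sorted(alkane_times)
    times = [alkane_times[c] for c in carbons]
    if not times[0] <= t_x <= times[-1]:
        raise ValueError("retention time lies outside the n-alkane window")
    # Index of the last alkane eluting at or before the analyte
    i = min(bisect_right(times, t_x) - 1, len(times) - 2)
    n, n_next = carbons[i], carbons[i + 1]
    fraction = (t_x - times[i]) / (times[i + 1] - times[i])
    return 100.0 * (n + (n_next - n) * fraction)


def relative_content(peak_areas, threshold_pct=0.01):
    """Peak area normalization without correction factors.

    Returns the percentage of the total area for every compound whose
    share is greater than threshold_pct.
    """
    total = sum(peak_areas.values())
    percents = {name: 100.0 * area / total for name, area in peak_areas.items()}
    return {name: pct for name, pct in percents.items() if pct > threshold_pct}


# Hypothetical retention times (min) for three n-alkanes
alkanes = {9: 5.0, 10: 7.0, 11: 9.0}
print(linear_retention_index(6.0, alkanes))  # 950.0

# Hypothetical peak areas; the trace peak falls below the 0.01% threshold
print(relative_content({"x": 75000.0, "y": 25000.0, "trace": 1.0}))
```

The threshold in `relative_content` mirrors the 0.01% cut-off used for Tables 2-5.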

Statistical Analysis
In the multivariate statistical analyses, data obtained in the current study for samples of lineages C1, C2, E1, E2, and species G, H, I were compared with data from the remaining genetic lineages of A. pinguis [35] to determine whether the composition of volatile compounds differed within the group of cryptic species of A. pinguis that were studied. Two classification methods were used: cluster analysis and principal component analysis. In cluster analysis, hierarchical clustering based on Euclidean distances according to Ward's linkage method was used [40]. Principal component analysis was based on the covariance matrix of all 66 compounds, using the nonlinear iterative partial least squares (NIPALS) algorithm to construct the PCA model. The v-fold method was used to find the optimal number of principal components that reached the maximum Q2. Statistical significance of the principal components was assessed based on the rule Q2 > Limit. Statistical analyses were performed using STATISTICA 13.3 (StatSoft, Kraków, Poland).
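The NIPALS approach to PCA can be illustrated with a minimal pure-Python sketch. It mean-centres the data without scaling (corresponding to a covariance-based PCA), extracts components one at a time by alternating between loadings and scores, and deflates the data after each component. This is not the STATISTICA implementation; the function name, tolerance, iteration limit and the small data matrix are assumptions, and the v-fold cross-validation used to obtain Q2 is not included.

```python
import math


def nipals_pca(rows, n_components, tol=1e-12, max_iter=1000):
    """PCA by NIPALS on mean-centred data (samples in rows, variables in columns).

    Returns (scores, loadings, explained), one entry per component.
    Assumes the data have at least n_components non-trivial components.
    """
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    X = [[r[j] - means[j] for j in range(m)] for r in rows]
    total_ss = sum(v * v for r in X for v in r)
    scores, loadings, explained = [], [], []
    for _component in range(n_components):
        # Start from the column with the largest sum of squares
        col_ss = [sum(r[j] ** 2 for r in X) for j in range(m)]
        j0 = col_ss.index(max(col_ss))
        t = [r[j0] for r in X]
        for _iteration in range(max_iter):
            tt = sum(v * v for v in t)
            # Loadings: project the columns of X onto the scores, then normalise
            p = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
            norm = math.sqrt(sum(v * v for v in p))
            p = [v / norm for v in p]
            # Scores: project the rows of X onto the loadings
            t_new = [sum(X[i][j] * p[j] for j in range(m)) for i in range(n)]
            change = sum((a - b) ** 2 for a, b in zip(t_new, t))
            t = t_new
            if change < tol * sum(v * v for v in t):
                break
        # Deflate: remove the extracted component from X
        X = [[X[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        scores.append(t)
        loadings.append(p)
        explained.append(sum(v * v for v in t) / total_ss)
    return scores, loadings, explained


# Hypothetical relative contents (%) of three compounds in four samples
data = [[80.0, 5.0, 1.0], [78.0, 6.0, 2.0], [10.0, 40.0, 3.0], [12.0, 38.0, 1.0]]
scores, loadings, explained = nipals_pca(data, 2)
print([round(e, 4) for e in explained])
```

The `explained` values play the role of R2X per component; in this toy data set almost all variation lies along the first component, which contrasts the first and second compounds.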

Conclusions
GC-MS analysis of the volatiles revealed variability in the chemical composition of the various cryptic species of A. pinguis. The composition of volatile compounds in this species is largely genetically determined. The current and previous studies suggest that the selection of marker compounds might permit the rapid identification of almost all cryptic species of A. pinguis. These compounds include pinguisone (52) and its derivatives: deoxopinguisone (40), dihydrobryopterin A (55), and methyl norpinguisonate (56). Based on the content of these compounds, lineage E1 can be distinguished from A3 and I. The content of these compounds also permits the remaining lineages to be distinguished. The compounds IR = 1533 (37) and IR = 1633 (46) permit the differentiation of lineages B2, B3, C2 and species F. Costunolide (60) appears to be a robust chemical marker for lineages B1, C1 and C2 and species G and H. The compound IR = 1934 (61) identifies lineage E2. Compound IR = 1989 (65) permits the differentiation of lineages A1 and A2. The role of chemical compounds as chemotaxonomic markers for cryptic species of A. pinguis was supported by two multivariate statistical methods (CA and PCA), which grouped the studied samples based on the detected chemical compounds in a manner largely consistent with the genetic determination based on DNA barcoding sequences.

Data Availability Statement:
The data presented in this study are available within the article.