Genotyping-by-Sequencing Analysis Shows That Siberian Lindens Are Nested within Tilia cordata Mill

: Tilia sibirica and T. nasczokinii are considered to be endemic Siberian linden species. They have very small distributions located hundreds to thousands of kilometers away from other lindens. It is unclear how closely these species are related to the widespread Tilia cordata : according to the current hypotheses, they could be pre-Pleistocene relicts or remnants of the recent continuous range of T. cordata that existed during the Holocene climatic optimum. Earlier studies detected signiﬁcant differences between T. sibirica , T. nasczokinii , and T. cordata in microsatellite loci, but not in plastid sequences. Here we performed a phylogenetic analysis of several linden species based on GBS data. The obtained GBS sequences were assembled to create phylogenetic trees based on 16,000–294,000 variable sites. We found that T. cordata and the two putative Siberian species formed a monophyletic group. It consisted of three clades: the basal clade containing specimens from the Caucasus, and two sister clades representing populations from the East European Plains+the Urals and Siberia, respectively. Neither of the Siberian species was related to the Far Eastern T. amurensis , as was hypothesized earlier. Our study suggests that the colonization of Europe and Siberia after the Last Glacial Maximum occurred from different glacial refugia.


Introduction
Tilia cordata Mill. has the largest distribution among Eurasian lindens, from the Atlantic coast of Europe to West Siberia ( Figure 1). In the east, its distribution becomes fragmented, with small populations scattered across swamps of the West Siberian Plain [1], up to the Novosibirsk [2] and Tomsk [3] Oblasts. These easternmost outposts of T. cordata are separated by ca. 600 km from the linden populations of the foothills of the Altai-Sayan mountain system, which are regarded as two separate species. A relatively big population from the Gornaya Shoria Mts (Kemerovo Oblast), the so-called Kuzedeevo Linden Island [4][5][6], was described as Tilia sibirica Fisch. ex Bayer. Numerous small populations from neighboring regions are also believed to belong to this species [7][8][9]. T. sibirica differs from T. cordata by having leaf blades on flowering shoots with strait or shallowly-cuneate base, not glaucescent beneath, as well as dense pubescence of ovary [10,11]. Two small adjacent populations growing near Krasnoyarsk were described as Tilia nasczokinii Stepanov. They were characterized as having asymmetrical, transversely cordate or basally truncated leaves, lanceolate lobes of stellate stigma, and few-flowered inflorescences [12]. However, many authors do not recognize these two taxa as separate species, because the diagnostic characters are highly variable so the distinctions can be traced only on a series of individuals. For example, the authors of the fundamental treatise «Woody plants of Asian Russia» considered both Siberian species as synonyms of T. cordata [13].
Several molecular studies attempted to clarify this issue. Logan et al. [14] employed 12 microsatellite loci to compare T. sibirica from the Kuzedeevo Linden Island to three T. cordata populations from Austria, Poland, and West Siberia. T. sibirica was found to differ  Table 1. Distributions of T. sibirica and T. nasczokinii are too small so they are only represented by location circles (locations 5,6 and 7,8, respectively). Distribution of T. cordata is shaded in green; distribution of T. taquetii and T. amurensis is shaded in red. Arrows indicate putative dispersal routes of T. cordata after the LGM.   Ekart et al. [16] performed a more extensive analysis using the same microsatellite loci as in [14]. Fst values between T. sibirica, T. nasczokinii, and T. cordata were significantly higher (0.202-0.549) than among T. cordata populations separated by thousands of km (0.057-0.152). It is noteworthy that Fst between T. nasczokinii and T. sibirica (0.473-0.549) was higher than between each of them and T. cordata (0.202-0.407; 0.275 on the average). This might indicate that the greater number of distinctive microsatellite alleles in Siberian species can result from the bottlenecks they must have passed given their small population number and does not reflect the divergence time between the species. It is desirable to complement microsatellite analysis with other molecular methods, especially genome-wide ones.
In this study we attempted to investigate the phylogeny among linden species currently recognized as Tilia sibirica, T. nasczokinii, T. cordata, T. amurensis Rupr., T. taquetii C.K. Schneid., and T. begoniifolia Steven using genotyping-by-sequencing (GBS). This method is based on genome-wide sequencing of loci adjacent to conservative restriction sites. It allows one to obtain thousands of homologous loci for a set of species with no prior genome data.

Materials and Methods
Linden specimens were collected by the authors in 2019-2021 (Table 1, Figure 1). The voucher specimens are stored in the NSK herbarium under the accessions NSK0089748-NSK0089760. Morphological identification was performed according to regional floras [9][10][11]17]. Total DNA was extracted from air-dried or ethanol-fixed leaves using CTAB as described in [18].
DNA was purified using AMPure XP (Beckman Coulter Life Sciences, Brea, CA, USA), quantified using Qubit DNA HS Assay Kit (Thermo Fisher, Waltham, MA, USA), and normalized to 20 ng/µL. 200 ng of DNA were digested by MspI (4 u) and PstI-HF (4 u) restriction enzymes. After enzyme inactivation at 70 • C, 18 µL of 2× Instant Sticky-end Ligase Master Mix (New England Biolabs, Ipswich, MA, USA), and 1 µL each of 25 nM double-stranded P5-GBS-PstI and P7-GBS-MspI adapters with modified bases were added to the mix. After 1 h ligation at 22 • C, DNA was sedimented using magnetic KAPA Beads (Roche, Switzerland) and sticky ends were blunted by BST polymerase. Long fragments and concatenates were removed using AMPureXP. Indexing PCR was performed with 6 nmoles of the universal S4_indPCR primer and 6 nmoles of the corresponding indexing primer with Phusion HotStart PCR MasterMix (NEB, Ipswich, MA, USA) with the following profile: 30 s at 98 • C; 15 cycles of 10 s at 98 • C, 30 s at 60 • C, and 5 s at 72 • C; final extension step for 10 min at 72 • C. Libraries prepared for different specimens were pooled in equimolar quantities; the 240-600 bp fraction was excised by BluePippin (Sage Sciences, Beverly, MA, USA). Sequencing was performed using 150 cycles NextSeq500/550 High Output Kit v2.5 (Illumina, San Diego, CA, USA), the Read1_GBS_PstI, and the custom indexing primer [19].
The obtained sequence reads were demultiplexed, trimmed to 118 bp, and assembled into orthologous loci using ipyrad v. 0.9.82 [20] and Stacks v. 2.55 [21]. Loci assembly was performed with a variety of parameters: for Stacks, ranging from the conservative -m 5 -M 3 -n 3 to -m 3 -M 10 -n 7 as proposed by Cariou et al. [22] for phylogeny reconstruction. Assembly in ipyrad was done with the default parameters as well as with somewhat more relaxed values (max_Hs_consens 0.15, max_Indels_locus 15, max_shared_Hs_locus 0.6). The output of these programs in phylip format was used for phylogenetic inference.
Phylogenetic trees were constructed using the Maximum Likelihood algorithm in RAxML v. 8.2.12 [23] and Bayesian inference in MrBayes 3.2.7a [24]. Maximum Likelihood analysis was done with the GTRCAT model as suggested by the program; 1000 bootstrap replicates were performed. For the Bayesian inference, we used the general time reversible model with gamma-distributed rate variation. We performed 1,000,000 generations; 25% were discarded as burn-in.

Results
We obtained 1,868,940 to 23,038,890 reads for different linden specimens. Assembly of GBS loci was done independently with two programs, Stacks and ipyrad. For Stacks, we performed multiple assemblies with a wide range of different parameters: minimum number of reads for locus assembly, maximum number of mismatches between loci, etc. Depending on the parameters, we obtained 16,253 to 170,243 variable sites for our alignment. Ipyrad yielded much more information, with 220,408-294,531 variable sites. Trees obtained using different alignments had similar topologies, but those based on Stacks alignments had lower resolution and low statistical support for the T. cordata clade, while ipyrad trees were well resolved. Thus, we will base our analysis on the ipyrad trees.
The phylogenetic trees ( Figure 2) revealed three high-level clades. One included the specimens of T. cordata and the Siberian linden species, the other the Far Eastern T. amurensis and T. taquetii, and the third one was formed by the Caucasian T. begoniifolia. The clade containing T. cordata was split into two subclades containing specimens from the southern (the Caucasus) and the northern parts of the range. The northern subclade fell into two groups encompassing the western (the East Siberian Plain and the Urals) and the eastern (Siberia) specimens. In the Siberian group, the two individuals of T. nasczokinii expectedly clustered with each other, as did the two T. sibirica specimens. A plant from Novosibirsk turned out to be closer to the T. nasczokinii subgroup.
We obtained 1,868,940 to 23,038,890 reads for different linden specimens. Assem of GBS loci was done independently with two programs, Stacks and ipyrad. For Stac we performed multiple assemblies with a wide range of different parameters: minimu number of reads for locus assembly, maximum number of mismatches between loci, e Depending on the parameters, we obtained 16,253 to 170,243 variable sites for o alignment. Ipyrad yielded much more information, with 220,408-294,531 variable sit Trees obtained using different alignments had similar topologies, but those based Stacks alignments had lower resolution and low statistical support for the T. cord clade, while ipyrad trees were well resolved. Thus, we will base our analysis on ipyrad trees.
The phylogenetic trees ( Figure 2) revealed three high-level clades. One included specimens of T. cordata and the Siberian linden species, the other the Far Eastern amurensis and T. taquetii, and the third one was formed by the Caucasian T. begoniifo The clade containing T. cordata was split into two subclades containing specimens fro the southern (the Caucasus) and the northern parts of the range. The northern subcla fell into two groups encompassing the western (the East Siberian Plain and the Urals) a the eastern (Siberia) specimens. In the Siberian group, the two individuals of T. nasczok expectedly clustered with each other, as did the two T. sibirica specimens. A plant fro Novosibirsk turned out to be closer to the T. nasczokinii subgroup. Figure 2. Phylogenetic tree of the studied specimens constructed using the ML method. Numb in brackets after specimen names refer to location numbers ( Figure 1, Table 1). Numbers at branches indicate ML bootstrap support/Bayesian bootstrap support. Figure 2. Phylogenetic tree of the studied specimens constructed using the ML method. Numbers in brackets after specimen names refer to location numbers ( Figure 1, Table 1). Numbers at the branches indicate ML bootstrap support/Bayesian bootstrap support.

Phylogeny of the T. cordata Clade
In this study we aimed to reconstruct phylogenetic relationships between the geographically remote specimens of linden from the East European Plain, the Caucasus, and the Far East using GBS. The status of T. sibirica and T. nasczokinii as separate species is questionable. On the one hand, they differ from the European T. cordata both morphologically [10][11][12] and genetically [14][15][16]. On the other hand, the morphological characters are highly variable, and the differences can be seen only on a series of individuals [13].
Our data confirmed the results of earlier molecular studies [14,16]: European populations of T. cordata indeed differ genetically from the Siberian taxa ( Figure 2). However, we found out that these differences are smaller than those between the Caucasian subclade Diversity 2022, 14, 256 5 of 8 of T. cordata and the rest of the clade. This divergence is expected from the disjunct distribution of T. cordata, with an isolate in the Caucasus (Figure 1). On the whole, genetic and geographical distances between the studied specimens appear to be well corresponded.
Based on morphology, some authors proposed that T. nasczokinii could be a sister species to the Far Eastern T. amurensis [12,15]. However, our analysis showed that these taxa are not closely related.
It is noteworthy that the study on European populations of T. cordata [25] found that a linden specimen from the East European Plain (the Voronezh Oblast of Russia) differed from the West European sample. Thus, it is possible that there could be additional genetic groups within T. cordata not detected in our study because we had no specimens from Western Europe.

Natural History of Siberian Lindens
The hypothesis that was mainstream throughout the 20th century and is still widespread is that the Siberian populations of linden are pre-glacial (Pliocene, «Tertiary») relics [4][5][6][26][27][28][29]. This viewpoint assumes that the age of their isolation exceeds 1.5 million years. This is in strong contradiction to paleoclimatic reconstructions, suggesting that southern Siberia was occupied by steppe-tundra during the Last Glacial Maximum, as well as in earlier strong glaciations [30][31][32]. It would be highly unlikely for the linden to survive in such a climate, since even now its Siberian populations, except for the Kuzedeevo Linden Island, are small (from a few trees to ca. 100 ha), often present as shrub forms, and obviously struggle for survival [33].
On the other hand, certain interglacial periods were even warmer than the current climate, and the distribution of linden was larger and uninterrupted. This was demonstrated even for the Holocene climatic optimum (the Atlantic period, ca. 5-9 kya) [30,[34][35][36]. According to palynological data, both European and Far Eastern Tilia species coexisted in the foothills of the Altai in Mid-Pleistocene [37].
Based on these generally accepted paleoclimatic data, it would be logical to suggest that current Siberian populations represent the remains of the continuous distribution of the Holocene climatic optimum, which persisted as small isolates in places with favourable microclimate. This hypothesis was first proposed by Reverdatto [38], and further endorsed by Grosset [39], and more recently and based on modern paleopalynological data, by Dubatolov and Kosterin [36]. It is supported by the fact that in the Kuzedeevo Linden Island, currently the largest patch of linden in West Siberia, abundant linden pollen was detected only in the Holocene deposits but not in lower soil horizons [6,40,41].
However, our GBS analysis and earlier molecular studies [14,16] demonstrated that T. sibirica and T. nasczokinii significantly differ from the studied T. cordata populations from its main part of the range. Moreover, molecular datings [14] suggest that the divergence time between T. sibirica and T. cordata lies in the range of 154-901 kya, which does not accord with either the pre-Pleistocene or with the Holocene hypotheses. This implies that Siberian linden populations either colonized West Siberia from another refugium than that gave rise to the current European populations, or they survived glaciations in situ. As said above, the latter hypothesis, i.e., the existence of broadleaved forest refugia in the steppe-tundra conditions, appears highly unlikely.
As an alternative, we could place the putative glaciation refugium of Siberian lindens into the elevations located to the south of West Siberia: the Kazakh Hilly Land (Central Kazakhstan), or, less probably, the foothills of Tian Shan [36] (Figure 1). These regions are currently rather arid (which is why no paleopalinological data available for them), but during the Pleistocene glaciations their climate is expected to be favourable for linden. It is possible that during interglacials, West Siberia had a continuous distribution of linden with coexisting European, Siberian, and Far Eastern taxa.

The Linden Specimen from Novosibirsk
One of our specimens was taken not from a natural population but from an alley in Novosibirsk Academy Town (location 4 on Figure 1). Our phylogenetic analysis placed it into the Siberian group. This finding was unexpected because it is generally assumed that T. cordata was used in landscaping of Novosibirsk [42]. However, we found the reports suggesting that about 2000 trees of T. sibirica from the Kuzedeevo Linden Island was planted in Novosibirsk in 1958 [40,43]. The lindens of Novosibirsk indeed resemble T. sibirica by morphological characters. Moreover, on the phylogenetic tree the specimen from Novosibirsk grouped with T. nasczokinii rather than T. sibirica. This can be explained by insufficient data on genetic diversity within T. sibirica involved into our analysis, so that trees stemming from other natural populations, not studied by us, could be planted in Novosibirsk. These populations would probably differ from those of the main Kuzedeevo island genetically, given that the distribution of this taxon is patchy.
The climate changes of the recent decades made the south of West Siberia much more suitable for broad-leaved trees. The lindens introduced by landscaping are thus becoming the source of invasions into the neighbouring birch and pine forests [44]. Therefore, we can currently observe an expansion of the autochthonous Siberian linden, probably similar to that observed in the Holocene climatic optimum, but human-assisted to a certain extent.

Taxonomic Implications
As said above, the taxonomic status of T. sibirica and T. nasczokinii is equivocal. Our study confirmed that T. nasczokinii and T. sibirica differ from T. cordata genetically, as was revealed in earlier studies [14,16]. On the other hand, recognition of these species makes T. cordata paraphyletic (Figure 2), while it is assumed that paraphyletic taxa should be avoided whenever possible [45,46]. In our case, this problem could be resolved either by considering T. cordata as a polytypic species comprising a set of subspecies, or by splitting it into several species. We suggest that the former way is more appropriate, taking into account that all subclades within the clade containing T. cordata have separate distributions, thus satisfying the requirements for subspecies proposed in [47]. The Caucasian populations, traditionally recognized as T. cordata [10,[47][48][49][50], therefore also deserve the subspecific status, based on their genetic and geographic distinctiveness from T. cordata from its main range. Therefore, we propose to recognize four subspecies within T. cordata: the type subspecies Tilia cordata subsp. cordata; the Caucasian subclade as the putative subspecies still to be described, the two Siberian subspecies Tilia cordata subsp. sibirica (Fisch. ex Bayer) Pigott and Tilia cordata subsp. nasczokinii (Stepanov)  We can make another taxonomic conclusion from our results. Zhu [51] treated Siberian lindens as Tilia amurensis var. sibirica (Fisch. ex Bayer) Y.C. Zhu, which was accepted in The World Checklist of Vascular Plants [52] and Plants of the World Online [53]. However, T. amurensis appears to be very distantly related to Siberian taxa, so this viewpoint is erroneous and should be rejected.