DNA-Based Identification of Eurasian Vicia Species Using Chloroplast and Nuclear DNA Barcodes

Many legume species of the Vicia L. genus (Fabaceae Lindl.) are key components of the Mediterranean diet and have an integral role in sustainable agriculture. Given the importance of the Vicia species for Eurasian culture, it is necessary to implement methodologies, such as DNA barcoding, that can enable the effective authentication and identification of species in the genus. In this study, we analysed the chloroplast trnL and rpoC1, as well as the nuclear ITS2 DNA barcoding regions, to identify 71 Vicia specimens of Eurasian descent. Both the trnL and ITS2 regions were highly effective in discriminating the analysed taxa, while the more conserved rpoC1 region could not identify all of the selected species due to high sequence conservation or non-annotated or absent rpoC1 species sequences in GenBank. A dendrographic representation of the generated trnL data showed sufficient clustering for most of the analysed taxa, although some topological discrepancies were observed. ITS2 and rpoC1 reconstructions were also used for resolving the topological discrepancies observed in the trnL tree. Our analysis suggests that a combination of DNA barcoding regions is essential for accurate species discrimination within the Vicia genus, while single-locus analyses do not provide the necessary resolution.


Introduction
Legumes are the second most important crop after cereals in terms of agricultural production, human nutrition, and animal feed [1]. Many leguminous species have been an integral part of the human diet since ancient times due to their low fat and high protein content, richness in fibre and other nutrients, as well as their anticarcinogenic and prebiotic potential [1][2][3][4][5]. Furthermore, legumes are a nutrient-rich feed for domesticated animals. Apart from the use of legumes for human or animal consumption, many species are also utilized in agricultural practices, such as crop rotation, given many species form symbiotic relationships with nitrogen-fixing bacteria, thus replenishing soil nitrogen levels. Finally, legumes are also an invaluable source of nutraceuticals and pharmaceuticals [4,6,7], and are important in the production of other industrial products [8]. Legume seeds can also be fractionated into high protein-, fibre-, and starch-based raw ingredients for the food, increasingly apparent that there are some limitations in single loci analyses, and the combination of multiple barcodes may provide higher discrimination accuracies, especially for closely related species [44].
The combination of the chloroplast regions trnL and rpoC1, as well as the nuclear ITS2 region, has been previously used for the discrimination of major Mediterranean legume species, with trnL and ITS2 showing the highest level of sensitivity [39]. More specifically for Vicia, the ITS2 and several regions of the chloroplast genome were previously used for the taxonomic classification of several species in the genus [45][46][47]. More recently, the combination of the barcoding regions ITS2, matK, and rbcL was also evaluated along with morphological taxonomic traits for their efficacy in discriminating among South Korean Vicia species [48].
Apart from species discrimination, DNA barcoding data have also been used for inferring and resolving phylogenetic relationships in several genera in the Fabaceae family, including Vicia. Phylogenetic analyses based on both morphological descriptors of diversity and DNA data showed that the Eurasian species are likely to have originated in the eastern Mediterranean or the western Asiatic regions and the New World species were derived from lineages that invaded America [1,41]. The divergence of the two groups occurred relatively recently [41] and was likely the result of three independent invasion events [12,49]. Nevertheless, Vicia species phylogeny is still inconclusive, owing to the taxonomic complexities that emerge from incongruent phylogenetic analyses and the high intraspecific genetic diversity observed in the genus [1], as well as the classification of several taxa as subspecies and races [50] or aggregates [50][51][52].
To verify the species identity of several Vicia specimens of Eurasian descent in local seed banks, we implemented DNA barcoding analyses using the trnL and rpoC1 plastid regions, as well as the nuclear ITS2 region. Species discrimination was carried out in terms of sequence similarity (>90%) with indicative GenBank entries, and Maximum Likelihood (ML) dendrograms were utilized for visualizing the sequence differences amongst the studied specimens. Our analysis indicates that the combination of both sequence similarity assessments and dendrographic representations allows for more accurate species discrimination in the Vicia genus.

Results
To support species identification for 71 indicative Eurasian Vicia specimens from local seed banks, DNA barcoding analysis was implemented using, initially, the trnL region. The trnL DNA barcoding region was successfully amplified and sequenced for all 71 specimens, generating reads of approximately 362 bp in length. A Basic Local Alignment Search Tool (BLAST) analysis of the generated trnL sequences against GenBank entries enabled us to confidently identify the Vicia specimens at the species level with high-percentage identity values above 90% for most of the specimens (Table 1).  To graphically represent the sequence differences amongst the studied specimens, a multiple sequence alignment was assembled for the generated trnL sequences of the 71 Vicia specimens, along with trnL reference sequences from GenBank. The resulting alignment generated a matrix of 492 sites in total, with 207 variable and 130 parsimony-informative sites (Table S1). When studying the generated trnL ML tree, most specimens of the same Vicia species showed distinct clustering ( Figure 1). To further verify broad tree topologies and resolve taxon discrepancies, the ITS2 and partial rpoC1 regions were also analysed for selected taxa. For these analyses, specimens were primarily selected for their misplacement on the trnL tree, especially those that did not form any apparent clustering, such as V. faba (Figure 1). Individual specimens with an unsuccessful amplification of the ITS2 and rpoC1 regions were excluded from the subsequent analyses, although alternative representatives of the species were used in this case. Vicia species. The tree was generated using the Maximum Likelihood method and the Tamura-Nei model with +G = 0.693. The tree with the highest log likelihood (−908.72) is shown. Branch labelling represents the percentage of replicate trees in which the corresponding taxa clustered together (1000 bootstrap replicates). The Trifolium repens trnL sequence (GenBank acc. number: JN617179) was used as an outgroup to root the tree. The Taxa are colour-coded at the species level for visualization purposes. The Vicia species analysed in this work are labelled with circle markers. Numbers in species labelling correspond to sample ID (Table 1). GenBank-derived reference sequences are indicated by square-shaped markers and accession numbers present in the taxon labelling.
However, there were some discrepancies observed in the topology of several species, especially for the V. faba samples, which were scattered throughout the tree ( Figure 1). Furthermore, individual specimens, such as V. sericocarpa (specimen 53) and V. lathyroides (specimen 38), did not cluster with other specimens of the same species ( Figure 1). Despite the apparent clustering of specimens in the clade formed by the V. aintabensis, V. hybrida, V. noeana, and V. peregrina species, there were very few differences in the trnL sequence in these species that could allow further sub-clustering; hence, bootstrap support is very low in this clade ( Figure 1). Similar cases for topology were also observed for the V. sativa and V. grandiflora species (Figure 1). Despite some taxon misplacements, broader patterns can be clearly observed. According to the trnL tree, V. dionysiensis shows an early divergence, followed by a clade that radiates to species with very similar trnL sequences (V. hybrida, V. aintabensis, V. noeana, and V. peregrina), a clade that splits to the V. anatolica and V. pannonica species, a clade that primarily contains the V. lutea, V. michauxii, and V. ervilia, and finally, a clade that diverges into the V. narbonensis, V. bithynica, V. lathyroides, and V. sativa/V. grandiflora species (Figure 1).
To further verify broad tree topologies and resolve taxon discrepancies, the ITS2 and partial rpoC1 regions were also analysed for selected taxa. For these analyses, specimens were primarily selected for their misplacement on the trnL tree, especially those that did not form any apparent clustering, such as V. faba ( Figure 1). Individual specimens with an unsuccessful amplification of the ITS2 and rpoC1 regions were excluded from the subsequent analyses, although alternative representatives of the species were used in this case. A BLAST analysis of the generated ITS2 sequences for the selected taxa showed high levels of similarity with the corresponding reference sequences from GenBank ( Table 2). The alignment of those sequences with reference sequences generated a matrix of 436 bp, with 200 variable and 37 parsimony-informative sites (Table S1). In contrast to trnL, the ITS2 tree resolved the topological discrepancies observed in the V. faba and V. lathyroides clades, with all the specimens of the same species displaying distinct clustering ( Figure 2). Despite the lack of strict clustering for the V. sericocarpa species, the ITS2 sequences in these specimens are still more similar with each other than those from the other specimens; hence, the V. sericocarpa branches did not interject the well-defined clusters on the ITS2 tree, in contrast to what has been observed in the trnL tree (Figures 1 and 2). Overall, in the context of the studied species, the overall ITS2 tree topology r the broad pattern observed in the trnL tree, with an earlier divergence of the V. ser species followed by V. anatolica, a clade diverging to V. lutea and V. pannonica, as we clade comprised of the rest of the species (Figure 2). It is noteworthy that the diver of V. faba, according to the ITS2 dendrogram, is placed before that of V. bithynica, a as those of V. sativa and V. grandiflora clades (Figure 2).
To further resolve the taxon discrepancies observed in the trnL tree with a mor served region, BLAST analysis and dendrographic reconstruction were also perfo Branch labelling represents the percentage of replicate trees in which the corresponding taxa clustered together (1000 bootstrap replicates). The Trifolium repens ITS2 sequence (GenBank acc. number: BSYJ31) was used as an outgroup to root the tree. Newly generated Vicia species sequences are labelled with circular markers. Number annotations in species labelling correspond to sample IDs ( Table 2). Reference sequences from GenBank with the corresponding accession numbers are marked with square-shaped markers.
Overall, in the context of the studied species, the overall ITS2 tree topology retains the broad pattern observed in the trnL tree, with an earlier divergence of the V. sericorpa species followed by V. anatolica, a clade diverging to V. lutea and V. pannonica, as well as a clade comprised of the rest of the species (Figure 2). It is noteworthy that the divergence of V. faba, according to the ITS2 dendrogram, is placed before that of V. bithynica, as well as those of V. sativa and V. grandiflora clades (Figure 2).
To further resolve the taxon discrepancies observed in the trnL tree with a more conserved region, BLAST analysis and dendrographic reconstruction were also performed using part of the rpoC1 region, which encodes for the beta subunit of the DNA-dependent RNA polymerase. Despite the successful amplification and sequencing of the rpoC1 region for all of the Vicia specimens examined (Table 3), there were no available data in the GenBank repository for most of the tested species. As such, verification of species identity by sequence similarity using this region was not feasible. Nevertheless, the two species for which data exist, namely V. faba and V. sativa, showed 100% similarity with high bit score values, as expected for a conserved DNA barcoding region (Table 3).  The alignment of the corresponding sequences with reference species sequences from GenBank generated an alignment matrix of 442 bp with 25 parsimony-informative and 37 variable sites (Table S1), which indicates very low variation amongst the analysed sequences, as expected for a rather conserved region. On the basis of the generated rpoC1 dendrogram, most of the recently diverged species, such as V. sativa, V. grandiflora, and V. faba, were clearly separated into distinct clades, showing topological agreement with the ITS2 tree (Figures 2 and 3).
However, the V. lathyroides, V. peregrina, and V. michauxii species showed much earlier divergence than that observed in the trnL and ITS2 trees (Figures 1 and 2). Especially for V. michauxii and V. peregrina, the rpoC1 tree shows an even earlier divergence than that of the V. dionysiensis clade (Figure 3), which was shown to have the earliest divergence amongst the analysed Vicia species in the trnL tree ( Figure 1). ITS2 tree (Figures 2 and 3).
However, the V. lathyroides, V. peregrina, and V. michauxii species showed much earlier divergence than that observed in the trnL and ITS2 trees (Figures 1 and 2). Especially for V. michauxii and V. peregrina, the rpoC1 tree shows an even earlier divergence than that of the V. dionysiensis clade (Figure 3), which was shown to have the earliest divergence amongst the analysed Vicia species in the trnL tree (Figure 1).

Figure 3.
Dendrogram based on the rpoC1 alignment matrix generated for 27 specimens across 10 Vicia species. The tree was generated using the Maximum Likelihood method and the Jukes-Cantor model with +G = 0.05. The tree with the highest log likelihood (−927.88) is shown. Branch labelling represents the percentage of replicate trees in which the corresponding taxa clustered together (1000 bootstrap replicates). The Pisum sativum rpoC1 sequence (GenBank acc. number: NC014057) was used as an outgroup to root the tree. The Vicia species sequences generated in this work are labelled with circular markers. Number annotations in species labelling correspond to sample IDs (Table 3). Reference sequences from GenBank with the corresponding accession numbers are labelled with square-shaped markers.

Discussion
Legume crops and their products are important elements of the human diet, an essential animal feed, as well as an integral component of sustainable agriculture. Many leguminous species of the Vicia genus have been domesticated and are commonly used in various Protected Designation of Origin (PDO) and Protected Geographical Indication (PGI) products [53], which necessitates the application of methodologies that will enable the effective authentication and identification of the product's composition. Furthermore, a comparative analysis of morphological, biochemical, karyotypic, and genetic characteristics of several Vicia species revealed inconsistencies in the resulting classifications within the genus [54][55][56][57][58], which indicates that more reliable methodologies using genetic markers with general taxon specificity are required for species identification.
DNA barcoding analysis was proven to be one of those methods capable of discriminating among species on the basis of the sequence of small DNA region(s) [30,37,38]. This methodology was successfully used not only for species identification and forensic analyses, but also for biodiversity and phylogenetic studies [59,60]. DNA barcoding in plants most commonly uses chloroplast sequences, which are species-specific [37,38]. Unlike the nuclear genome that follows Mendelian inheritance, chloroplasts do not undergo meiotic recombination and show maternal inheritance in the majority of the angiosperms [61]. Furthermore, the chloroplast genome is sufficiently rich in polymorphic sequences, which facilitates the conduct of evolution studies and species discrimination in plants [62][63][64]. Nevertheless, given the lack of a single universal barcode that can be effectively used for all plant species [65], the application of multi-loci approaches, utilizing both nuclear and chloroplast DNA barcodes, has significantly increased the identification accuracies, especially for closely related species [44].
To verify the accuracy of species identification for indicative Eurasian Vicia specimens deposited in local seed banks, we implemented DNA barcoding analyses using the highly variable trnL and ITS2 DNA barcoding regions. Both DNA barcodes have rich representation in the GenBank repository for many plant species, including species in the Vicia genus, thus allowing for a more accurate species discrimination using sequence similarity (>90%). The rather conserved chloroplastic rpoC1 region was also used, although it is not the most optimal barcode for species discrimination in the Fabaceae [39], given that it can facilitate a more optimal interspecific alignment and tree topology than the highly variable trnL and ITS2 regions.
More specifically, to facilitate species identification for the 71 Eurasian Vicia specimens studied in this work and verify their morphological identification prior to seed bank submission, we initially analysed the chloroplast trnL DNA barcoding region in these specimens. The trnL region was chosen as the primary barcode of our analysis, given that it has the desirable characteristics for DNA barcoding, especially for sample preparations with compromised DNA quality [32,35], and was also used successfully for species discrimination in the Fabaceae family, including Vicia [15,39]. Using the BLAST analysis of the generated trnL sequences, the majority of the specimens were successfully discriminated at the species level with notably high similarity values, which were higher than 90% identity (Table 1), further indicating that the trnL region can be reliably used for species discrimination in the Vicia genus. The derived trnL tree showed an overall agreement in the topology of the taxa with the corresponding reference entries, although some discrepancies were noted for several taxonomic groups (Figure 1). Although the generated dendrogram is not a phylogenetic tree per se, it can still be used to facilitate species discrimination, especially for inconclusive BLAST results. As such, pertaining to the trnL dendrogram, the most pronounced incongruences were observed for the V. sativa and V. grandiflora clades that are not clearly defined, as well as the V. faba and V. sericocarpa species, which do not form distinct clusters, with several individual specimens being placed throughout the tree (Figure 1). Very recently, Han et al. (2021) inferred similar topologies when testing the phylogeny of species in the Vicia genus, with V. sativa and V. faba showing the most recent divergence and the clades radiating to V. panonica, V. aintabensis, and V. peregrina having an earlier divergence [48].
In an attempt to further resolve the topological incongruences observed in the trnL tree, especially for the V. faba, V. grandiflora, V. lathyroides, V. sativa, and V. sericocarpa species, and to further support species discrimination amongst the specimens, the ITS2 region was also analysed for the selected taxa. The selection of taxa for the ITS2 analysis was primarily based on specimens that did not form any apparent clustering and were spread across the trnL tree, such as V. faba and V. sericocarpa (Figure 1). Several other taxa were also selected from various taxonomic groups, despite their placement on the trnL tree, in order to preserve the overall dendrographic topology, and broad topological comparisons can be made between the two barcoding regions. Kress et al. (2005) have reported the use of the nuclear ITS2 region as a suitable barcode for flowering plants, although it has been problematic for some species [37]. Furthermore, the ITS2 region was regarded as successful for species discrimination in the Fabaceae family [39,43] and more specifically in the Vicia genus [41,46,48,66]. BLAST analysis using the ITS2 region was also highly effective in discriminating the analysed taxa (Table 2), with remarkably high-percentage identity values (>98%) between the analysed taxa and GenBank reference sequences. This is consistent with previous findings that show both trnL and ITS2 as being equally effective in species discrimination for species in the Fabaceae family [39]. The overall topological discrepancies observed in the trnL tree have been resolved with high bootstrap support using the ITS2 region ( Figure 2). More specifically, the V. sativa and V. grandiflora clades were sufficiently resolved in distinct clusters (Figure 2). According to the ITS2 tree, the V. faba samples showed a more defined clustering, which positions this species divergence in between V. bythinica and V. lathyroides (Figure 2). This placement of V. faba is consistent with previous phylogenetic studies [41,48,[66][67][68][69], which further supports the accuracy of the ITS2 tree topology.
Although the ITS2 analysis greatly resolved the general topological incongruences observed in the trnL tree, despite being a highly variable region, the conserved rpoC1 region was also used for selected species, notwithstanding the reported low discrimination efficiency [32]. We argue that a more conserved region of the chloroplast genome would be more effective in discriminating species in the genus. However, in contrast with the other barcoding regions, rpoC1 could not identify the selected species apart from the V. sativa and V. faba species, which showed 100% identity (Table 3). This was likely the result of nonannotated or absent rpoC1 sequences in the GenBank repository. The generated rpoC1 tree showed consistency with the ITS2 tree pertaining to the topology of the recently diverging V. sativa, V. grandiflora, and V. faba taxa, whilst early diverging clades, such as V. lathyroides, V. peregrina, and V. michauxii, were not congruent between the two barcodes (Figures 2 and 3). Although all of the taxa show high intraspecific bootstrap support and clear intraspecies clustering, the majority of the analysed clades, except for V. sativa, V. grandiflora, and V. faba, show topological discrepancies compared to the rest of the trees present in this study ( Figure 3). This finding may be attributed to the fact that rpoC1 is a highly conserved region to allow for accurate species discrimination for at least some species in the Vicia genus. Moreover, the rpoC1 region was shown to be the least discriminatory when used for species identification amongst several coding and non-coding barcodes within the Fabaceae family [39,40]. Hence, species discrimination within the Vicia genus would be difficult to achieve using only this barcode and will require more thorough investigation with more specimens in these groups.

Plant Materials and DNA Extraction
Seeds of 71 specimens corresponding to 20 different species of the genus Vicia (Fabeae, Fabaceae) were provided by the Genetic Resources Unit at the International Center for Agricultural Research in the Dry Areas (ICARDA) in Aleppo, Syria, and by the General Commission for Scientific Agricultural Research (GCSAR) in Damascus, Syria. Species selection was based on the availability of Vicia species germplasms in the corresponding seed banks. Information on species names and sources of origin, which were derived from the seed bank records, is presented in Table 4.
A total of 5 seeds from each specimen were sown in a 0.5 L plastic pot containing turf (Euroveen B. V., Charleston, SC, USA). Seeds were allowed to germinate under laboratory conditions (21 • C) and grow for 1 month. DNA was extracted from 0.5 g leaf tissue of each specimen using the CTAB method according to Doyle and Doyle (1987) [70]. The DNA concentration was measured with standard spectrophotometric procedures using the Eppendorf BioPhotometer (Eppendorf, Hamburg, Germany). DNA quality and quantity were further assessed with gel electrophoresis on a 1% agarose gel. Samples were then diluted to 20 ng/µL working stocks.  Table S2. PCR cycling was carried out with an initial denaturation step at 94 • C for 3 min, followed by 35 cycles at 94 • C for 30s, 55 • C for 40s, and 72 • C for 1 min, with a final extension step at 72 • C for 2 min. The PCR products were sequenced in two directions with the Big Dye terminator v3.1 Cycle sequencing kit (PE Applied Biosystems, Foster City, CA, USA) in an automated ABI 3730 sequencer (PE Applied Biosystems). Sequence ambiguities were manually corrected using the CHROMAS software version 2.6.6 (Technelysium Pty Ltd., South Brisbane, Australia).

Basic Local Alignment Search Tool (BLAST) Analysis
Species identification, using the sequence similarity approach, was performed using the GenBank database (https://www.ncbi.nlm.nih.gov/genbank/, accessed on 13 May 2021). The default search parameters were used, optimizing for highly similar sequences (megablast). Species identification was regarded as successful when both the species label of the specimen deposited in the local seed banks and the molecular identity according to sequence similarity agreed by >90%. The generated DNA barcoding sequences were submitted to GenBank with accession numbers MZ334891-MZ334961 for trnL, accession numbers MZ338297-MZ338323 for ITS2, and accession numbers MZ285764-MZ285790 for rpoC1.

Multiple Sequence Alignment and Dendrographic Representation
Multiple sequence alignments were generated for the trnL, ITS2, and rpoC1 sequences of the samples used herein, along with representative reference sequences from GenBank for representative taxa. The Molecular Evolutionary Genetics Analysis X (MEGA X) software version 10.05 [71] with the MUSCLE algorithm was used for alignment generation. A Maximum Likelihood (ML) fitness assessment for 24 substitution models was carried out for each of the alignments using MEGA X (Table S3). The model with the lowest Bayesian Information Criterion (BIC) value for each alignment was selected to subsequently generate the corresponding ML trees with 1000 bootstrap replicates using MEGA X. The Tamura-Nei (T92) model [72] with discrete gamma distributions (+G = 0.693) was used for trnL, the Kimura 2-parameter (K2-P) model [73] with discrete gamma distributions (+G = 2.119) was used for ITS2, and the Jukes-Cantor model (JC) model [74] with discrete gamma distributions (+G = 0.050) was used for rpoC1 tree reconstructions. The Pisum and Trifolium species in the dendrograms were used as rooting tree outgroups.

Conclusions
Legumes, such as faba, peas, and lupins, have the potential to replace animal-based protein sources, such as meat, bone, or fish, as well as provide other important seed fractions, namely, starch and fibre, as raw materials for making a variety of formulated products in the food, feed, and cosmetics industries. Given the importance of several Vicia species in the Mediterranean diet and the value of such ingredients in local PDO and PGI products, we implemented DNA barcoding analysis using the trnL, ITS2, and rpoC1 regions to discriminate Eurasian Vicia species. Our analysis indicates that both the trnL and ITS2 can effectively discriminate Vicia species, whilst the use of the rpoC1 region was not capable of identifying most of the analysed species. The use of multi-loci analyses may be more powerful to provide higher discrimination accuracies, especially for the closely related species that single DNA barcodes fail to resolve. Broad dendrographic representations for the analysed taxa were also more congruent when considering topologies reconstructed from the trnL and ITS2 data, in contrast to the topologies obtained from rpoC1. We expect the work presented herein to enrich the available barcoding data for species discrimination in the Vicia genus, which will contribute significantly to the resolution of the still inconclusive phylogeny of the genus, as well as for agricultural and culinarian purposes, especially for the local Eurasian communities.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/plants11070947/s1. Table S1: Multiple sequence alignment statistics, Table S2: Primers used in this study, Table S3: Best Maximum Likelihood model fit for the reconstruction of the trnL, ITS2, and rpoC1 trees. Funding: This research has been co-financed by the European Regional Development Fund of the European Union and Greek national funds through the operational program Competitiveness, Entrepreneurship and Innovation, under the call RESEARCH-CREATE-INNOVATE (project code: T1EDK-04448). Also, this research was funded by Chiang Mai University.
Institutional Review Board Statement: Not applicable.