Phylogenetic Structure Revealed through Combining DNA Barcodes with Multi-Gene Data for Agrodiaetus Blue Butterflies (Lepidoptera, Lycaenidae)

Simple Summary The species-rich subgenus Agrodiaetus Hübner, 1822 is a distinct monophyletic lineage within the diverse blue butterfly genus Polyommatus Latreille, 1804. Although the subgenus has attracted the attention of numerous researchers, a large number of unresolved taxonomic problems persist in Agrodiaetus. In our study, we analyzed the taxonomic structure of the subgenus via combining short mitochondrial DNA barcodes of several extremely rare species, for which multi-locus data were unavailable, with multi-gene mitochondrial and nuclear data for the other Agrodiaetus taxa. This approach resulted in a high phylogenetic resolution of the tree obtained, even for the clades, which were solely represented by DNA barcodes. The status and taxonomic position of the enigmatic species P. muellerae, P. afghanicus, P. frauvartianae, P. bogra and P. anticarmon from Afghanistan, Pakistan, Iran and Turkey are revealed and discussed. The complete list of species groups and species of the subgenus Agrodiaetus is presented. Abstract The need for multi-gene analysis in evolutionary and taxonomic studies is generally accepted. However, the sequencing of multiple genes is not always possible. For various reasons, short mitochondrial DNA barcodes are the only source of molecular information for some species in many genera, although multi-locus data are available for other species of the same genera. In particular, such situation exists in the species-rich butterfly subgenus Polyommatus (Agrodiaetus). Here, we analyzed the partitioning of this subgenus into species groups by using three data sets. The first data set was represented by short mitochondrial DNA barcodes for all analyzed samples. The second and third data sets were represented by a combination of short mitochondrial DNA barcodes for part of the taxa with longer mitochondrial sequences COI + tRNA-Leu + COII (data set 2) and with longer mitochondrial COI + tRNA-Leu + COII and nuclear 5.8S rDNA + ITS2 + 28S rDNA sequences (data set 3) for the remaining species. We showed that the DNA barcoding approach (data set 1) failed to reveal the phylogenetic structure, resulting in numerous polytomies in the tree obtained. Combined analysis of the mitochondrial and nuclear sequences (data sets 2 and 3) revealed the species groups and the position within these species groups, even for the taxa for which only short DNA barcodes were available.


Introduction
Ideally, the analysis of evolutionary history and taxonomic structure of living organisms requires a comparative analysis of data obtained from multiple sources of evidence (morphological, multi-locus molecular, ecological, karyological, etc.) [1][2][3][4][5].In practice, such comprehensive analysis is not always possible.Many species are extremely rare and represented in collections by a limited number of specimens.Usually, such museum material is hardly suitable for comprehensive multi-locus molecular analysis due to its old age-resulting in DNA degradation-and the view that unique samples (especially type specimens) should be preserved as important standard vouchers rather than destroyed in the course of molecular studies.
In this situation, massive single-locus sequencing studies, such as the DNA barcoding research presented in Refs [6,7], have become the only real means of obtaining regular molecular information, which is available for multi-species comparisons and can thus be incorporated into phylogenetic research and taxonomic revisions.A situation where, for some species of a genus, there are only mitochondrial DNA barcodes, and for other species of the same genus, there are multi-gene data, is ordinary [4].Recently, a novel approach has been suggested for phylogenetic analysis of such genera [8].This approach is based on the combined analysis of short mitochondrial DNA barcodes for all species of a genus with multi-locus data for several representative taxa of the same genus.
Mitochondrial DNA barcodes, as well as their combination with nuclear genes, have been widely used to solve taxonomic problems of varying complexity, e.g., in butterflies (Refs [9][10][11][12][13][14]).An increase in the length of the molecular matrix up to the phylogenomic and genome-wide data dramatically increases the resolution of phylogenetic and taxonomic analyses [15].However, combining short sequences for some species with long sequences for other species can theoretically be problematic because missing characters can negatively affect the topology and branch length estimation [16].A study by Talavera et al. [8] clearly demonstrated that increasing species sampling by adding short DNA barcodes to multi-locus matrices increases the phylogenetic resolution.This is probably due to a partial solution to the problems of incomplete species sampling and long branch attraction [8].
In our study, we applied this approach [8] to the analysis of phylogenetic structure in the species-rich butterfly subgenus Polyommatus (Agrodiaetus) Hübner, 1822 (Lepidoptera, Lycaenidae).This subgenus represents a distinct monophyletic lineage within the diverse genus Polyommatus Latreille, 1804 [4].The subgenus Agrodiaetus was estimated to have originated only about three million years ago [17] and radiated rapidly in the Western Palaearctic region [18].The last published review of the subgenus includes 120 valid species [19].Species within Agrodiaetus are extremely uniform and, with a few exceptions, exhibit few differences in characteristics traditionally used in classification, such as wing pattern and/or aspects of the male and female genitalia [17].At the same time, these species vary greatly in their karyotypes, with the haploid chromosome numbers (n) ranging from n = 10 to n = 134 [18].Therefore, it is not surprising that most of the work on the taxonomy of the subgenus is based on chromosomal data, and in recent years, with the involvement of data from molecular phylogenetics [17,18,20,21].
This subgenus has been studied relatively well with respect to molecular markers, and for many species, multi-locus molecular data are available, including such genes as mitochondrial COI, tRNA-leu, COII, cytochrome b and NADH dehydrogenase sequences and nuclear 5.8S rDNA, ITS2, 28S rDNA and EF1-α sequences [4,17,18,27,[29][30][31].At the same time, for many taxa, especially for rare species from Turkey, Iran, Pakistan and Afghanistan, only mitochondrial DNA barcodes are available [32], or the molecular data are absent.
In this work, we (1) obtain and analyze standard mitochondrial DNA barcodes for five deviated and most enigmatic taxa of the subgenus Agrodiaetus: P. muellerae Eckweiler, 1997 from Pakistan, P. afghanicus (Forster, 1973) and P. frauvartianae Balint, 1997 from Afghanistan, P. bogra Evans, 1932 from Afghanistan and Iran, and P. anticarmon (Koçak, 1983) (=charmeuxi Pages, 1984) from SE Turkey; (2) analyze the partitioning of the subgenus Agrodiaetus into species groups by using three data sets.The first data set is represented by short mitochondrial DNA barcodes for all analyzed samples.The second and third data sets are represented by a combination of short mitochondrial DNA barcodes for part of the taxa with longer mitochondrial sequences COI + tRNA-Leu + COII (data set 2) and with longer mitochondrial COI + tRNA-Leu + COII and nuclear 5.8S rDNA + ITS2 + 28S rDNA sequences (data set 3) for the remaining species; (3) show that the DNA barcoding approach (data set 1) failed to reveal the phylogenetic structure of the subgenus, whereas a combined analysis of the mitochondrial and nuclear sequences (data sets 2 and 3) revealed the species groups and the position within these species groups, even for taxa for which only mitochondrial sequences were available; (4) provide a list of the species groups of the subgenus Agrodiaetus; and (5) discuss the status and taxonomic position of P. muellerae, P. afghanicus, P. frauvartianae, P. bogra and P. anticarmon.

Materials and Methods
Standard mitochondrial DNA barcodes (658 bp fragments of the cytochrome c oxidase subunit I gene) were obtained for five samples of P. afghanicus, one sample of P. anticarmon (=charmeuxi), six samples of P. bogra, seven samples of P. frauvartianae and one sample of P. muellerae (Table 1).The specimens (except for samples BPAL2125-BPAL2128) were processed at the Department of Karyosystematics of the Zoological Institute of the Russian Academy of Sciences.DNA extraction from a single leg removed from each specimen was performed using the QIAamp DNA Investigator Kit (Qiagen, Venlo, The Netherlands) according to the manufacturer's protocol.Since standard lepidopteran barcode primers [7] failed to amplify a sufficient product, two self-designed forward primers (Nz_COI_b-TAC AAT TTA TCG CTT ATA AACTCA; DRD4F-TAGAAAATGGAGCAGGAA) and two reverse primers (MH-MR1 [33] and Nancy [34]) were used for DNA amplification and resulted in a 671 bp fragment of the mitochondrial cytochrome oxidase I gene (COI).The PCR amplifications were performed in a 50 µL reaction volume containing ca. 10-20 ng genomic DNA and 0.5 mM each of forward and reverse primer, 1 mM dNTPs, 10× PCR Buffer (0.01 mM Tris-HCl, 0.05 M KCl, 0.1% Triton X-100: pH 9.0), 1 unit Taq DNA Polymerase (Thermo Fisher Scientifics, Waltham, MA, USA), 5 mM MgCl 2 .The temperature profile was as follows: initial denaturation at 94 • C for 1 min, followed by 30 cycles of denaturation at 94 • C for 45 s, annealing at 50 • C for 45 s and extension at 72 • C for 1 min, with a final extension at 72 • C for 10 min.Amplified fragments were purified using GeneJET Gel Extraction Kit (Thermo Fisher Scientifics, Waltham, MA, USA).Purification was carried out according to the manufacturer's protocol.The success of PCR amplification and purification was evaluated through electrophoresis of the products in 1% agarose gel.The purified PCR product was used for direct sequencing.Sequencing of the double stranded product was carried out at the Research Resource Center for Molecular and Cell Technologies (St.Petersburg State University).Samples BPAL2125-BPAL2128 were processed at the Canadian Centre for DNA Barcoding (CCDB, Biodiversity Institute of Ontario, University of Guelph), using their standard high-throughput protocol described by deWaard et al. [35].
A comparison of the obtained COI barcodes revealed 11 unique haplotypes within the five species studied (Table 1).
For the other 130 species and well-differentiated subspecies of the subgenus Agrodiaetus, all available sequences of the mitochondrial (COI, leu-tRNA complete and COII partial) and nuclear (5.8S rDNA partial gene, ITS2 complete and 28S rDNA partial) loci were extracted from GenBank (see Supplementary Material S1 for the GenBank numbers).The sequences of each locus (gene) were aligned separately by using the ClustalW algorithm, and then, the alignments were checked and corrected manually using BioEdit [36].Since within Agrodiaetus, the previous phylogenetic analyses of the nuclear sequences 5.8S rDNA + ITS2 + 28S rDNA recovered clades, which were mostly congruent with those obtained from analyses of the mitochondrial genes COI + COII [25], the nuclear and mitochondrial sequences were concatenated for subsequent phylogenetic study.This concatenation was then combined with the 11 unique haplotypes revealed within the five species studied (Table 1).The representatives of the closely related subgenus Polyommatus (P.icarus Rottemburg, 1775, P. erotides Staudinger, 1892 and P. hunza Grum-Grshimailo, 1890) [4] were also included in the analysis.To root the tree, we used representatives of a more distant genus Lysandra (L.bellargus, L. punctifera and L. coridon) [4].The final matrix consisted of 147 taxa, of which 141 taxa represented our target species, and 6 taxa represented the outgroup.For 87 of these 147 taxa, the matrix contained data for both mitochondrial and nuclear genes.For 60 of these 147 taxa, only mitochondrial gene(s) were available.
Three data sets were prepared from the final concatenated alignment.In data set 1, for all 147 samples studied, only short COI barcodes were presented.In data set 2, for 13 samples (shown with a red asterisk (*) in Figures 1-3), only short COI barcodes were available, and for the remaining 134 samples, the longer mitochondrial sequence COI + tRNA-Leu + COII was presented.In data set 3, the mitochondrial matrix (data set 2) was supplemented with 5.8S rDNA + ITS2 + 28S rDNA nuclear sequences for 87 samples.Substitution models were inferred separately for each gene (locus) using jModeltest, version 2 [37].Bayesian analysis was conducted using MrBayes 3.2 [38] on four molecular (COI, COII, leu-tRNA and 5.8S rDNA + ITS2 + 28S rDNA genes) and one "standard" (binary) partitions using 20,000,000 generations.The command blocks for the first, second and third data sets are presented in Supplementary Material S2.

Results
The analysis of the DNA barcodes (Figure 1) did not reveal the structure of the subgenus Agrodiaetus.Only 33 statistically supported clades (posterior probability >= 0.9) were recovered, and the position of the majority of species-particularly of our target taxa P. muellerae, P. afghanicus, P. frauvartianae, P. bogra and P. anticarmon-was unresolved.This was manifested in the facts that (1) the relationships with sister species were not identified (P.muellerae); (2) the sister relationships had low support (P. afghanicus, P. frauvartianae, P. bogra); and (3) it was not clear which species groups the target species belonged to (P. muellerae and P. afghanicus).
The combined analysis of mitochondrial (Figure 2) and mitochondrial + nuclear sequences (Figure 3) resulted in a significant increase in the resolution of the phylogenetic tree, with 53 highly supported clades (posterior probability >= 0.9) for data set 2 and with 65 highly supported clades (posterior probability >= 0.9) for data set 3. Thus, adding additional mitochondrial and nuclear loci resulted in an increased number of highly supported clades, which was not unexpected.A more interesting observation is that this approach resulted in increased support and tree position detection for those branches for which additional mitochondrial and nuclear data were not available (shown with red asterisks on the trees).
It was established that the taxon P. frauvartianae was included in the same clade together with the species P. faramarzii Skala 2001, P. shahrami Skala, 2002 and P. achaemenes Skala, 2002 (Figure 3), although this relationship had extremely low support (0.54) on the DNA barcode tree (Figure 1).It was shown that P. anticarmon was not only a sister species to P. turcicus (Koçak, 1977) (Figure 1) but that both of these taxa were members of the same clade together with P. iphigenia (Herrich-Schäffer, 1847) (Figure 3).The position of the taxon P. australorossicus Lukhtanov and Dantchenko, 2017 on the DNA barcode tree (Figure 1) was unclear due to low support.On the combined tree (Figures 2 and 3), this species was placed with high support in the same clade along with P. hamadanensis (de Lesse, 1959).
Our analysis revealed 11 major lineages, shown with different colors (Figure 3).Two lineages were represented by a single species.Seven lineages were highly supported (posterior probability value from 0.91 to 1).The target species P. afghanicus (Figure 4A-D) appeared as a lineage distantly related to the lineage P. antidolus-P.iphidamon (species group 8); however, the sister relationship between them was weakly supported.Polyommatus muellerae (Figure 4E-H) appeared as a distinct lineage (species group 1).The target species P. frauvartianae (Figure 4I,J 12), P. bogra (Figure 4M-P) and P. anticarmon (Figure 4K-L) appeared as members of species groups 5 and 4.

Discussion
The methodology proposed by Talavera et al. [8] allowed us to identify the positions on the phylogenetic tree for the rare south Central Asian taxa, for which the molecular data were available only in the form of short DNA barcodes.The data obtained represent an empirical test of the previously proposed methodology [8] and provide the opportunity to discuss in more detail the taxonomy of the species studied.
Polyommatus frauvartianae from Afghanistan was described as a distinct species by Balint [39], but then, on the basis of external morphological similarity, it was interpreted as a subspecies of the Iranian-Turkmen species P. glaucias (Lederer, 1871) [19].Our data unequivocally show that this is an independent species, phylogenetically distant from P. glaucias, but undoubtedly having a relationship with P. faramarzii, P. shahrami and P. achaemenes, endemics of the Zagros Mts in Iran.Of the last three species, P. frauvartianae is well distinguished by the brown (not blue) coloration of the upper side of the wings in males.Thus, these data shed light on the origin of the enigmatic Iranian taxa P. faramarzii, P. shahrami and P. achaemenes, which, having dot-like distribution ranges in the Zagros, did not show close relationships with any other species of the subgenus Agrodiaetus.The new data show that the four species P. faramarzii, P. shahrami, P. achaemenes and P. frauvartianae are members of the same phylogenetic sub-lineage, spread over a wide area from western Iran to central Afghanistan.

Discussion
The methodology proposed by Talavera et al. [8] allowed us to identify the positions on the phylogenetic tree for the rare south Central Asian taxa, for which the molecular data were available only in the form of short DNA barcodes.The data obtained represent an empirical test of the previously proposed methodology [8] and provide the opportunity to discuss in more detail the taxonomy of the species studied.
Polyommatus frauvartianae from Afghanistan was described as a distinct species by Balint [39], but then, on the basis of external morphological similarity, it was interpreted as a subspecies of the Iranian-Turkmen species P. glaucias (Lederer, 1871) [19].Our data unequivocally show that this is an independent species, phylogenetically distant from P. glaucias, but undoubtedly having a relationship with P. faramarzii, P. shahrami and P. achaemenes, endemics of the Zagros Mts in Iran.Of the last three species, P. frauvartianae is well distinguished by the brown (not blue) coloration of the upper side of the wings in males.Thus, these data shed light on the origin of the enigmatic Iranian taxa P. faramarzii, P. shahrami and P. achaemenes, which, having dot-like distribution ranges in the Zagros, did not show close relationships with any other species of the subgenus Agrodiaetus.The new data show that the four species P. faramarzii, P. shahrami, P. achaemenes and P. frauvartianae are members of the same phylogenetic sub-lineage, spread over a wide area from western Iran to central Afghanistan.
The taxon P. anticarmon is also a subject of taxonomic debates [19].Butterflies of this taxon inhabiting SE Turkey are similar in appearance to P. turcicus from NE Turkey and Armenia, differing in larger size.In addition, there is a difference in ecological preferences between P. turcicus and P. anticarmon: while P. turcicus is an alpine species, P. anticarmon is found at relatively low altitudes.Our data show that P. anticarmon is indeed closely related to P. turcicus.According to Eckweiler and Bozano [19], the taxon P. charmeuxi described from SE Turkey is a synonym of P. anticarmon.
In addition, the phylogenetic analysis conducted allows us to discuss another very controversial issue of Agrodiaetus taxonomy, namely the division of the subgenus into groups of species.In most cases, species delimitation and recognition of monophyletic species groups within Agrodiaetus are difficult because of the low number of differentiated morphological characteristics.The morphology of male genitalia is uniform throughout most of the species, with a few exceptions [19,40,41].Some Agrodiaetus species show considerable variability in male wing color, both in visible and ultraviolet wavelength ranges [29].Despite this variation, it is difficult to use this characteristic for phylogenetic purposes, since in the great majority of species, it is represented by a plesiomorphic state (blue color), and the derived states are found mostly as unique apomorphies characterizing single species but not species groups [29].
The same can be said of chromosomal characteristics.Although the chromosome numbers in Polyommatus (Agrodiaetus) possess a strong phylogenetic signal [18], generally, the karyotypes are extremely variable on an inter-specific level, and they are found mostly as unique apomorphies characterizing single species but not species groups [18,20,21,30,[42][43][44][45].
de Lesse [20,21] divided Agrodiaetus into three species complexes based on male coloration and the presence/absence of well-developed tufts formed by androconial scales.In the classification by Hesselbarth et al. [46], Agrodiaetus was divided into eight species groups (actis, admetus, carmon, damon, damone, dolus, poseidon and transcaspicus) named after their oldest members.Eckweiler and Häuser [22] recognized the admetus and dolus groups but argued that the available evidence was too weak to support the remaining groups.Instead, they erected a more inclusive damon group, which combined the membership of Hesselbarth et al.'s [46] actis, carmon, damon, damone and transcaspicus groups with some species from the poseidon group.The remainder of the poseidon group was renamed as the dama group, and three additional species groups-the dagmara, erschoffii (= Paragrodiaetus Rose and Schurian, 1977) and iphigenides groups-were erected to accommodate species from eastern Iran and central Asia, which had not been considered by Hesselbarth et al. [46].
Here, using the analysis of additional taxa and additional molecular markers, we demonstrate that the subgenus Agrodiaetus consists of 11 lineages and 135 species (Appendix A).In particular, we also show that the enigmatic taxon P. (A.) muellerae from Pakistan represents a distinct evolutionary lineage and cannot be included in the previously recognized species groups.
Koçak and Kemal [47] divided Agrodiaetus into 13 subsections and proposed the following names for these subsections: Actisia, Admetusia, Antidolus, Dagmara, Damaia, Juldus, Musa, Paragrodiaetus, Peileia, Phyllisia, Transcaspius, Xerxesia and Agrodiaetus s.str.Here, we demonstrate that these subsections do not reflect the evolutionary and taxonomic structure revealed using molecular markers (see the list below).Three species groups discovered in our study (erschoffii, damocles and carmon) are represented by two (erschoffii, damocles groups) and even by four (carmon group) names from the list proposed by Koçak and Kemal [47], whereas five species groups have no names, and only five names are unambiguously associated with the species groups (one name corresponds to one species group).According to the Code of Zoological Nomenclature (ICZN 10.4) [48], "a uninominal name proposed for a genus-group division of a genus, even if proposed for a secondary (or further) subdivision, is deemed to be a subgeneric name even if the division is denoted by a term such as 'section' or 'division'".Thus, the names proposed by Koçak and Kemal [47] should be considered subgeneric names and therefore subjective junior synonyms of Agrodiaetus, since the subgeneric rank of Agrodiaetus is well founded [4].
Finally, we should note that although the approach used [8] markedly improved the resolution of the phylogenetic tree, some clades-in particular the carmon and magnificus species groups-have low support.Obviously, further studies based on the analysis of more genes are needed in order to obtain a fully resolved phylogeny of the subgenus Agrodiaetus.

Conclusions
1. We show that the DNA barcoding approach failed to reveal the phylogenetic structure of the subgenus Agrodiaetus, whereas a combined analysis of the mitochondrial and nuclear sequences revealed the species groups and the position within these species groups, even for taxa for which only short DNA barcodes were available.
2. The Afghani taxon Polyommatus frauvartianae is a distinct species, most closely related to west Iranian endemics P. faramarzii, P. shahrami and P. achaemenes.
5. The enigmatic Pakistani taxon P. muellerae represents a distinct evolutionary lineage and cannot be included in previously recognized species groups.

Figure 1 .
Figure 1.The Bayesian tree of Agrodiaetus species based on analysis of the short mitochondrial COI barcodes.Numbers at nodes indicate Bayesian posterior probabilities (BPP) (higher than 0.5).Red asterisks indicate samples for which longer mitochondrial and/or nuclear sequences were unavailable.The subgenus Polyommatus sensu stricto was found to be a statistically supported (BPP = 1) clade, sister to Agrodiaetus (not shown).The genus Lysandra (not shown) was used to root the tree.

Figure 1 .
Figure 1.The Bayesian tree of Agrodiaetus species based on analysis of the short mitochondrial COI barcodes.Numbers at nodes indicate Bayesian posterior probabilities (BPP) (higher than 0.5).Red asterisks indicate samples for which longer mitochondrial and/or nuclear sequences were unavailable.The subgenus Polyommatus sensu stricto was found to be a statistically supported (BPP = 1) clade, sister to Agrodiaetus (not shown).The genus Lysandra (not shown) was used to root the tree.

Figure 2 .
Figure 2. The Bayesian tree of Agrodiaetus species based on combined analysis of the mitochondrial COI+tRNA-Leu+COII sequences.Numbers at nodes indicate Bayesian posterior probability higher than 0.5.Red asterisks indicate species for which only short DNA barcodes were available.The subgenus Polyommatus sensu stricto was found to be a statistically supported (BPP = 1) clade, sister to Agrodiaetus (not shown).The genus Lysandra (not shown) was used to root the tree.

Figure 2 .
Figure 2. The Bayesian tree of Agrodiaetus species based on combined analysis of the mitochondrial COI + tRNA-Leu + COII sequences.Numbers at nodes indicate Bayesian posterior probability higher than 0.5.Red asterisks indicate species for which only short DNA barcodes were available.The subgenus Polyommatus sensu stricto was found to be a statistically supported (BPP = 1) clade, sister to Agrodiaetus (not shown).The genus Lysandra (not shown) was used to root the tree.

Figure 3 .
Figure 3.The Bayesian tree of Agrodiaetus species based on combined analysis of the mitochondrial COI+tRNA-Leu+COII and nuclear 5.8S rDNA + ITS2 +28S rDNA sequences.Numbers at nodes indicate Bayesian posterior probabilities higher than 0.5.Red asterisks indicate species for which only short DNA barcodes were available.The species groups are denoted as 1-11.The subgenus Polyommatus sensu stricto was found to be a statistically supported (BPP = 1) clade, sister to Agrodiaetus (not shown).The genus Lysandra (not shown) was used to root the tree.

Figure 3 .
Figure 3.The Bayesian tree of Agrodiaetus species based on combined analysis of the mitochondrial COI + tRNA-Leu + COII and nuclear 5.8S rDNA + ITS2 + 28S rDNA sequences.Numbers at nodes indicate Bayesian posterior probabilities higher than 0.5.Red asterisks indicate species for which only short DNA barcodes were available.The species groups are denoted as 1-11.The subgenus Polyommatus sensu stricto was found to be a statistically supported (BPP = 1) clade, sister to Agrodiaetus (not shown).The genus Lysandra (not shown) was used to root the tree.

Table 1 .
List of the studied samples and obtained COI sequences.