DNA Barcoding for Scorpion Species from New Valley Governorate in Egypt Reveals Different Degrees of Cryptic Speciation and Species Misnaming

Abstract: (1) Background: Scorpions (Arthropoda: Arachnida) represent a diverse group of invertebrates, accounting for a significant proportion of earth's predators and ecosystems' modulators. Surviving mostly in hardly reachable nests, and representing key hazards to human health, they have attracted major interest in characterizing their eco-, morpho-, and genotypes. (2) Methods: Four scorpion species were collected from the New Valley governorate in Upper Egypt, where a high level of scorpionism and related neurological symptoms is found: Leiurus quinquestriatus, Androctonus amoreuxi, Orthochirus innesi, and Buthacus leptochelys. They were DNA barcoded and genetically and phylogenetically analyzed through PCR amplification and sequencing of the hypervariable 5′ region of the mitochondrial cytochrome oxidase subunit 1 (COI) gene. (3) Results: New, morphologically authenticated scorpion barcodes could be added to the barcoding databases. However, several discrepancies and barcode database inadequacies were revealed. Moreover, taxon-specific patterns of nitrogenous base distribution could be identified, with a significantly higher percentage of guanine in the COI barcode of scorpionids in comparison to araneids and opiliones. (4) Conclusions: For a group of animals where both cryptic speciation and a high risk of human envenomation are evident, the findings of the current study strongly recommend continuous and comprehensive research efforts dealing with morphogenetic authentication for different species of scorpions.


Introduction
Scorpions are arachnid arthropods of key environmental and medical importance. They constitute one of the oldest animal groups in the world, with a fossil history dating back more than 300 million years. They mainly inhabit hot and dry environments, where they represent the most important taxa of terrestrial predators in terms of density and biomass [1]. Some early estimates indicated that their biomass exceeds that of most key terrestrial arthropod taxa, except ants and termites [2]. They control major ecosystem processes such as community structure and function via predation and feeding competition [1,3]. Since scorpions are predators of small arthropods and feed infrequently across multi-year lifespans, their high biomass could be attributed to their depressed metabolic rate [4]. They exhibit long life spans (2-25 years, with a mean of 4-8 years), a late maturation period (6 months to 7 years), a long gestation period (1.5 months to 2 years), and a characteristic pattern of maternal care of young [5]. Moreover, scorpions tolerate extreme environmental conditions such as ionizing radiation (e.g., some species of the genus Androctonus tolerate levels of 400-800 grays, i.e., more than 100 times the dose that causes human blood and digestive problems), extreme temperatures (e.g., 40-50 °C), drought, food scarcity, and infections [5]. However, these capabilities are species-specific. Their distribution is associated with areas of climatic, topographic, and geological complexity [6]. At local scales, the distribution of scorpions is governed mainly by temperature, precipitation, substrate (soil hardness and texture; amount of stone or litter cover) and vegetation physiognomy [7,8]. All scorpions have a venomous sting, and several thousand people die every year from scorpion stings [9].
The scorpion fauna of Egypt is represented by four families: Buthidae, Euscorpiidae, Hemiscorpiidae and Scorpionidae [10]. Valuable, recent knowledge about the scorpion fauna of Egypt and the region has been compiled, reviewed and edited by several authors (e.g., [10][11][12]). Moreover, numerous works were carried out to elucidate the composition, ecology and biogeography of Middle Eastern scorpions, including those of Egypt (for examples, see [13][14][15][16][17]). These works pointed to the presence of a total of 35 scorpion species in Egypt, all of which were listed in [15,16].
The New Valley governorate in Egypt is the largest Egyptian governorate, and the one with the most extended desert areas. The state of genetic diversity of scorpions in this governorate, however, is still understudied. Further, extensive sampling of this biodiverse region is needed to recover the complete genetic phylogeographic pattern of scorpions in this area.
DNA barcoding is a rapidly expanding protocol for molecular identification of different animal species from different environments, even the ones with highly similar morphologies, or with variable degrees of integrity. The 5', inter-specifically hypervariable region in the barcode of life gene, i.e., the mitochondrial cytochrome oxidase subunit 1 (COI) gene, is typically included in barcoding studies as a global bio-identification system [18,19]. The efficiency of DNA barcoding is almost complete upon coupling it to a thorough morphological description of the species [20]. The present authors previously provided full morphological descriptions of four major scorpion species in the New Valley, Egypt: Leiurus quinquestriatus, Androctonus amoreuxi, Orthochirus innesi, Buthacus leptochelys [21,22]. The aim of the current study was to complete the morphogenetic authentication of these species using DNA barcoding, a task that seems crucial since some of these species are the most notorious in terms of scorpionism, especially in the New Valley governorate, and in Egypt in general [12,23]. However, there are still many deficiencies regarding the DNA barcoding and phylogenetic relationships of these species, which create many serious issues regarding the appropriate use of DNA barcoding databases for scorpion species identification.

Sample Collection and Preservation
Three to five specimens from each of four scorpion species were collected by professional hunters from El Kharga oasis (25. Figure 1). These specimens belonged to Leiurus quinquestriatus, Androctonus amoreuxi, Orthochirus innesi, and Buthacus leptochelys (Figure 2). The samples were preserved in absolute ethanol, then transferred to the Molecular Biology and Biotechnology Laboratory in the Faculty of Science of Menoufia University (Shebeen El-Kom City, Egypt). About 100 mg of tail musculature was removed from each specimen and stored in absolute ethanol in a −20 °C freezer until used for DNA extraction.

DNA Purification and COI-Based Polymerase Chain Reaction (PCR)
Total genomic DNA was purified from 30 mg of tail musculature of scorpions using the method described in Mohammed-Geba et al. (2016) [24]. Briefly, the samples were lysed individually using 200 µL of TNES-urea buffer [25] and 2.4 U mL⁻¹ Proteinase K solution (Thermo Fisher Scientific, Waltham, MA, USA), with incubation at 55 °C for 30 min. Then, 54 µL of 5 M NaCl was added, and the tubes were thoroughly mixed by inversion and centrifuged at 4000× g for 10 min. The supernatant from each sample was transferred to another 1.5 mL Eppendorf tube, and the DNA was precipitated by adding 200 µL of cold isopropanol (at −20 °C) with shaking by inversion. The tubes were centrifuged at 11,000× g for 10 min, and the supernatant was completely removed. The DNA pellet was washed with 400 µL of 70% ethanol and centrifuged for 5 min at 11,000× g, the ethanol was completely decanted, and 30 µL of Tris-EDTA buffer (pH 8) was added to resuspend the DNA pellet. DNA quality was checked by running 5 µL of the genomic DNA with 1 µL of 6× DNA loading buffer (0.25% w/v bromophenol blue, 40% w/v sucrose) in 1% agarose gels stained with 0.5 µg mL⁻¹ ethidium bromide (Thermo Fisher Scientific). DNA samples were used directly for amplification of the partial barcode region of the COI gene by PCR. A 658 bp target from the 5′ end of COI was amplified using the primer pair LCO1490 (5′-GGTCAACAAATCATAAAGATATTGG-3′) and HCO2198 (5′-TAAACTTCAGGGTGACCAAAAAATCA-3′) [26]. The amplification reactions were performed in a total volume of 25 µL. The reaction mixture consisted of 2 µL of template DNA (~50 ng), 0.5 µM of each primer, 25 µL of 2× MyTaq Red master mix (Bioline, London, UK), and completed to 50 µL with PCR-grade water. PCR amplifications were carried out in the thermal cycler TC512 (Techne, Stone, UK).
The PCR program contained an initial denaturation step at 94.6 °C for 10 min, followed by 40 cycles of 94 °C for 1 min, 46 °C for 1 min, and 72 °C for 1 min, and a final extension at 72 °C for 10 min. PCR products were electrophoresed in a 1% agarose gel. Adequately sized PCR products were sent to Macrogen Inc. (Seoul, South Korea) for sequencing.
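As a quick sanity check on the primer pair quoted above, the following minimal Python sketch reports the length, GC fraction, and reverse complement of each primer. The sequences are copied from the text; the helper names (`gc_fraction`, `reverse_complement`) are illustrative and not part of any published pipeline.

```python
# Primer sequences exactly as quoted in the Methods text.
PRIMERS = {
    "LCO1490": "GGTCAACAAATCATAAAGATATTGG",
    "HCO2198": "TAAACTTCAGGGTGACCAAAAAATCA",
}

# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a primer (0.0 to 1.0)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Reverse complement, i.e. the strand the primer anneals to, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, len(seq), "nt",
              f"GC={gc_fraction(seq):.2f}",
              "revcomp=" + reverse_complement(seq))
```

Both primers are AT-rich, which is consistent with the low annealing temperature (46 °C) used in the cycling program.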

Genetic and Phylogenetic Analyses for New Valley Scorpions
After careful revision of the obtained sequences to assure correct base calling, the sequences were translated into their primary amino acid sequences using MEGA7 software, to confirm that the obtained COI sequences contained no premature stop codons, which usually mark the amplification of nuclear copies of mitochondrial genes (NuMTs) [27]. Then, each sequence was individually compared to the GenBank database using BLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi (accessed on 25 June 2021)), and to the Barcode of Life Database (BOLD) using the BOLD Identification System IDS (http://www.boldsystems.org/index.php/IDS_OpenIdEngine (accessed on 25 June 2021)). In order to calculate the pairwise distances and to construct the phylogenetic tree, COI sequences belonging to the same and closely related scorpion species (Table 1) were retrieved from the GenBank database. All sequences were aligned using the ClustalW multiple alignment implemented in MEGA7. The best nucleotide substitution model was first identified using jModelTest v. 2.1.10 [28]. The alignment, with the determined substitution model, was used to calculate the genetic pairwise distances. Later, the alignment was uploaded to MrBayes 3.2.1 [29] for constructing a Bayesian inference (BI)-based phylogenetic tree. Four Markov chain Monte Carlo (MCMC) chains were run for 10 million generations, saving a tree every 1000 generations. The subsequent analyses were carried out after assuring an average standard deviation of split frequencies below 0.001. The burn-in was identified using Tracer 1.7 [30], which indicated that 25% of the saved trees should be discarded as burn-in. This information was transferred to MrBayes 3.2.1 for constructing the summarized tree, which was then viewed using the Interactive Tree of Life online platform (iTOL: [31]).
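The premature stop codon screen described above was performed in MEGA7; the idea behind it can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it relies on the fact that, under the invertebrate mitochondrial genetic code (NCBI translation table 5), only TAA and TAG terminate translation, and it tries all three forward frames because the reading frame of a trimmed barcode read is not known in advance. The function names and the toy sequence are hypothetical.

```python
# Stop codons under the invertebrate mitochondrial code (NCBI table 5):
# only TAA and TAG terminate; TGA encodes Trp and AGA/AGG encode Ser.
INVERT_MITO_STOPS = {"TAA", "TAG"}


def internal_stops(seq: str, frame: int = 0):
    """Return (position, codon) for every stop codon found in the given frame.

    Positions are 0-based offsets into the ungapped sequence.
    """
    seq = seq.upper().replace("-", "")  # drop alignment gaps, if any
    hits = []
    for start in range(frame, len(seq) - 2, 3):
        codon = seq[start:start + 3]
        if codon in INVERT_MITO_STOPS:
            hits.append((start, codon))
    return hits


def open_frames(seq: str):
    """List the forward frames (0, 1, 2) that contain no stop codon."""
    return [f for f in range(3) if not internal_stops(seq, f)]


if __name__ == "__main__":
    toy = "ATGTGAGGATAA"  # hypothetical sequence, not a real barcode
    print(internal_stops(toy, 0))  # TGA is not a stop here, TAA is
    print(open_frames(toy))
```

A genuine COI barcode fragment is expected to have exactly one forward frame free of stop codons; a read with stops in every frame would be a candidate NuMT or a sequencing artifact.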
Finally, nucleotide compositions of the sequenced COI fragments were analyzed to test whether there are taxonomic trends among different arachnids. COI sequences of araneid and opilionid species were appended to the same alignment previously produced using MEGA7 for phylogenetic analysis. MEGA7 was then used to calculate the percentage of each nucleotide in the analyzed scorpion, araneid, and opilionid species. These percentages were compared among the three orders using one-way analysis of variance (ANOVA), applying the least significant difference (LSD) test as post hoc. Differences were considered significant at p < 0.05. All statistical analyses were carried out using Statgraphics Centurion XVI software.
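The composition analysis above was run in MEGA7 and Statgraphics; the two underlying calculations can be sketched in plain Python as follows. This is a minimal illustration, not the authors' workflow: `base_percentages` and `one_way_anova_f` are hypothetical helper names, the input values in the demo are toy numbers rather than data from this study, and the sketch stops at the F statistic (a p-value would additionally require the F distribution, e.g. from a statistics package).

```python
def base_percentages(seq: str) -> dict:
    """Percentage of T, G, A and C in a sequence, ignoring gaps and ambiguity codes."""
    counted = [b for b in seq.upper() if b in "ACGT"]
    total = len(counted)
    return {base: 100.0 * counted.count(base) / total for base in "TGAC"}


def one_way_anova_f(groups):
    """One-way ANOVA: return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    print(base_percentages("TTGA"))  # toy sequence
    # Toy guanine percentages for three hypothetical groups of sequences.
    toy_groups = [[25.0, 25.5, 24.8], [21.0, 20.6, 21.3], [16.2, 15.9, 16.4]]
    print(one_way_anova_f(toy_groups))
```

In practice, each group would hold the per-sequence percentage of one base for one arachnid order, and the test would be repeated for each of the four bases.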

Results
PCR amplification of the barcode region of the mitochondrial COI gene resulted in a specific 650 bp band in each of the assessed species, without any apparent length variation (Figure 3). Sequencing of the PCR amplicons from the samples of all four species resulted in good-quality sequence chromatograms (Figure 4). Manual trimming of these sequences left about 600 good-quality nucleotide peaks in all the assessed species. In silico translation of the sequences using the invertebrate mitochondrial codon table revealed a complete absence of premature stop codons, i.e., no NuMTs were amplified instead of the expected COI sequences. As all sequences from each species belonged to a single haplotype, the haplotype sequence of each species was deposited in the GenBank database under the accession numbers MZ669859 for L. quinquestriatus; MZ669860 for A. amoreuxi; MZ669861 for O. innesi; and MZ669862 for B. leptochelys. The last of these was the first barcode for this species in the GenBank database.

BLAST and BOLD comparisons revealed the absence of accurate species-level barcodes for two of the four assessed species. Moreover, some discrepancies were found in the species nomenclature available in GenBank. Leiurus quinquestriatus showed homogeneous species identity levels (i.e., 98-100%) with samples of the same species that were previously collected from the South of Egypt and the North of Sudan and deposited in GenBank under the accession numbers (acc. No.) KX648420, KX648421.1, and JQ514258.1. However, an unexpectedly high COI sequence identity (99.40%) was found between L. quinquestriatus barcoded in the current study and a sample of Buthacus arenicola from Egypt (acc. No. MT636861.1).
For O. innesi, the closest similarity was 91%, with a sample of the same species (acc. No. JQ514244.1) that was collected from a non-native pet shop. Buthacus leptochelys showed 99.6% sequence identity with a Buthacus sp. specimen that was previously isolated from Egypt, DNA barcoded and registered in GenBank under the accession number KF548116. The closest named species (90%) was Buthacus macrocentrus, with acc. No. MT229838.1.
Androctonus amoreuxi collected in the current study showed 92.5-100% sequence identity with samples of the same species that were collected from different areas in Morocco. An increasing tendency for sequence similarity could be identified when the sample was collected more towards the North of Morocco, i.e., 92.51% similarity with the sample whose acc. No. was KJ538282.1 and Latitude-Longitude (lat_lon) = "28.25 °N 9.33 °W"; 92.71% similarity with KJ538294 coming from lat_lon = "28
The phylogenetic analysis was concordant with the results of the barcoding database comparisons. In most cases, the scorpion samples analyzed in the current study fell within the same clades as their BOLD/GenBank references. However, there were several unexpected contradictions in the phylogenetic tree. First, A. amoreuxi appeared to be split into two completely separate clades, one of which was closer to the clade encompassing the L. quinquestriatus and O. innesi samples and references, while the other was completely independent and encompassed the samples and references for this species from Morocco, Tunisia and Algeria (Figure 5). Second, the clade that encompassed O. innesi was more closely related to one of the two L. quinquestriatus clades, which also encompassed the L. quinquestriatus sampled in the current study, while another L. quinquestriatus clade appeared to have diverged from the first one (Figure 5). Genetic pairwise distances agreed with these intraspecific phylogenetic divergences, with distances ranging from 0 to 0.21 for L. quinquestriatus; 0 to 0.12 for A. amoreuxi; and 0 to 0.17 for Buthacus sp. (Supplementary Table S1).
The nitrogenous base composition of the barcode DNA fragment analyzed from the COI gene varied slightly among the four species, and no two species had identical compositions. The order of abundance of these bases was fixed in all species, thymine (T) being the most abundant, followed by guanine (G), then smaller amounts of adenine (A) and cytosine (C). The percentages of T, G, A, and C were 42.5%, 25.9%, 17.2%, and 14.4% for L. quinquestriatus; 43.1%, 25.6%, 16.9%, and 14.4% for A. amoreuxi; 42.8%, 24.1%, 19.1%, and 14.1% for O. innesi; and 43.8%, 25%, 16.6%, and 14.7% for B. leptochelys, respectively. The average percentages in the scorpionids covered in the current study and those whose sequences are available in the GenBank database were 43% T, 25% G, 17% A, and 14% C. These percentages were similar among the different scorpion species covered by the phylogenetic analysis in the current study. However, they differed from those of the other free-living arachnid orders (Figure 6, Table 2). The average nucleotide composition for the same COI fragment in different araneids was 41% T, 21% for both A and G, and 16% C. The nucleotide composition percentages of the opiliones were 41% T, 26% A, 17% C, and 16% G (Table 2). Pairwise comparisons, using Student's t-test, showed that the differences in the percentages of purines (A and G) were highly significant (p = 0.0) between scorpionids and araneids. The percentage of G was higher in scorpionids than in araneids, while A was lower in scorpionids than in araneids. Cytosine was significantly lower in scorpionids than in araneids (p = 0.04). Thymine percentages did not vary significantly between the two groups (Table 2). ANOVA showed that pyrimidines (T, C) did not vary significantly among scorpionids, araneids, and opiliones. However, purines (A, G) were significantly different among the three arachnid groups (p = 0.0). Guanine in scorpionids was the highest among the three orders, while adenine was the lowest among them.

Discussion
The current work provided the molecular pillar for morphogenetic authentication of four scorpion species, i.e., L. quinquestriatus, A. amoreuxi, O. innesi, and B. leptochelys. This work came next to previous, detailed and descriptive analyses that were provided by our research group for elucidating the main morphological aspects for these species [22,23]. Herein, it was possible to provide new DNA barcodes for these four species from the New Valley governorate. This governorate is one of the key Egyptian locations where abundant diversity of scorpions is present. Additionally, it suffers from elevated levels of scorpionism [32,33].
Scorpionism in Upper Egypt causes many cases of systemic symptoms in humans, among which the reported prevalence of neurological manifestations after envenomation was about 78-85% [33,34]. Androctonus and Leiurus, which were DNA barcoded in the current study and morphologically described in the previous ones, are among the few scorpion genera that represent a real hazard for human health [33,35,36]. Internationally, more than 1.2 million stings are reported annually, of which Middle Eastern and North African scorpion stings account for 42% of the total cases [9]. Such levels of scorpionism have drawn attention to the importance of reliable means of scorpion species identification.
However, genetic and morphological discrimination of scorpion species is not only relevant to toxicological response; considerable interest is also directed to species-specific toxin structures and variations, ecological niche delimitation, conservation, and evolution. DNA barcoding and molecular identification of species could resolve some problems related to the taxonomy of scorpions. Several species appear morphologically and ecologically similar, at least in some age classes such as juveniles, but are genetically distinct [37][38][39]. Additionally, subspecies-dependent variations in toxin production and structures were previously found in several scorpion species, including L. quinquestriatus [33,40].
The limited dispersal capability of scorpions leads to limited genetic diversity patterns and reduced gene flow among the species [39]. For instance, European scorpions were thought to be highly distinct from their counterparts from the North African populations [41]. More in-depth genetic analyses and diversified sampling for these populations produced better knowledge regarding the presence of cryptic and unknown species that exhibit clear genetic separation [38]. Likewise, in Egypt, the scorpion fauna distribution is highly discontinuous due to the patchy habitat distribution. This results from wide areas of highly arid deserts, interrupted by two terrestrial corridors of lower aridity, that are the Nile River plain and the Mediterranean belt [10]. DNA barcoding could provide an excellent tracking tool for these variations and hidden diversities within and among different taxa. For example, high genetic diversity and possible cryptic speciation within L. quinquestriatus populations in Egypt could be identified [12]. Application of DNA barcoding could detect a clear interspecific phylogenetic relationship among different species of the genus Buthacus (Scorpiones: Buthidae) in Egypt and Saudi Arabia [42]. Genetic and morphological analyses elucidated that the Australian endemic scorpion Urodacus yaschenkoi (Scorpiones: Urodacidae) is a species complex [43]. Morphogenetic identification and ecological niche modeling were proven crucial to delimit the boundary of the Chinese Przewalski's scorpion Mesobuthus martensii (Scorpiones: Buthidae) in arid regions of China and Mongolia [44].
In the current study, comparison of our morpho-genetically authenticated samples with the references in the GenBank database revealed several discrepancies. For instance, a sequence of Buthacus arenicola showed a high identity level with our L. quinquestriatus. Additionally, our O. innesi exhibited only 91% similarity with a sample deposited in GenBank under the same species designation. Improper scorpion species identification has led to the presence of misidentified species with GenBank accessions (for example, see Reference [45]). Furthermore, the unexpected phylogenetic placements of the L. quinquestriatus, A. amoreuxi, and O. innesi samples may indicate either the presence of cryptic species or morphological identification errors during sampling owing to external morphological similarities. In general, misidentification and errors in classification/identification are very common in scorpions, even in the specialized literature [46]. This can eventually lead to confusion and inadequacy in the treatment of problems caused by dangerous scorpion species [46]. Several studies reported misidentifications of the species analyzed in the current study, despite some of them being notorious, highly toxic species. These misidentifications seem to affect both species records and barcode database registrations. Hendrixson et al. (2006) [47] reported a misidentification of A. amoreuxi from Medina in Saudi Arabia early in the 20th century. Lourenço (2020) [48] referred to misidentified samples of L. quinquestriatus that were collected from Mali and Algeria, i.e., outside the natural range of that species. Orthochirus innesi, whose natural geographical range is in North Africa in close association with oases, has been referred to in some works as a Sudanese species, although this has not been confirmed [49].
Moreover, we could identify taxon-specific patterns of purine content, which can be suggested as a taxonomic criterion for characterizing different orders belonging to the class Arachnida. Scorpions exhibited the highest percentage of guanine in the COI barcode. Although this is, to the best of the authors' knowledge, the first time that this pattern has been reported in scorpions, some reports in other animal groups identified this type of deviation as a reflection of the separate divergence of these groups from their ancestors and sister groups [50][51][52]. Further work is needed to assess the taxonomic significance of this guanine-skewed composition of the COI barcode region in scorpions relative to other arachnid taxa.

Conclusions
In conclusion, we provided morphologically authenticated DNA barcodes for four scorpion species in Egypt, two of which are of direct medical importance as health hazards. The provided barcodes can be considered of high international importance for appropriate calibration of DNA barcoding databases, especially in light of the discrepancies identified there. The present findings strongly suggest the possibility of cryptic speciation leading to morphological misidentifications. The elevated guanine content of the COI barcode region in scorpions, which differed significantly among the three free-living arachnid orders, suggests the need for further work to identify its taxonomic significance. A direct recommendation of the current study is to carry out more comprehensive studies on the morphogenetic authentication of different scorpion species, especially in a region of the world where this animal group is highly diverse.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/conservation1030018/s1, Table S1: Genetic pairwise distances between different samples analyzed in the current study and their BOLD/GenBank references.
Data Availability Statement: All data are available upon request.

Conflicts of Interest:
The authors declare no conflict of interest.