Revisited Molecular Phylogeny of the Genus Sphaerotheca (Anura: Dicroglossidae): The Biogeographic Status of Northernmost Populations and Further Taxonomic Changes

: Heretofore, the populations of the genus Sphaerotheca Günther, 1859 (Dicroglossidae) in their northern and western borders laying in Pakistan have been assigned to two species, S. breviceps (Schneider, 1799) and S. strachani (Murray, 1884). The genus originated in the Oriental zoogeographic region and comes to close geographic proximity with the Palearctic region in Pakistan. The recent molecular studies have on one hand restricted the distribution range of S. breviceps to the eastern coastal plains of India and, on the other hand, revealed the northern- and westernmost population in India as a separate species, S. pashchima Padhye et al., 2017. This species has recently been synonymized with S . maskeyi (Schleich and Anders, 1998). These taxonomic changes, however, warranted the need for validation of Pakistani Sphaerotheca based on genetic data. We sequenced and analyzed 16S rRNA mitochondrial gene from specimens originating from the Himalayan foothills of Pakistan and compared these with all available GenBank sequences of the genus. Based on this data, we conclude that the Himalayan foothills of Pakistan are occupied by S. maskeyi . Simultaneously, we bring the ﬁrst record of this species for the Palearctic region. We further suggest that more genetic material from across Pakistan is required to ascertain the validity of S. strachani and for the phylogeographic evaluation of western and northern border populations of the genus. Our geographically wide and revisited molecular phylogeny shows that the genus exhibits genetic diversity suggesting further taxonomic changes. The low level of genetic divergences between S . breviceps and S . magadha Prasad et al., 2019 compared to other species of the genus, indicates that the taxonomic status of S . magadha is questionable. Moreover, we found that S . magadha and S . swani (Mayers and Leviton, 1956) are genetically conspeciﬁc with S . breviceps and both should be thus considered its junior synonyms. On the other hand, S . dobsonii and populations from Myanmar need further detailed investigations.


Introduction
The genus Sphaerotheca Günther, 1859 represents medium-sized frogs of the family Dicroglossidae and comprises currently ten morphologically similar taxa distributed in Pakistan, India, Nepal, Bhutan, Myanmar, Bangladesh, Sri Lanka, and probably Maldives [1]. In Pakistan, they are known from the Hab and Malir River valleys in Karachi, the Himalayan foothills and Potwar Plateau, and the Cholistan Desert [2,3]. The genus originated in the Oriental zoogeographical region and the territory of Pakistan represents the northern-and westernmost limit of the overall range of the genus [2,3]. Sphaerotheca frogs have high species diversity in the Indian subcontinent [4] and, due to descriptions of various taxa, changing taxonomy, and high genetic diversity with an unclear distribution, they are objects of several studies [5][6][7][8].
The species status of the genus in the Pakistani territory (the northern and western range borders) is unclear with the presumption that these populations are not conspecific to the species reported from the country. Khan [2] and Masroor [3] mentioned the taxon Sphaerotheca breviceps (Schneider, 1799) for the Pakistani territory with its type locality in Tranquebar, Tamil Nadu, India [9]. Moreover, Pakistan is also known to harbor S. strachani (Murray, 1884) from the type locality Sindh (Mulleer = Malir, near Karachi), with so far, unclear taxonomic status. However, the overall distribution and the diversity of the genus based on genetic data from India and Nepal suggest that the Pakistani populations should not be assigned under previously mentioned taxa [4,5,7,8]. Therefore, we bring the first genetic data of the genus Sphaerotheca from northern Pakistan ( Figure 1) to provide arguments for its genetic affiliation that could resolve the taxonomy. Simultaneously, our most updated and revised phylogeny of the genus provides a novel view on relationships among these frogs with further taxonomic consequences.

Materials and Methods
Two adult specimens (DJ9470-71) were collected in mid-September 2019 during a night survey in Shah Alam Baba, Khyber Pakhtunkhwa province, Pakistan (34.73 • N, 72.09 • E, 883 m a.s.l. and 34.72 • N, 79.10 • E, 1013 m a.s.l., respectively; Figures 1 and 2). These frogs were found in the agricultural area close to the direction of a dry river valley. The specimens were photographed alive (Nikon D810, and 105 mm macro lens) and subsequently euthanized and preserved in 70% ethanol. Before preservation, a tongue was taken as a tissue sample and transferred into 96% ethanol, kept at ambient temperature during the time of the fieldwork, and later stored at −20 • C.  Table S1. DNA was extracted using the DNeasy Tissue Kit (QIAGEN) following the manufacturer's protocols. A fragment of the mitochondrial 16S region (~600 bp) was amplified using the primers 16sar (5 -CGCCTGTTTATCAAAAACAT-3 ) and 16br (5 -CCGGTCTGAACT CAGATCACGT-3 ). Purified PCR products were sequenced in both directions by Macrogen (Amsterdam, the Netherlands). Sequences were revised manually and aligned along with sequences from the NCBI GenBank based on their secondary structures using RNAsalsa 0.8.1 [10] and the ribosomal structure model of Bos taurus downloaded from www.zfmk.de/ en/research/research-centres-and-groups/rnasalsa (accessed on 1 March 2021) (for Gen-Bank accession numbers, see Supplementary Table S1). To reconstruct the phylogeny of the genus Sphaerotheca, we included all available 16S sequences from the NCBI GenBank database (~560 bp; Table S1) and made maximumlikelihood (ML) and Bayesian inference (BI) trees using RAxML v.8.2.12 [11] and MrBayes v.3.2.6 [12], respectively. We ran RAxML with the GTRGAMMA model and 1000 bootstrap replicates. In the Bayesian analysis, we assigned the doublet model (16 × 16) proposed by [13] to the rRNA stem regions. For this procedure, unambiguous stem pairs were derived based on the consensus structure from RNAsalsa and specified in the MrBayes input file. For the analysis of the remaining positions, the standard 4 × 4 option was applied using a GTR evolutionary model. MrBayes was run for 15 million generations, with four parallel Markov chain Monte Carlo simulations and four chains (one cold and three heated), sampling trees every 500th generation and using a random tree as a starting point. Inspection of the standard deviation of split frequencies after the final run as well as the effective sample size value of the traces using Tracer v. 1.7.1 [14] indicated convergence of Markov chains. The first 25% of the samples of each run were discarded as burn-in. Based on the sampled trees, a consensus tree was produced using the sumt command in MrBayes. To support the tree phylogeny and show phylogenetic clustering of defined clades, we performed a Principal Component Analysis (PCA) based on allelic data (as frequencies) of the same dataset we used for tree analysis (without outgroups) using R package Adegenet [15] implemented in the R statistical environment [16]. Prior to the PCA, missing data were replaced by "NA".
To assess the evolutionary distance between our samples and the Sphaerotheca clades recovered from the phylogenetic analysis, uncorrected p-distances were calculated between the clades using Mega-X v.11.0 [17] with the pairwise deletion option, 1000 bootstrap replications, and by considering both transitions and transversions.
Finally, a median-joining haplotype network was created with PopART [18].

Results
Mitochondrial 16S rRNA data confirmed that both the analyzed sequences from the Pakistani territory cluster with the clade associated to S. maskeyi (Schleich and Anders, 1998) (Figures 2 and 3). This represents the first published record of the species from the Palearctic region and the first-ever published and genetically supported record for the Khyber Pakhtunkhwa province, Pakistan. This clade is distributed from western, central, and northeastern India, and Nepal with the northwestern-most recorded distribution in Pakistan ( Figure 2A). It forms a genetically diversified clade, showing significant intra-clade variability ( Figure 3). However, this variability is not high (0.28%; Figure 2B, Table S2). The haplotype network also showed little variation within sequences of S. maskeyi, whereas Pakistani sequences share the haplotype with populations of the species from northern India and Nepal. This haplotype is separated by only one mutation step from the central haplotype of all other investigated populations of S. maskeyi ( Figure 2B).
Overall, in the 16S phylogenies (ML and BI) of the analyzed sequences, we detected six well-supported phylogenetic clades, named as follows ( (Dubois, 1983)) and southern and central India; Sphaerotheca sp., from Myanmar, probably representing an unknown taxon (p-distance more than 7% from other clades; Table 1); dobsonii from the western Indian coast; bengaluru, represented by sequence data from the type locality of S. bengaluru Deepak et al., 2020 (Table 1, Figure 3 and Figures S1 and S2)). The topology of the studied sequences is similar in both analyses, but it is not well-resolved in the ML tree (see the position of S. dobsonii in Figure S1). In the BI tree, two main groups of clades are inferred, forming sister position (i) pluvialis + Sphaerotheca sp. + dobsonii + breviceps, and (ii) bengaluru + maskeyi. These detected sequence groups (except Sphaerotheca sp.) present an admixed phylogeographic pattern ( Figure 2). The highest number of clades was found in central and southern India and their overall interclade genetic p-distances reached more than 10% (Table 1 and Table S2). The resulting topology and clade support ( Figure 3 and Figure S1) confirmed the validity of the following species: S. bengaluru, S. breviceps, S. dobsonii, S. maskeyi, and S. pluvialis. In accordance with the detected phylogeny, the PCA analysis revealed six distinct clusters ( Figure S2), with S. dobsonii showing a sequence out of the confidence interval. The sequence with accession number GU191122 from GenBank, which had been assigned as S. rolandae, is nested in the clade of S. maskeyi (genetic distance~1%, Table 1) and is apparently erroneously identified as S. maskeyi. Sequences of S. magadha and S. swani (breviceps clade) formed an intra-clade lineage of S. breviceps with a p-distance of less than 2% and should be thus treated as its junior synonyms. Similarly, according to our analysis, the recently described S. pashchima is conspecific with S. maskeyi.

Discussion
Herein, we provide a mitochondrial phylogeny and the first genetic data of the genus Sphaerotheca from Pakistan. We found that the population inhabiting the Himalayan foothills is assigned to S. maskeyi and, for the first time, we provide evidence for the presence of the species in the Palearctic region ( Figure 2) and in the Khyber Pakhtunkhwa province of Pakistan (although the collection of the Pakistan Museum of Natural History contains specimens from Mardan [PMNH 171] and Mansehra [PMNH 2160] of that province, collected before this study). The distribution and genetic pattern of S. maskeyi clade provide indications in support of a possible colonization through the Indian subcontinent with the northernmost geographic limits in Pakistan. If true, this would be in contrast with the Oriental genus Microhyla (Microhylidae), which is limited, according to the current knowledge, only to Oriental parts of Pakistan [19]. On the other hand, both these genera show a similar phylogeographic pattern displaying low genetic variability in river plains of northern Indian subcontinent. This might be the result of missing geographic barriers in the lowlands of northern India and southeastern Pakistan as previously suggested [2,19]. However, current sampling remains insufficient to draw final conclusions on the biogeography of S. maskeyi but urges further studies for a future understanding of the species' evolutionary history.
In the literature, two taxa of the genus have been reported from Pakistan, i.e., S. breviceps and S. strachani. The former has been reported from the Oriental part of the Himalayan foothills, Cholistan desert, and around Karachi [2], whereas the latter only from southern Pakistan [20]. The widely distributed species S. breviceps has long been considered a species complex [9,21]. Consequently, the western Indian population of the genus Sphaerotheca from Maharashtra, Gujarat, and Karnataka has been recently described as S. pashchima [5]. These authors also suggested the presence of this taxon in the state of Rajasthan. Recent molecular studies have thus unraveled the long-standing mystery in restricting the distribution range of S. breviceps to the east coastal plains of India [4,7]. However, a new study [8] from the material originating from type localities of two taxa from Nepal (S. maskeyi, S. swani) showed that S. pashchima is conspecific with S. maskeyi and thus this former taxon should be its junior synonym. From the viewpoint of Pakistani localities, the northern limit of the distribution range of S. maskeyi has now extended from Dehradun, Uttarakhand, in the Himalayan foothills [7], to northwestern Pakistan and eastern Nepal, whereas the species' southwestern limit is restricted to the region in Maharashtra and Gujarat, where it needs further confirmation [4,7].
The other little-known taxon (S. strachani) was described from Mulleer (Malir), Sindh, Pakistan [20]. Although Murray [22] was inclined to assign his specimens to S. breviceps, he was later convinced to describe them as S. strachani based on several morphological characteristics including the short longitudinal plaits on the back and the fold across the body behind the forelegs. Although several studies suggest that the taxon S. strachani is a synonym of S. breviceps [2,9,20,23], the validity of the taxon S. strachani and its possible distributional boundaries should be tested through genetic data. Nevertheless, according to current data, S. maskeyi is the species that has the westernmost distribution reaching close to the Himalayan foothills and the Sindh region of Pakistan (see the map in [3] and Figure 2 in this study). Since there are no major geographical barriers between western Indian records of S. maskeyi and the type locality of S. strachani around Karachi (see Figure 2A), we rather expect that S. strachani could be a phenotypic variant of S. maskeyi. The external variation is known throughout populations of the species range [7,8] and Figure 1 in this study. However, genetic material from Karachi, as well as other parts of southern Pakistan and northwestern India, will be of great importance in ascertaining the specific status of S. strachani.
A high level of genetic distances was detected between the clades, suggesting the existence of unrecognized taxa (e.g., Sphaerotheca sp., Figure 3). On the other hand, the GenBank sequence of S. rolandae (GU191122), which clusters within the maskeyi clade, was probably morphologically misidentified with S. maskeyi, originating from the unknown locality of India. The type locality of S. rolandae is in Kurunegala, Sri Lanka, and there are no records of this species from the Indian subcontinent (e.g., [4,7]). The recently described S. magadha [6] and S. pashchima [5] represent the intraspecific variability of S. breviceps and S. maskeyi, respectively (see Figure 3 and Figure S1,~1-2% in p-distances, Table 1,  Table S2). Thus, in the molecular phylogenetic context and regarding the genetic distances of other members of the genus, these taxa, including S. swani (breviceps clade), should not be considered separate species [8]. However, the genus requires further integrative research.
Given the current taxonomy of the genus and the fact that some species and populations might be threatened, especially near mountain ranges or at the edge of the distribution range (e.g., Pakistan), a thorough taxonomic revision-including a wide DNA sampling covering all taxa and coupled with morphological and bioacoustic data-is needed to better understand species delineation and their geographic distribution.  Table S1: List of Sphaerotheca species used in the present study, including sample ID or voucher numbers, sample localities, and GenBank accession numbers. For the locality identifier see Figure 1. Coordinates are given in decimal degrees. § = sample of photographed specimen (see Figure 1). NCBI = National Center for Biotechnology Information. Table S2: Complete pairwise p-distance matrix for all samples used in this study.   Table S1. List of Sphaerotheca species used in the present study, including sample ID or voucher numbers, sample localities and GenBank accession numbers. Locality identifier (no) refer to the Fig. 1. Coordinates are given in decimal degrees. § = sample of photographed specimen (see Fig. 1).