On the Challenge to Correctly Identify Rasboras (Teleostei: Cyprinidae: Danioninae) Inhabiting the Mesangat Wetlands, East Kalimantan, Indonesia

Within the subfamily Danioninae, rasborine cyprinids are known as a ‘catch-all’ group, diagnosed by only a few characteristics. Most species closely resemble each other in morphology, so species identification is often challenging. In this study, we attempted to determine the number of rasborine species occurring in samples from the Mesangat wetlands in East Kalimantan, Indonesia, by using different approaches. Morphological identification resulted in the distinction of five species (Trigonopoma sp., Rasbora cf. hubbsi Brittan, 1954, R. rutteni Weber and de Beaufort, 1916, R. trilineata Steindachner, 1870, and R. vaillantii Popta, 1905). However, genetic species delimitation methods (Poisson tree processes (PTP) and multi-rate PTP (mPTP)) based on DNA barcodes, and principal component analysis (PCA) based on homologous geometric morphometric landmarks, each revealed a single cluster for Trigonopoma sp. and for R. trilineata, whereas the remaining traditionally identified species were distinguished neither by DNA barcodes nor by the morphometric approach. A k-means clustering based on the homologous landmarks divided the sample into 13 clusters and was thus found to be inappropriate for landmark data from species that closely resemble each other in morphology. Because the applied methods gave inconsistent results, we refer to the traditional identifications and distinguish five rasborine species for the Mesangat wetlands.


Introduction
Sundaland (Southeastern Asia) is one of the global biodiversity hotspots [1]. About 400 freshwater fish species are endemic to the region [2], and species determination remains a major challenge, especially in species-rich groups of freshwater fishes [3]. Reliable identification is crucial for management of biodiversity, and freshwater ecosystems are no exception in this respect [2,4-6].
The Danau Mesangat-Kenohan Suhuwi wetlands in East Kalimantan, Indonesia, are part of the Mahakam River drainage and extend over approximately 18,500 hectares. They are characterized by annual fluctuations in water level, as well as heterogeneous habitats such as peat swamps, flooded forests, and small rivers entering the Mesangat Lake [7][8][9][10]. More information is given by Staniewicz et al. [7] as well as by Sudrajat and Saleh [8].
The fish fauna of the Mahakam River, the second-largest river on Borneo, is also only poorly known. Some fish records were made in the beginning of the 20th century by

Sampling Design
The sample consisted of 946 rasborines, collected at 16 sampling sites during a survey in the Mesangat wetlands in September 2014 (Table 1). Maps of the region are available in Staniewicz et al. [7] as well as in Sudrajat and Saleh [8]. Methods applied included dip- and gill-netting, beach-seining, fish-trapping, and electrofishing. Fishes were euthanized with chlorobutanol and fixed directly in 80% ethanol, or in 4% formaldehyde when tissue samples were taken separately. Vouchers were stored in the collection of the Zoologisches Forschungsmuseum Alexander Koenig (ZFMK), Bonn, Germany, and in the collection of the Museum Zoologicum Bogoriense (MZB), Bogor, Indonesia.

Material Examined
Collection numbers (will be added to the proofs) GenBank accession numbers (will be added to the proofs)

DNA Barcoding
Genomic DNA from four to ten specimens of each species identified by classical determination was extracted from fin clips using Macherey and Nagel NucleoSpin® Tissue kits following the manufacturer's protocol on an Eppendorf EpMotion® pipetting robot with vacuum manifold. The standard vertebrate DNA barcode region of the COI (cytochrome c oxidase subunit 1) was amplified using an M13-tailed primer cocktail including FishF2_t1 (5′-TGTAAAACGACGGCCAGTCGACTAATCATAAAGATATCGGCAC), FishR2_t1 (5′-CAGGAAACAGCTATGACACTTCAGGGTGACCGAAGAATCAGAA), VF2_t1 (5′-TGTAAAACGACGGCCAGTCAACCAACCACAAAGACATTGGCAC), and FR1d_t1 (5′-CAGGAAACAGCTATGACACCTCAGGGTGTCCGAARAAYCARAA) [41]. PCRs were performed using Qiagen Multiplex® Taq as follows: 15 min at 95 °C; 10 cycles of 35 s at 94 °C, 90 s at 52-49 °C ('touch-down'), and 90 s at 72 °C, followed by 25 cycles of 35 s at 94 °C, 90 s at 55 °C, and 90 s at 72 °C, with final elongation for 10 min at 72 °C and hold at 10 °C. Sequencing of the ExoSAP-IT (USB) purified PCR product in both directions was conducted at Macrogen Europe Laboratories with forward sequencing primer M13F (5′-GTAAAACGACGGCCAGT) and reverse sequencing primer M13R-pUC (5′-CAGGAAACAGCTATGAC).
In addition to 43 newly generated DNA barcodes, we included 101 DNA sequences from NCBI GenBank representing 31 additional Rasbora and Trigonopoma species from the broader geographic and taxonomic region of interest. As outgroup taxon to root the phylogenetic tree, we included two Danio rerio specimens (JQ667529 and JQ667530). Although there are several hundred DNA barcodes available on NCBI GenBank that are labelled as Rasbora spp., we included only available sequences of those species occurring in the Mahakam drainage and those belonging to the corresponding 'species groups' by Liao et al. [31] of the a priori identified species. Data processing and sequence assembly were done with the software Geneious Pro, and the Muscle algorithm was used to align the DNA barcodes after manually screening for unexpected indels or stop codons [42,43]. Modeltest implemented in the MEGA 6 software was used to determine the most appropriate sequence evolution model for the given data, removing missing data with the complete deletion option [44,45]. The model with the lowest BIC (Bayesian information criterion) score was considered to best describe the substitution pattern. According to Modeltest, the Hasegawa-Kishino-Yano model (HKY), which distinguishes between the rates of transitions and transversions and allows for unequal base frequencies, best explained the resulting multiple sequence alignment [46]. A discrete Gamma distribution with five categories was applied to model rate differences among sites (+G, parameter = 0.9407), and invariable sites were allowed (+I, 54.3558%). All codon positions were included, and positions with less than 95% site coverage were eliminated, resulting in a total of 573 analysed positions in the final dataset. We generated neighbour-joining and maximum-likelihood (ML) phylogenetic trees with 1000 bootstrap replicates to explore the phylogenetic affinities of the mitochondrial lineages [47].
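The BIC-based model choice described above can be illustrated with a minimal sketch. It is not the Modeltest/MEGA implementation: the log-likelihoods are invented purely for illustration, the parameter counts cover substitution-model parameters only (branch lengths are ignored, on the assumption that they are shared by all candidates), and the function name `bic` is hypothetical. Only the number of analysed positions (573) is taken from the text.

```python
import math


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian information criterion: k * ln(n) - 2 * lnL (lower is better)."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


# Hypothetical (log-likelihood, free substitution parameters) per candidate model.
# These values are NOT results from the study; they only illustrate the ranking step.
candidates = {
    "JC": (-5200.0, 0),
    "K2P": (-5100.0, 1),
    "HKY": (-5050.0, 4),      # kappa + three free base frequencies
    "HKY+G+I": (-4900.0, 6),  # plus gamma shape and proportion of invariable sites
}

n_sites = 573  # analysed alignment positions reported in the text

scores = {name: bic(lnl, k, n_sites) for name, (lnl, k) in candidates.items()}
best_model = min(scores, key=scores.get)

for name in sorted(scores, key=scores.get):
    print(f"{name:8s} BIC = {scores[name]:.2f}")
print("lowest BIC:", best_model)
```

The penalty term grows with the number of free parameters, so a richer model is preferred only when its gain in likelihood outweighs that penalty.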
The Species Delimitation Plugin for Geneious Pro was used to summarize genetic K2P distances and thus provide data readily comparable with other studies using this standard DNA barcoding approach [42,48]. In addition, we used the reconstructed ML-based hypothesis of the mitochondrial relationships as input for a species delimitation approach using Poisson tree processes (PTP) and the refined multi-rate PTP (mPTP) version [49][50][51]. In both versions, the aim is to find a group delimitation that maximizes the likelihood of the partition of branch lengths; PTP uses a uniform evolutionary rate (lambda), whereas the newer mPTP model assumes a different rate for each group (species). The null model assumes no delimitation, with all tips of the tree belonging to a single species. In PTP, a p-value test decides whether to keep the null model or reject it and use the maximum likelihood delimitation instead. Since mPTP compares models with different numbers of parameters (separate lambdas for each species), the p-value test cannot be applied and instead the Akaike information criterion is used to decide which number of groups best fits the given topology and branch lengths. In theory, this approach avoids over-splitting into too many groups [49].
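For readers unfamiliar with the K2P (Kimura 2-parameter) distance mentioned above, the following sketch shows how it can be computed for one pair of aligned sequences. It is an illustration under stated assumptions, not the code of the Geneious plugin: the function name `k2p_distance` is hypothetical, and sites with gaps or ambiguity codes are simply skipped (pairwise deletion), which may differ from the settings used in the study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences.

    d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q)), where P and Q are the
    proportions of transitional and transversional differences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent (saturated).
    return -0.5 * math.log((1.0 - 2.0 * p - q) * math.sqrt(1.0 - 2.0 * q))


# One transition in ten sites.
print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```

Multiplying the returned value by 100 gives the percentage form in which barcode distances are usually reported.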

Morphometric Analysis
For the morphometric analysis, only adult, well-fixed, and straight specimens were chosen, in comparable numbers per species identified by classical determination. Thus, 159 appropriate specimens were included in the geometric morphometric analysis (Trigonopoma sp. (n = 38), Rasbora cf. hubbsi (n = 49), R. rutteni (n = 13), R. trilineata (n = 16), and R. vaillantii (n = 40)). Radiographs were made in a plane position using a Faxitron LX-60 Cabinet X-ray System. Fifteen homologous landmarks (Figure 1) were placed on each specimen using TPSdig v. 2.22 following the criteria of Zelditch et al. [52,53]. Outliers were identified using the 'Find outliers...' function in MorphoJ v. 1.06d [54]. A generalized least squares Procrustes superimposition (GLS) was applied to the raw landmark data in MorphoJ v. 1.06d [54]. Pooled within-species regressions of the GLS results on the respective log centroid size were computed to reduce the effect of allometric growth using MorphoJ v. 1.06d [54,55]. A principal component analysis (PCA) was run in MorphoJ v. 1.06d on the regression residuals to detect interspecific shape differences. Principal components (PCs) each describing > 5.0% of the total variation were tested for significance using a one-way analysis of variance (ANOVA) and Tukey's pairwise post-hoc test in PAST v. 3.09 [54,56]. To determine the number of clusters within the size-corrected landmark data, a k-means cluster analysis including all PCs describing >99.0% of the cumulative variance was computed using the find.clusters function of the 'adegenet' package in R v. 3.2.2 [57][58][59][60].
The best-supported model corresponded to the lowest Bayesian information criterion (BIC) score [59]. To calculate the membership probability of individuals to the clusters, we used a discriminant analysis of principal components (DAPC). The DAPC was computed with all PCs describing >99.0% of the cumulative variance using the dapc function of the 'adegenet' package [57][58][59][60].
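The Procrustes superimposition step can be sketched for two-dimensional landmarks as follows. This is a simplified illustration, not MorphoJ's implementation: landmarks are encoded as complex numbers so that a rotation becomes a multiplication by a unit complex number, the iteration count is fixed, and all function names are hypothetical. The returned centroid sizes are the quantities whose logarithm would enter the allometric regression.

```python
import math


def centre_and_scale(points):
    """Translate a landmark set to its centroid and scale it to unit centroid size.

    points: list of (x, y) tuples. Returns (shape as complex numbers, centroid size).
    """
    z = [complex(x, y) for x, y in points]
    centroid = sum(z) / len(z)
    z = [p - centroid for p in z]
    size = math.sqrt(sum(abs(p) ** 2 for p in z))
    return [p / size for p in z], size


def rotate_onto(shape, reference):
    """Rotate a centred shape so that its squared distance to the reference is minimal."""
    inner = sum(p.conjugate() * r for p, r in zip(shape, reference))
    if inner == 0:
        return list(shape)
    phase = inner / abs(inner)  # optimal rotation as a unit complex number
    return [p * phase for p in shape]


def generalized_procrustes(specimens, n_iter=10):
    """Align all specimens to an iteratively updated mean shape.

    specimens: list of landmark sets (each a list of (x, y) tuples, same order).
    Returns (aligned shapes, centroid sizes).
    """
    scaled = [centre_and_scale(points) for points in specimens]
    shapes = [shape for shape, _ in scaled]
    sizes = [size for _, size in scaled]

    reference = shapes[0]
    for _ in range(n_iter):
        shapes = [rotate_onto(shape, reference) for shape in shapes]
        mean = [sum(column) / len(shapes) for column in zip(*shapes)]
        norm = math.sqrt(sum(abs(p) ** 2 for p in mean))
        reference = [p / norm for p in mean]  # keep the consensus at unit size
    return shapes, sizes


# Usage with two toy "specimens": the second is the first, doubled in size and shifted.
toy = [
    [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0)],
    [(5.0, 5.0), (9.0, 5.0), (7.0, 7.0)],
]
aligned, centroid_sizes = generalized_procrustes(toy)
log_centroid_sizes = [math.log(size) for size in centroid_sizes]
```

After superimposition, differences between specimens reflect shape only; position, orientation, and scale have been removed, with scale retained separately as centroid size.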

Classical Species Determination
The classical identification of the collected rasborines resulted in five species: Trigonopoma sp., Rasbora cf. hubbsi Brittan, 1954, R. rutteni Weber and de Beaufort, 1916, R. trilineata Steindachner, 1870, and R. vaillantii Popta, 1905. Trigonopoma sp. is characterized by a narrow lateral stripe beginning at the posterior margin of the eye (versus a wide stripe in T. gracile (Kottelat, 1991)), the absence of pigments between the pectoral- and anal-fin base (versus present in T. gracile), a black lateral stripe in life (versus red in life in T. pauciperforatum (Weber and de Beaufort, 1916)), which reaches to the end of the median caudal-fin rays (versus to the caudal-fin base only in T. pauciperforatum), and red membranes of the dorsal and caudal fins (versus hyaline in T. gracile and T. pauciperforatum) [24,29,34,61,62]. Rasbora cf. hubbsi differs from the nominal species in having a faded dark edge along the caudal fin (versus hyaline in R. hubbsi), occasionally ½4/1/2½ scales along the transverse line (versus ½4/1/3½ in R. hubbsi), and an occasionally yellow coloration on the dorsal and/or anal fin (versus hyaline in R. hubbsi) (Table 2) [29,34]. Within the present sample, R. cf. hubbsi (431 specimens) and R. vaillantii (438) were the dominant species, whereas Trigonopoma sp. (44), R. rutteni (14), and R. trilineata (19) were rather rare.

DNA Barcoding
Analyses of the COI barcode region distinguish only three genetic clusters among the 43 Mesangat rasborines, which were taken from all five morphological species groups (Figure 2). Specimens identified as Rasbora trilineata and Trigonopoma sp. each form a distinct clade, and both show substantial evolutionary distances to the rasborine species included in our analysis and available from GenBank: R. trilineata differs by a nearest neighbour distance (NND) of 4.2% K2P from the Rasbora trilineata specimen with GenBank accession EF452883 used in the study of Mayden et al. [63] (Figure 2), and Trigonopoma sp. from the Mesangat area differs by an NND of 15.9% K2P from Trigonopoma gracilis (Figure 2). The third mitochondrial clade comprises specimens identified as R. cf. hubbsi, R. rutteni, and R. vaillantii, with no pattern pointing towards species distinction (Figure 2). With an NND of 3.4% K2P, this complex is closest in our analysis to specimens of Rasbora argyrotaenia from Java. The model-based species delimitation approaches delivered two different estimates for the total species number present in the data: PTP detected 33 entities (p = 0.001, null-model score: 309.844573, best score for single coalescent rate: 376.530240), and mPTP a total of 19 putative species (null-model score: 309.844573, best score for multi coalescent rate: 309.844573) (Figure 2). Concerning the Mesangat rasborines, PTP groups all R. trilineata into one unit (and the remaining ones into four), Trigonopoma sp. into one, and, as expected, R. cf. hubbsi, R. rutteni, and R. vaillantii together into one entity. With respect to the focus group, the mPTP outcome differs in that all included R. trilineata are grouped into one species, and that the unit with R. cf. hubbsi, R. rutteni, and R. vaillantii now also contains R. borapetensis, R. dusonensis, and R. argyrotaenia.

Morphometric Analysis
The PCA differentiates Trigonopoma sp. (n = 38) and Rasbora trilineata (n = 16) from other species (Figure 3). There are no significant differences between R. cf. hubbsi (n = 49), R. rutteni (n = 13), and R. vaillantii (n = 40). The first PC (35.53%) reflects a bending bias, and the deviation from averaged shape in PC4 (6.77%) reveals no significant differences in Tukey's pairwise post-hoc test. Thus, PC1 and PC4 are not considered in further results. The second PC (20.88%) differentiates R. trilineata from all other species by having a relatively larger head as well as a more slender body (Figure 3). According to PC3 (13.32%) and PC5 (5.13%), Trigonopoma sp. can be distinguished from the four Rasbora species in having a relatively smaller head and a shorter, slender caudal peduncle (PC3) as well as in having a larger distance between the pectoral- and pelvic-fin bases (PC5) (Figure 3).
In contrast, the k-means clustering resulted in 13 groups as the most likely division of the sample according to the BIC score (Figure 4). This divides Trigonopoma sp. into six clusters (Figure 5), whereby 35 out of 38 specimens are spread over four clusters represented only by that species. The remaining three individuals are separated into two clusters: one is shared with R. cf. hubbsi, R. rutteni, and R. vaillantii, and the second is shared with the former three species and R. trilineata. The assignment probabilities of single Trigonopoma sp. specimens to their allocated clusters range from 87.0 to 100.0% (98.84 ± 3.17%). Rasbora trilineata is split into four clusters (Figure 5). Twelve out of 16 specimens form two groups, each consisting of six specimens of that species only. Two additional specimens are assigned to two other clusters: one cluster is shared with R. cf. hubbsi and the other is shared with all species included in the analysis. The assignment probability of all specimens of R. trilineata to their assigned clusters is 100.0%, except for one individual with 43.92% (96.49 ± 13.57%). Specimens of R. cf. hubbsi are spread over seven clusters (Figure 5). All these clusters are shared with other species (six clusters with R. rutteni, five clusters with R. vaillantii, two clusters with R. trilineata, and two clusters with Trigonopoma sp.). The assignment probability of specimens of R. cf. hubbsi is the lowest among the a priori-defined species (90.40 ± 17.63%). Rasbora rutteni is divided into six clusters (Figure 5). All clusters are shared with specimens of R. cf. hubbsi. Among these clusters, four are shared with R. vaillantii, two with Trigonopoma sp., and two with R. trilineata. The assignment probability ranges from 88.23 to 100.00% (97.34 ± 3.56%) for single specimens of R. rutteni. Specimens of predefined R. vaillantii are split into six clusters (Figure 5). All these clusters are shared with R. cf. hubbsi, followed by five clusters shared with R. rutteni. A further two clusters are shared with Trigonopoma sp. and one cluster with R. trilineata. The assignment probability for individuals of R. vaillantii ranges from 64.91 to 100.00% (93.96 ± 9.60%).

Discussion
Based on the traditional identification, five rasborines (Trigonopoma sp., Rasbora cf. hubbsi, R. rutteni, R. trilineata, and R. vaillantii) are present in the Mesangat wetlands. Trigonopoma sp., R. cf. hubbsi, and R. rutteni are first records for the Mahakam River drainage. The differentiation of R. cf. hubbsi from the nominal species is based on caudal-fin coloration (faded dark edge along the caudal fin and yellowish vs. hyaline) and transverse scale count (½4/1/2½ vs. ½4/1/3½). Further morphological and/or genetic investigations including type or topotypic material of R. hubbsi are necessary to clarify its taxonomic status. Trigonopoma sp. is probably a new species differing from its congeners in coloration (reddish fins vs. hyaline fins) and the lateral stripe (broader in T. gracile; extending to median caudal-fin rays vs. ending at caudal-fin base in T. pauciperforatum) (Figure 6) [24,29,34,61,62]. In addition, Trigonopoma sp. shows a substantial mitochondrial genetic distance to the other two Trigonopoma species (Figure 2). The identity of R. trilineata is supported by the traditional identification and its molecular proximity to other R. trilineata of unknown origin (e.g., material from Mayden et al. [63]). Although R. cf. hubbsi, R. rutteni, and R. vaillantii can be distinguished by morphology (Table 2), these species share the same COI haplotype (Figure 2). In addition, a canonical variate analysis based on the geometric morphometric data including the three species supported caudal peduncle length and dorsohypural distance as suitable characters for discrimination, but also detected differences in head shape and pectoral fin position (Result S1). Possible explanations for the incongruence between morphological and genetic assignment are: (i) insufficient taxonomic resolution of the DNA barcoding marker due to recent speciation (e.g., Ward [64]), (ii) hybridization leading to the admixture of mtDNA haplotypes (e.g., Herder et al. [65] and April et al. [66]), and (iii) the individuals identified here as R. cf. hubbsi, R. rutteni, and R. vaillantii are conspecific and show strong variation within the population of the Mesangat wetlands. This implies that the validity of all three taxa needs to be verified by analyzing specimens from the respective type localities, which lie within a reasonable distance of the Mesangat wetlands: Rivulet near Bontang, East Kalimantan, East Borneo (approx. 95 km) (R. rutteni) [35], Boh River, East Kalimantan, East Borneo (approx. 185 km) (R. vaillantii) [17], and Lahad Datu River, Sabah, North Borneo (approx. 540 km) (R. hubbsi) [29].
The k-means clustering is a tool commonly used for objectively finding subsets [67,68]. The number of clusters revealed by k-means is hypothetical and might be far from the real number [59,69]. Given the far lower species numbers estimated by traditional species discrimination and genetic analyses, the number of clusters resulting from the k-means approach here (Figure 4) appears to be a clear case of over-splitting. It suggests that k-means clustering might be inappropriate for distinguishing very similar species based on morphometric data, in contrast to its suitability for genetic data (e.g., Gasch and Eisen [70], Kapil et al. [68], and Wu et al. [71]).
No shape differences caused by sex or size between clusters were detected during a careful examination of vouchers by the authors. Thus, there are no indications that signals of, e.g., sex or size affect the DAPC groups. The unclear discrimination of clusters by the DAPC rather points to possible admixture or to possible limitations of this method when applied to barely differentiated morphometric data [59].
In general, distinguishing rasborine diversity is not a simple task; traditional determination, DNA barcoding, and recent morphometric approaches provide conflicting estimates of potential species entities. The "true" number of rasborine species inhabiting the Mesangat wetlands hence remains unclear. Until additional data or more appropriate methods are available, we keep the results of the traditional identification approach as evidence for the presence of five rasborine species in the Mesangat wetlands. We are aware that the subtle differences distinguishing R. cf. hubbsi, R. rutteni, and R. vaillantii in their descriptions (Table 2) might be based on extreme phenotypes of the same species or result from a recent, radiating species complex; both issues require additional systematic work.
Figure 6. (a) Trigonopoma sp. from the Mesangat wetlands differs from its congeners in coloration (reddish fins vs. hyaline fins) and the lateral stripe (thinner than in T. gracile; extending to median caudal-fin rays vs. ending at caudal-fin base in T. pauciperforatum). Picture is mirrored. (b) Rasbora cf. hubbsi from the Mesangat wetlands differs in coloration (faded dark edges along the entire caudal-fin margin and a distinct yellowish caudal fin vs. hyaline) from the nominal species. Picture is mirrored.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy.