Revisiting the Diversity of Barbonymus (Cypriniformes, Cyprinidae) in Sundaland Using DNA-Based Species Delimitation Methods

Biodiversity hotspots often suffer from a lack of taxonomic knowledge, particularly those in tropical regions. However, accurate taxonomic knowledge is needed to support sustainable management of biodiversity, especially when it is harvested for human sustenance. Sundaland, the biodiversity hotspot encompassing the islands of Java, Sumatra, Borneo, and Peninsular Malaysia, is one of those. With more than 900 species, its freshwater ichthyofauna includes a large number of mediumto large-size species, which are targeted by inland fisheries. Stock assessment requires accurate taxonomy; however, several species groups targeted by inland fisheries are still poorly known. One of those cases is the cyprinid genus Barbonymus. For this study, we assembled a consolidated DNA barcode reference library for Barbonymus spp. of Sundaland, consisting of mined sequences from BOLD, as well as newly generated sequences for hitherto under-sampled islands such as Borneo. A total of 173 sequences were analyzed using several DNA-based species delimitation methods. We unambiguously detected a total of 6 Molecular Operational Taxonomic Units (MOTUs) and were able to resolve several conflicting assignments to the species level. Furthermore, we clarified the identity of MOTUs occurring in Java.


Introduction
Sundaland, comprising the islands of Java, Bali, Sumatra, Borneo, and peninsular Malaysia, constitutes one of the world's largest biodiversity hotspots [1,2]. With circa 900 freshwater fish species, half of which are endemic, the ichthyofauna of this biogeographical region is particularly rich, with a density of 0.8 species per km 2 , a value twice as large as that observed in Brazil and the Democratic Republic of Congo [3]. This large diversity is critically threatened, mostly due to the alarming rate of deforestation over the past few decades [4][5][6], in conjunction with pollution [7] and watershed fragmentation through the development of dams for irrigation and hydroelectric power [8]. Furthermore, the diversity of freshwater fishes in Sundaland is still not sufficiently understood [9], hampering conservation efforts. DNA barcoding, the use of cytochrome oxidase I (COI) as a species tag for automated identification, opened new perspectives for the characterization of Sundaland's ichthyofauna by helping to clarify taxonomic confusion within several groups [15,21,22], by identifying discrepancies in historical species records [9] and by detecting a substantial amount of morphologically similar, yet highly divergent lineages (i.e., cryptic diversity) within numerous species [9,15,[21][22][23][24][25][26][27][28]. Several molecular studies that aimed at characterizing patterns of genetic diversity in Barbonymus led to conflicting species identities associated with sequences submitted to international repositories [9,16,[29][30][31][32][33][34][35][36].
As part of an ongoing project that seeks to build a DNA barcode reference library for the ichthyofauna of Sundaland [9,22], we generated new barcode records for Barbonymus species, which, together with previously published sequences, cover the diversity of the genus in the region. The objective of the present study is to re-examine Barbonymus species boundaries and their distribution ranges using DNA-based species delimitation methods. By including DNA barcode records of specimens collected near type localities, we are also reappraising published Barbonymus sequences.

Sampling and Collection Management
Specimens were captured using various methods including electrofishing, seine nets, cast nets and gill nets, as well as by visiting fish markets in Sundaland ( Figure 1; DS-BARBONYM, dx.doi.org/10.5883/DS-BARBONYM, accessed on 15 January 2021). Specimens were photographed and individually labeled, and voucher specimens were preserved in a 5% formalin solution. A fin clip or a muscle biopsy was taken for each specimen and fixed in a 96% ethanol solution for further genetic analyses. Both tissue and voucher specimens were deposited in the national collections at the Museum Zoologicum Bogoriense (MZB) in the Research Centre for Biology (RCB) of the Indonesian Institute of Sciences (LIPI).

Sampling and Collection Management
Specimens were captured using various methods including electrofishing, seine nets, cast nets and gill nets, as well as by visiting fish markets in Sundaland ( Figure 1; DS-BARBONYM, dx.doi.org/10.5883/DS-BARBONYM, accessed on 15 January 2021). Specimens were photographed and individually labeled, and voucher specimens were preserved in a 5% formalin solution. A fin clip or a muscle biopsy was taken for each specimen and fixed in a 96% ethanol solution for further genetic analyses. Both tissue and voucher specimens were deposited in the national collections at the Museum Zoologicum Bogoriense (MZB) in the Research Centre for Biology (RCB) of the Indonesian Institute of Sciences (LIPI).

Sequencing and International Repositories
Genomic DNA was extracted from the muscle tissue samples using a Qiagen DNeasy 96 tissue extraction kit following manufacturer's specifications. A 651 bp segment from the 5′ region of the cytochrome oxidase I gene (COI) was amplified using the primer cocktail C_FishF1t1/C_FishR1t1 [37]. Polymerase Chain Reaction (PCR) amplifications were done on a Veriti 96-well Fast thermocycler (ABI-Applied Biosystems) with a final

Sequencing and International Repositories
Genomic DNA was extracted from the muscle tissue samples using a Qiagen DNeasy 96 tissue extraction kit following manufacturer's specifications. A 651 bp segment from the 5 region of the cytochrome oxidase I gene (COI) was amplified using the primer cocktail C_FishF1t1/C_FishR1t1 [37]. Polymerase Chain Reaction (PCR) amplifications were done on a Veriti 96-well Fast thermocycler (ABI-Applied Biosystems) with a final volume of 10.0 µL containing 5.0 µL buffer 2X, 3.3 µL ultrapure water, 1.0 µL each primer (10 µM), 0.2 µL enzyme Phire Hot Start II DNA polymerase (5U), and 0.5 µL of DNA template (~50 ng). The following thermocycler regime was used: initial denaturation at 98 • C for 5 min followed by 30 cycles denaturation at 98 • C for 5 s, annealing at 56 • C for 20 s, and extension at 72 • C for 30 s, followed by a final extension step at 72 • C for 5 min. PCR products were purified with ExoSap-IT (USB Corporation, Cleveland, OH, USA) and sequenced in both directions. Sequencing reactions were performed at the Centre for Biodiversity Genomics, University of Guelph, Canada, using the BigDye Terminator v3.1 Cycle Sequencing Ready Reaction kit following standard protocols described in [38]. Sequencing was performed on an ABI 3730xl capillary sequencer (Applied Biosystems). Sequences and collateral information were deposited on BOLD [39] and are available as a public data set (dx.doi.org/10.5883/DS-BARBONYM, accessed on 15 January 2021, Table S1).
Both the mPTP algorithm and the GYMC use phylogenetic trees as input files. We reconstructed a maximum likelihood (ML) tree for the former using RAxML [51] based on a GTR + I + Γ substitution model. For the GYMC algorithm, we calculated an ultrametric, fully resolved tree using the Bayesian approach implemented in BEAST 2.6.2 [52]. Sequences were collapsed into haplotypes prior to reconstructing the ultrametric tree using RAxML, and Bayesian reconstruction was based on a strict-clock prior of 1.2% per million year [53]. Two Markov chains of 20 million each were run independently using Yule pure birth and GTR + I + Γ substitution models. Trees were sampled every 5000 states, after an initial burn-in period of 5 million. Both runs were combined with trees resampled every 20,000 states using LogCombiner 2.6.2, and the maximum credibility tree was constructed using TreeAnnotator 2.6.2 [52].
A final COI gene tree was reconstructed using the SpeciesTreeUCLN algorithm of the StarBEAST2 package [54]. This approach implements a mixed-model including a coalescent component within species and a diversification component between species that allows accounting for variations of substitution rates within and between species [55]. SpeciesTreeUCLN jointly reconstructs gene trees and species trees and therefore requires the designation of species, which were determined using the consensus of our species delimitation analyses. The SpeciesTreeUCLN analysis was performed with the same parameters as mentioned above.
Kimura 2-parameter (K2P) [56] pairwise genetic distances were calculated using the R package Ape 5.4 [57]. Maximum intraspecific and nearest neighbor genetic distances were calculated from the pairwise K2P distance matrix using the R package Spider 1.5 [58]. Haplotype extraction and haplotype network reconstruction were performed for the most widespread species using the R package pegas 1.0 [59].

Results
The total of 173 DNA barcodes used for this study comprised 154 sequences mined from BOLD and 19 sequences generated for Barbonymus specimens originating from Sumatra and Borneo. The newly generated sequences represent the first DNA barcode records of Barbonymus for Borneo. All the sequences were above 500 bp in length and no stop codons were detected, suggesting that the sequences collected represent functional coding regions. DNA-based species delimitation methods resulted in congruent delimitation schemes with 6 MOTUs for mPTP, sPTP, ABGD, RESL, and sGMYC ( Figure 2; Table S1). However, mGMYC resulted in the delimitation of 31 highly incongruent MOTUs. Therefore, the mGMYC partitioning scheme was discarded. The final consensus scheme consisted of six MOTUs (Figure 2; Table 2) showing a distinct barcoding gap, which is defined as the lack of overlap between maximum intraspecific and minimum interspecific genetic distance. Maximum intraspecific distances ranged from 0 (BOLD:ADN2907, BOLD:AED2516) to 0.018 (BOLD:AAD1940) ( Table 1). Minimum interspecific distances ranged from 0.026 for the two Barbonymus MOTUs (BOLD:AAE2136 and BOLD:AEB4313) to 0.08 for BOLD:AAD1940 (Table 1).   Conflicting species-level assignments were detected, particularly for previously published records from Java [9], where BIN BOLD:ADD1940 and BOLD:AAU0688, initially assigned to B. balleroides and B. gonionotus, match B. gonionotus and B. schwanefeldii, respectively (Table S1). These results extend the occurrence of B. schwanefeldii to Java Island and question the occurrence of B. balleroides in Java (Figure 3). Along the same line, the BIN BOLD:AED2516, initially assigned to B. gonionotus and highlighted as a potentially new taxon [16], likely corresponds to B. belinka, because B. collingwoodii is endemic to North Borneo, B. mahakkamensis corresponds to a distinct lineage (BOLD:ADN2907, Figure 2) restricted to East Borneo (Figure 3), and the occurrence of B. balleroides in Sumatra is uncertain (Text S1).  (Figure 2). This group is more closely related to MOTUs of B. altus from mainland Asia, with a MRCA dated at 4.5 Ma, than the MOTU assigned to B. gonionotus, which diverged from other Barbonymus MOTUs about 6 Ma.
Intraspecific phylogeographic patterns were further explored for B. gonionotus and B. schwanefeldii using sequences with revised species assignment (Figure 4). A total of 40 haplotypes was observed for both B. gonionotus and B. schwanefeldii. Haplotype networks were markedly different for both species, with a star-like structure for B. schwanefeldii ( Figure 4B) and a more scattered network for B. gonionotus ( Figure 4A). Most islands of Sundaland host haplotypes scattered across networks for both species; however, mainland Asia is much more represented in the haplotype network of B. schwanefeldii ( Figure 4B) than in B. gonionotus ( Figure 4A). schwanefeldii using sequences with revised species assignment (Figure 4). A total of 40 haplotypes was observed for both B. gonionotus and B. schwanefeldii. Haplotype networks were markedly different for both species, with a star-like structure for B. schwanefeldii ( Figure 4B) and a more scattered network for B. gonionotus ( Figure 4A). Most islands of Sundaland host haplotypes scattered across networks for both species; however, mainland Asia is much more represented in the haplotype network of B. schwanefeldii ( Figure 4B) than in B. gonionotus ( Figure 4A).

Discussion
The present study provides an update of the Barbonymus diversity in Sundaland through the aggregation of newly generated and recently published DNA barcode records, resulting in a reference library consisting of 173 sequences largely distributed across mainland Asia and Sundaland (the haplotype MK978151 was observed in five individuals in [16], resulting in a dataset in BOLD consisting of 169 sequences). DNA-based species delimitation methods largely agreed on the delineation of six MOTUs, except mGMYC with a much higher estimate, a fact that was already reported in other studies [22,27,46,48,60]. Aside from this noticeable exception, methods were concordant in their delimitations, and a barcoding gap was observed between all six MOTUs, with maximum K2P distances mostly <1% within MOTUs, thereby falling into previously observed ranges for Cypriniformes in Sundaland [9,16,21,22]. Along the same line, inferred ages of divergence among MOTUs are consistent with previous age estimates among Sundaland freshwater fishes, with most species originating during the last 5 Ma [21,22,26,28].
Discrepancies between BIN and species designation were observed within three MO-TUs, corresponding to (1) BOLD:ADD1940 with sequences assigned to both B. gonionotus and B. balleroides, (2) BOLD:AAU0688 with sequences assigned to both B. gonionotus and B. schwanefeldii, and (3) BOLD:AED2516 with sequences assigned initially to B. gonionotus. Most discrepancies could be related to B. gonionotus, which appears scattered across these three MOTUs.
Most sequences of BOLD:ADD1940 from mainland Asia were attributed to B. gonionotus [29,32,33,36], while sequences from Sundaland were called B. balleroides [9,16]. Sequences from BOLD:AAU0688 from Java were assigned to B. gonionotus, while previously published and newly generated sequences from Sumatra and Borneo were assigned to B. schwanefeldii. These results suggest multiple misidentifications or large-scale introgressive hybridization between B. gonionotus and B. schwanefeldii, as the former is not reported to occur in Java [3,[18][19][20]. While hybridization and introgression have been previously reported for Cypriniformes [61][62][63], such large-scale mitochondrial introgressions have never been reported for Sundaland fishes [22,23,25]. Along the same line, shared polymorphism through recent common ancestry might be responsible for these discrepancies; however, as inferred by this study, B. gonionotus is the most divergent species in Sundaland, and ancestral polymorphisms are usually detected across whole species distribution ranges [64], which is not the case here. The most likely hypothesis is that B. schwanefeldii occurs in Java, extending its range of distribution over all islands of Sundaland. Cases of translocation of B. schwanefeldii outside its distribution range have been reported, e.g., to Papua New Guinea [65], as a strategy to improve local fisheries by introducing species with fast growth rates. Numerous introductions have been reported in Java of both Sundaland species, non-native to Java, and exotic species [9]. This makes an introduction of B. schwanefeldii through translocations from either Sumatra or Borneo populations likely. However, the haplotype network of BOLD:AAU0688 indicates that Java specimens consist of mostly private haplotypes (Figure 4). If the B. schwanefeldii occurrence in Java resulted from recent introductions, most haplotypes would be shared with other populations in Sumatra and Borneo, which is not the case here. Furthermore, cases of MOTUs widely distributed in Sundaland were previously detected [15,28] and most BOLD:AAU0688 specimens from Java initially identified as B. gonionotus are juveniles and sub-adults, which makes misidentifications likely. These results suggest that Java populations previously assigned to mboxemphB. gonionotus actually correspond to B. schwanefeldii (Figure 3).
The present study further questions the occurrence of B. balleroides in Java, which was comprehensively sampled recently [9,21], resulting in the discovery of two distinct Barbonymus MOTUs. The MOTU BOLD:AAD1940 was sampled at the type locality of B. gonionotus in Surabaya (Text S1, Figure 3), and all mainland Asia samples of this MOTU were previously assigned to B. gonionotus. This result suggests that Barbonymus populations previously assigned to B. balleroides [9,16,20] actually belong to B. gonionotus (Figure 4).
The MOTU BOLD:AED2516 likely corresponds to B. belinka. Barbonymus sunieri, B. strigatus, and B. platysoma have been described based on single specimens from either Java or North Borneo, none of which have been observed for decades. In addition, B. collingwoodii is endemic to North Borneo and B. mahakkamensis belongs to a distinct lineage (BOLD:ADN2907, Figure 2) restricted to East Borneo (Figure 3). The type locality of B. balleroides is unknown, but the holotype refers to the Indo-Australian region (Text S1). However, synonymies suggest ample distribution of B. balleroides in Java and Borneo, though its occurrence in Sumatra remains to be confirmed. Thus, B. belinka is the only name available for this MOTU in Sumatra, considering known distribution ranges of recently observed Barbonymus species.
A single case of unrecognized diversity is detected within B. altus, with two MOTUs detected including BOLD:AAE2136 and BOLD:AEB4313. The MOTU BOLD:AEB4313 corresponds to a singleton mined from GenBank and originating from a specimen sampled in the Mekong River (Vietnam), which is consistent with the identification as B. altus considering its type locality is the neighboring Chao Phraya River in Thailand. Raw electropherograms are not available for this sequence, which makes its quality assessment impossible; however, its placement within B. altus seems to confirm that this singleton does not result from poor sequence quality.

Conclusions
The present study confirms the utility of DNA barcoding for clarifying species identity and distribution ranges in cases of conflicting records. This is particularly evident in Java, where large conflicts between historical records and recent reappraisals based on DNA sequences were recently detected [9,21,66], suggesting large knowledge gaps. The lack of historical records for B. schwanefeldii in Java seems to indicate perpetuated misidentifications of Barbonymus populations. Individual translocations of B. schwanefeldii from Sumatra and Borneo to Java alone fail to account for its new occurrence. Our study suggests that both MOTUs reported from Java correspond to B. gonionotus and B. schwanefeldii and highlights the degree to which Barbonymus species are morphologically similar and difficult to distinguish based on meristic counts alone. However, DNA barcodes are clustered into MOTUs, which are unambiguously captured by most DNA-based species delimitation methods. Our revised DNA barcode reference library opens new perspectives for the management of the inland fisheries of Sundaland by enabling fast and reliable species Diversity 2021, 13, 283 9 of 12 level identification of Barbonymus spp. Considering the importance of Barbonymus spp. in local artisanal fisheries, and the difficulty of performing stock assessments at species level due to overlapping meristic counts among species, this library can be readily used as an alternative.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/d13070283/s1, Table S1: Results of the genetic species delimitation analyses. Text S1: Nomenclature of the ten nominal species of Barbonymus following Eschmeyer et al., 2018.